
A distributed algorithm for the graph center problem

Linlin Song
Mathematics Department, University of York

Supervisor: Keith Briggs

Complexity research group, BT Exact, Martlesham
[email protected], [email protected]

August 29, 2003


Contents

1 Introduction
  1.1 What is a distributed system?
  1.2 Complex networks
  1.3 Distributed sensor network
  1.4 Geometric centrality

2 The model of computation
  2.1 Distributed algorithms
  2.2 The model
    2.2.1 Introduction
    2.2.2 Full asynchronism and full synchronism
    2.2.3 The model
  2.3 Architecture
  2.4 Language support

3 Basic algorithms and related works
  3.1 Test connectivity
  3.2 All-pair shortest path
  3.3 Leader election
    3.3.1 Assumptions made in this section
    3.3.2 Extinction and a fast algorithm
  3.4 Summary of papers

4 A new algorithm for the graph center problem
  4.1 Assumptions
  4.2 Layered structure
  4.3 Message structure
  4.4 How does it work?
  4.5 Modification of algorithms
  4.6 Technique
  4.7 Generating large graphs
  4.8 How to use the code?
  4.9 Sample results
  4.10 Estimation and future work

A Program listings
  A.1 Test connectivity
  A.2 Shortest path
  A.3 Leader election
  A.4 The full code for my center algorithm

B Proofs of some theorems mentioned in this report


List of Figures

1.1 Application space of a distributed sensor network

1.2 Presumable real applications

2.1 A space-time diagram

2.2 A layered network architecture

4.1 Node hardware architecture and naming method

4.2 Process flow graph

4.3 Message passing between layers between nodes

4.4 Use of a tuple message structure

4.5 Message passing between layers inside the node

4.6 Use of a list message structure

4.7 Example network with a center

4.8 Process of the program: top left: result of connectivity; top right: return of the connected partition; bottom left: computation of the eccentricity; bottom right: computation of the center

4.9 Sample data set


Abstract

In this dissertation I introduce a new asynchronous distributed algorithm for the graph center problem, and a simulation program based on this algorithm. The program simulates how to find a center node in a distributed multihop network, thereby solving a location problem. The algorithm is based on three subalgorithms (test connectivity, all-pair shortest path, and leader election); each operates in a different layer. The simulation program automatically generates a random connected network for testing. All the nodes run the same code. The cost of communication through a link is the number of hops it contains. The program can be slightly changed to find the median of a graph, which is the solution of another location problem. It also provides all-pair shortest path information, by supplying the identification of the first neighbor on the corresponding shortest path. The full program code and usage instructions are provided. This report contains reference material on the theoretical background for ease of reading.


Chapter 1

Introduction

1.1 What is a distributed system?

We use the term distributed system to mean an interconnected collection of autonomous computers, processes, or processors, which are all referred to as the nodes of the distributed system. To qualify as 'autonomous', the nodes must at least be equipped with their own private control. To qualify as 'interconnected', the nodes must be able to exchange information.

As software processes can play the role of nodes of a system, the definition includes software systems built as a collection of communicating processes, even when running on a single hardware installation. In most cases, however, a distributed system will contain at least several processors, interconnected by communication hardware.

More restrictive definitions of distributed systems are also found in the literature. Tanenbaum [Tan88], for example, considers a system to be distributed only if the existence of autonomous nodes is transparent to the users of the system. A system distributed in this sense behaves like a virtual, stand-alone computer system, but the implementation of this transparency requires the development of intricate distributed control algorithms.


1.2 Complex networks

Complex networks occur in diverse areas, from metabolic and gene regulation networks within each cell, to food webs in ecology, transportation networks, economic interactions, and the organization of the internet. A network is conveniently modelled as a graph G, which consists of a set of vertices and a set of edges, which we regard as pairs of distinct vertices. Various network and graph models are discussed next.

1. Transport networks: these simulate the interactions between objects in a transportation system, where the objects are not only travellers, but also traffic signals, traffic management centers, etc. For fast large-scale simulations one employs distributed computers, and mapping these interactions onto the computational system is critical for high computing performance. Travellers and other objects in a transportation system interact. For example, congestion is formed by travellers being in each other's way; ride sharing requires travellers to meet at a common pick-up location; adaptive traffic lights react to traffic conditions; etc. There are also increasingly networked services, which are both sparse and non-local, for example electronic route guidance systems where only very few travellers communicate with a center.

2. Economics networks: almost any serious consideration of economic organization leads to the conclusion that network structures, both within and between organizations, are important. The idea that networks of relations at various levels have an important effect on economic activity is familiar in sociology. We consider the interaction structure as being modelled by a graph where the agents are the nodes, and two nodes are linked by an edge if the corresponding agents interact. We assume that each agent is influenced only by a limited (finite) number of other agents who are within a certain distance of him; such individuals are usually referred to as the agent's 'neighbors'. As agents' interactions are limited to a set of neighbors, changes will not affect all agents simultaneously but rather diffuse across the economy. The way in which such diffusion occurs, and the speed with which it happens, will depend crucially on the nature of the neighborhood structure. It is the connectivity of the graph of relations that is essential here.

1.3 Distributed sensor network

A distributed sensor network is a network composed of sensors distributed in the environment. Sensors include: cameras as vision sensors, microphones as audio sensors, ultrasonic sensors, infra-red sensors, temperature sensors, humidity sensors, force sensors, pressure sensors, and vibration sensors.

Of course, sensors are not limited to these. One particular example in wireless mobile communication is the mobile ad hoc network: a network wherein a pair of nodes communicates by sending messages either over a direct wireless link, or over a sequence of wireless links including one or more intermediate nodes. Only pairs of nodes that lie within one another's transmission radius can directly communicate with each other. Wireless link 'failures' occur when previously communicating nodes move such that they are no longer within transmission range of each other. Likewise, wireless link 'formation' occurs when nodes that were too far separated to communicate move such that they are within transmission range of each other. Distributed sensors can cover a large space even if the sensing range of each sensor is limited; that is, by integrating the information of many sensors, we can acquire diverse and precise information about the environment that those sensors cover. Figures 1.1 and 1.2 below illustrate an application space and presumable real applications of a distributed sensor network, respectively.
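The link formation rule just described is easy to make concrete. The following sketch (my own illustration; the positions, radius, and function name are assumptions, not part of any cited system) computes which wireless links exist for a given placement of nodes, and shows links 'failing' and 'forming' as nodes move:

import math

def links(positions, radius):
    # Two nodes can communicate directly only if they lie within
    # each other's transmission radius. `positions` maps node id
    # to an (x, y) coordinate; `radius` is the common range.
    ids = sorted(positions)
    edges = set()
    for a in ids:
        for b in ids:
            if a < b:
                (xa, ya), (xb, yb) = positions[a], positions[b]
                if math.hypot(xa - xb, ya - yb) <= radius:
                    edges.add((a, b))
    return edges

# A link 'fails' when movement takes two nodes out of range,
# and 'forms' when movement brings two nodes into range:
before = links({0: (0, 0), 1: (1, 0), 2: (5, 0)}, radius=2.0)
after  = links({0: (0, 0), 1: (4, 0), 2: (5, 0)}, radius=2.0)
print(before - after)  # links that failed: {(0, 1)}
print(after - before)  # links that formed: {(1, 2)}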

1.4 Geometric centrality

Geometric notions of centrality are closely linked to facility location problems. Suppose we are given a graph G representing, say, a traffic network. We may then ask questions such as the following:

(a) What is the optimal location of a hospital, such that the worst case response time of an ambulance is minimal?

(b) What is the optimal location of a shopping mall, so that the average driving time to the mall is minimal?

Figure 1.1: Application space of a distributed sensor network

These two classical facility location problems can be recast as optimization problems based on the distance matrix D = (d(x, y)) of G. Their solutions define two different notions of 'central' vertices.

The eccentricity of a vertex x in G and the radius ρ(G), respectively, are defined as e(x) = max { d(x, y) : y ∈ V } and ρ(G) = min { e(x) : x ∈ V }. The center of G is the set C(G) = { x ∈ V : e(x) = ρ(G) }. C(G) is the solution of the 'emergency facility location problem' (a), and is always contained in a single block of G.

The status d(x) of a vertex and the status σ(G) of the graph G, respectively, are defined as d(x) = ∑_{y∈V} d(x, y) and σ(G) = min { d(x) : x ∈ V }.

The median of G is the set M(G) = { x ∈ V : d(x) = σ(G) }; it is the solution of the 'service facility location problem' (b).
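These definitions translate directly into code. The following sketch (my own illustration; the function name and toy distance matrix are assumed) computes the center and the median of a graph from its distance matrix D:

def center_and_median(D):
    # D[x][y] is the shortest-path distance between x and y.
    V = range(len(D))
    ecc    = {x: max(D[x][y] for y in V) for x in V}   # e(x)
    status = {x: sum(D[x][y] for y in V) for x in V}   # d(x)
    rho    = min(ecc.values())                         # radius rho(G)
    sigma  = min(status.values())                      # status sigma(G)
    C = {x for x in V if ecc[x] == rho}                # center C(G)
    M = {x for x in V if status[x] == sigma}           # median M(G)
    return C, M

# Path graph 0-1-2: the middle vertex is both center and median.
D = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
print(center_and_median(D))  # ({1}, {1})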


Figure 1.2: Presumable real applications

There are a few center algorithms in the literature, but they all have their own limitations: for example, they are limited to a particular network topology (complete graph, tree, ring), or are designed to be synchronous. In this report I present a new asynchronous center algorithm for arbitrary networks. This algorithm is highly adaptive: it can be used to solve the median problem, and can also be slightly changed to solve other, more specific optimization problems.


Chapter 2

The model of computation

2.1 Distributed algorithms

The previous chapter has given reasons for the use of distributed computer systems and explained the nature of these systems; the need to program these systems arises as a consequence. The programming of distributed systems must be based on the use of correct, flexible, and efficient algorithms. Distributed and centralized systems differ in a number of essential respects:

(1) Lack of knowledge of the global state. In a centralized algorithm, control decisions can be made based upon an observation of the state of the system, even though the entire state usually cannot be accessed at once. In a distributed system, no node has such an overview of the global state.

(2) Lack of a global time-frame. The events constituting the execution of a centralized algorithm are totally ordered in a natural way by their temporal occurrence: for each pair of events, one occurs earlier than the other. The temporal order of the events constituting the execution of a distributed algorithm is not total; for some pairs of events there may be a reason for deciding that one occurs before the other, but for other pairs it is the case that neither of the events occurs before the other.

(3) Non-determinism. A centralized program may describe the computation as it unrolls from a certain input unambiguously: given the program and the input, only a single computation is possible. In contrast, the computation of a distributed algorithm may unroll in different ways, due to possible differences in the execution speeds of the system components.

2.2 The model

2.2.1 Introduction

Distributed applications are pervading many aspects of everyday life. Booking reservations, banking, and electronic and point-of-sale commerce are noticeable examples of such applications. These applications are built on top of distributed systems. We represent a distributed algorithm by a connected directed graph G_T = (N_T, D_T), where the node set N_T is a set of tasks and the set of directed edges D_T is a set of unidirectional communication channels.

The communication in a distributed system relies on message passing. A node is a reactive (or message-driven) entity, in the sense that normally it only performs computation (including the sending of messages to other nodes) as a response to the receipt of a message from another node.

These are the main points of the asynchronous distributed algorithm model. More details will be added later in this chapter.

1. Network of identical nodes, each with a message queue. Each node of the network has a unique name. The message queue may contain more than one buffer, depending on how many layers a node contains. A queue is used to ensure the FIFO order of message processing.

2. Each node knows only its neighbours. This assumes that there exists a lower layer that provides each node with a neighbour list. Generally this list contains the identifications or addresses of the neighbours.

3. There exists a finite set of pre-specified messages. Since the communication in a distributed system relies on message passing, or the interaction of nodes, different messages correspond to different states of the system.


4. Each node performs the same subalgorithm. Each node has the ability to compute. A distributed algorithm is designed to deal with all the possible states the system may have, and this applies to all the nodes in the same system.

5. Links are reliable, and links between each pair of nodes deliver interprocessor messages in FIFO order. This is not necessary in all distributed algorithms, but it is needed for those we are going to discuss in this report.

6. There may be an indefinite delay before a reply to a message. This suits the asynchronous setting.

7. Each node runs asynchronously with respect to its neighbours. This means that it is impossible to define an upper bound on process scheduling delays and on message transfer delays. This is due to the fact that neither the input load from users nor the precise load of the underlying network can be accurately predicted. Consequently, whatever value a process uses to set a timer, this value cannot be trusted by the process when it has to take a system-wide consistent decision.
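As a minimal sketch of points 1-4 above (my own illustration; the class, message kinds, and two-node network are assumptions), a reactive node with a FIFO message queue can be modelled in Python as follows:

from collections import deque

class Node:
    # A reactive (message-driven) node: a unique name (point 1),
    # a neighbour list (point 2), a FIFO message queue, and the
    # same handler code on every node (point 4).
    def __init__(self, name, neighbours):
        self.name = name
        self.neighbours = neighbours
        self.queue = deque()   # FIFO queue preserves message order
        self.outbox = []       # (destination, message) pairs awaiting delivery

    def deliver(self, msg):
        self.queue.append(msg)

    def step(self):
        # Handle one message; messages come from a small fixed set (point 3).
        if not self.queue:
            return
        kind, sender = self.queue.popleft()
        if kind == 'probe':    # reply to a probe
            self.outbox.append((sender, ('ack', self.name)))

# Asynchronous delivery with arbitrary (but finite) delay is modelled
# by draining outboxes in any order:
a, b = Node('a', ['b']), Node('b', ['a'])
a.deliver(('probe', 'b'))
a.step()
for dest, msg in a.outbox:
    {'a': a, 'b': b}[dest].deliver(msg)
print(b.queue)  # deque([('ack', 'a')])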

2.2.2 Full asynchronism and full synchronism

In this subsection we investigate the nature of the computation carried out by G's nodes with respect to their timing characteristics. This investigation will enable us to complete the model of computation given by G with the addition of its timing properties.

The first model we introduce is the fully asynchronous (or simply asynchronous) model, characterized by the two properties below; we contrast it with the fully synchronous model, characterized by the two properties that follow them.

1. Full asynchronism

(1) Each node is driven by its own, local, independent time basis, referred to as its local clock.

(2) The delay that a message suffers to be delivered between neighbors is finite but unpredictable.


2. Full synchronism

(1) All nodes are driven by a global time basis, referred to as the global clock, which generates time intervals (or simply intervals) of fixed, nonzero duration.

(2) The delay that a message suffers to be delivered between neighbors is nonzero and strictly less than the duration of an interval of the global clock.

The complete asynchronism assumed in the first case makes that model very realistic, in the sense that it reflects the characteristics of actual distributed systems.

2.2.3 The model

There are various distributed computing models in the literature, for example [Bar96a], [BT96], [Lyn97], [RS99]; I prefer the model of Gerard Tel [Ger91], which I describe next.

A distributed computation is usually considered as a collection of discrete events, each event being an atomic change in the configuration (the state of the entire system). This notion is captured in the definition of transition systems, leading to the notion of reachable configurations and a constructive definition of the set of executions induced by an algorithm. What makes a system 'distributed' is that each transition is only influenced by, or only influences, part of the configuration: basically the (local) state of a single process (or the local states of the subset of interacting processes).

1. Transition systems and algorithms. A system whose state changes in discrete steps (transitions or events) can usually be conveniently described by the notion of a transition system. In the study of distributed algorithms this applies to the distributed system as a whole, as well as to the individual processes that cooperate in the algorithm. Therefore, transition systems are an important concept in the study of distributed algorithms.

Processes in a distributed system communicate either by accessing shared variables or by message passing. We shall take a more restrictive point of view and consider only distributed systems where communication is by means of exchanging messages.

Messages in distributed systems can be passed either synchronously or asynchronously. Our main emphasis is on algorithms for systems where messages are passed asynchronously. For many purposes synchronous message passing can be regarded as a special case of asynchronous message passing, as was demonstrated by Charron-Bost et al. [CBMT92].

2. Transition systems. A transition system consists of the set of all possible states of the system, the transitions ('moves') the system can make in this set, and a subset of states in which the system is allowed to start. To avoid confusion between the states of a single process and the states of the entire algorithm (the 'global states'), the latter will from now on be called configurations.

Definition 2.1 A transition system is a triple S = (C, →, I), where C is a set of configurations, → is a binary transition relation on C, and I is a subset of C of initial configurations.

A transition relation is a subset of C × C. Instead of (γ, δ) ∈ →, the more convenient notation γ → δ is used.

Definition 2.2 Let S = (C, →, I) be a transition system. An execution of S is a maximal sequence E = (γ0, γ1, γ2, ...), where γ0 ∈ I and, for all i ≥ 0, γi → γi+1.

A terminal configuration is a configuration γ for which there is no δ such that γ → δ. Note that a sequence E = (γ0, γ1, γ2, ...), with γi → γi+1 for all i ≥ 0, is maximal if it is either infinite or ends in a terminal configuration.

Definition 2.3 Configuration δ is reachable from γ, notation γ ⇝ δ, if there exists a sequence γ = γ0, γ1, γ2, ..., γk = δ with γi → γi+1 for all 0 ≤ i < k. Configuration δ is reachable if it is reachable from an initial configuration.
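A transition system in the sense of definitions 2.1-2.3 is easy to experiment with. The following sketch (my own; the function name and toy system are assumptions) computes the set of reachable configurations:

def reachable(initial, step):
    # All configurations reachable from `initial` (definition 2.3),
    # where step(c) yields the configurations d with c -> d.
    seen, frontier = set(initial), list(initial)
    while frontier:
        c = frontier.pop()
        for d in step(c):
            if d not in seen:
                seen.add(d)
                frontier.append(d)
    return seen

# Toy transition system: configurations are the integers 0..4,
# with transitions c -> c+1; 4 is a terminal configuration.
print(reachable({0}, lambda c: [c + 1] if c < 4 else []))
# {0, 1, 2, 3, 4}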

Page 16: A distributed algorithm for the graph center problemkeithbriggs.info/documents/Linlin_Song_MSc.pdf · A distributed algorithm for the graph center problem Linlin Song Mathematics

11 The model of computation

3. Systems with asynchronous message passing. A distributed system consists of a collection of processes and a communication subsystem. Each process is a transition system in itself, with the annotation that it can interact with the communication subsystem. To avoid confusion between attributes of the distributed system as a whole and attributes of individual processes, we use the following convention: the terms 'transition' and 'configuration' are used for attributes of the entire system, and the (otherwise equivalent) terms 'event' and 'state' are used for attributes of processes. To interact with the communication subsystem a process has not only ordinary events (referred to as internal events) but also send events and receive events, in which a message is produced or consumed. Let M be a set of possible messages, and denote the collection of multisets with elements from M by M(M).

Definition 2.4 The local algorithm of a process is a quintuple (Z, I, ⊢i, ⊢s, ⊢r), where Z is a set of states, I is a subset of Z of initial states, ⊢i is a relation on Z × Z, and ⊢s and ⊢r are relations on Z × M × Z. The binary relation ⊢ on Z is defined by

c ⊢ d ⇔ (c, d) ∈ ⊢i ∨ ∃m ∈ M ((c, m, d) ∈ ⊢s ∪ ⊢r).

The relations ⊢i, ⊢s, and ⊢r correspond to state transitions related to internal, send, and receive events, respectively. In the sequel we shall denote processes by p, q, r, p1, p2, etc., and denote the set of processes of a system by P. The executions of a process are the executions of the transition system (Z, ⊢, I). In an execution of the distributed system the executions of the processes are coordinated through the communication subsystem. To describe this coordination, we shall define a distributed system as a transition system whose configuration set, transition relation, and initial states are constructed from the corresponding components of the processes.

Definition 2.5 A distributed algorithm for a collection P = {p1, ..., pN} of processes is a collection of local algorithms, one for each process in P.

The behavior of a distributed algorithm is described by a transition system as follows. A configuration consists of the state of each process and the collection of messages in transit; the transitions are the events of the processes, which not only affect the state of the process, but can also affect (and be affected by) the collection of messages; the initial configurations are the configurations where each process is in an initial state and the message collection is empty.

Definition 2.6 The transition system induced under asynchronous communication by a distributed algorithm for processes p1, ..., pN (where the local algorithm for process pi is (Z_pi, I_pi, ⊢i_pi, ⊢s_pi, ⊢r_pi)) is S = (C, →, I), where

(1) C = { (c_p1, ..., c_pN, M) : c_p ∈ Z_p for all p ∈ P, and M ∈ M(M) };

(2) → = ∪_{p∈P} →_p, where →_p are the transitions corresponding to the state changes of process p; →_p is the set of pairs ((c_p1, ..., c_pi, ..., c_pN, M1), (c_p1, ..., c′_pi, ..., c_pN, M2)) for which one of the following three conditions holds:

(a) (c_pi, c′_pi) ∈ ⊢i_pi and M1 = M2;

(b) for some m ∈ M, (c_pi, m, c′_pi) ∈ ⊢s_pi and M2 = M1 ∪ {m};

(c) for some m ∈ M, (c_pi, m, c′_pi) ∈ ⊢r_pi and M1 = M2 ∪ {m};

(3) I = { (c_p1, ..., c_pN, M) : (∀p ∈ P : c_p ∈ I_p) ∧ M = ∅ }.

An execution of the distributed algorithm is an execution of this induced transition system. The events of an execution are made explicit with the following notation: the pairs (c, d) ∈ ⊢i_p are called (possible) internal events of process p, and the triples in ⊢s_p and ⊢r_p are called the send and receive events of the process.

(1) An internal event e = (c, d) of p is called applicable in configuration γ = (c_p1, ..., c_p, ..., c_pN, M) if c_p = c. In this case, e(γ) is defined as the configuration (c_p1, ..., d, ..., c_pN, M).

(2) A send event e = (c, m, d) of p is called applicable in configuration γ = (c_p1, ..., c_p, ..., c_pN, M) if c_p = c. In this case, e(γ) is defined as the configuration (c_p1, ..., d, ..., c_pN, M ∪ {m}).

It is assumed that for each message there is a unique process that can receive the message; this process is called the destination of the message.
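Conditions (b) and (c) of definition 2.6 can be sketched directly, representing a configuration as the process states plus the multiset M of messages in transit. This is my own illustration (function names and the two-process example are assumptions); a receive event additionally requires that the message is actually in transit:

from collections import Counter

def apply_send(config, p, event):
    # Send event (c, m, d) of process p, condition (b): the state
    # changes from c to d and m is added to the multiset M.
    states, M = config
    c, m, d = event
    assert states[p] == c              # applicability: p is in state c
    return dict(states, **{p: d}), M + Counter([m])

def apply_receive(config, p, event):
    # Receive event (c, m, d) of process p, condition (c): m must
    # be in transit; it is removed from M.
    states, M = config
    c, m, d = event
    assert states[p] == c and M[m] > 0
    return dict(states, **{p: d}), M - Counter([m])

# p sends 'hello', then q receives it:
config = ({'p': 'idle', 'q': 'idle'}, Counter())
config = apply_send(config, 'p', ('idle', 'hello', 'done'))
config = apply_receive(config, 'q', ('idle', 'hello', 'got'))
print(config)  # ({'p': 'done', 'q': 'got'}, Counter())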



Figure 2.1: A space-time diagram

4. Causal order of events and logical clocks. The view of executions as sequences of transitions naturally induces a notion of time in executions: a transition a is said to occur earlier than transition b if a occurs in the sequence before b. For an execution E = (γ0, γ1, ...), define the associated sequence of events (e0, e1, ...), where ei is the event by which the configuration changes from γi to γi+1. Observe that each execution defines a unique sequence of events in this way. An execution can be visualized in a space-time diagram, of which figure 2.1 presents an example. In such a diagram, a horizontal line is drawn for each process, and each event is drawn as a dot on the line of the process where it takes place. If a message m is sent in event s and received in event r, an arrow is drawn from s to r; the events s and r are said to be corresponding events in this case.

Events of a distributed execution can sometimes be interchanged without affecting the later configurations of the execution. Therefore a notion of time as a total order on the events of an execution is not suitable for distributed executions; instead, the notion of causal dependence is introduced next.


5. Independence and dependence of events. It has been remarked already that the transitions of a distributed system influence, and are influenced by, only part of the configuration. This leads to the observation that two consecutive events influencing disjoint parts of the configuration can be interchanged, in systems with asynchronous message passing. This is expressed in the following theorem.

Theorem 2.1 Let γ be a configuration of a distributed system (with asynchronous message passing) and let ep and eq be events of different processes p and q, both applicable in γ. Then ep is applicable in eq(γ), eq is applicable in ep(γ), and ep(eq(γ)) = eq(ep(γ)).

Proof: see section B.0.1.

Indeed, the theorem explicitly requires that p and q are different; also, if eq receives the message sent in ep, then eq is not applicable in the starting configuration γ, so this case is excluded as well. If one of these two requirements is violated, the events cannot occur in the reversed order; otherwise they can occur in reversed order and yet result in the same configuration. Note that from a global point of view the transitions cannot be exchanged: writing γp = ep(γ), γq = eq(γ), and γpq = eq(ep(γ)), the transition from γp to γpq is different from the transition from γ to γq. From the point of view of the processes, however, these events are indistinguishable.

The fact that a particular pair of events cannot be exchanged is expressed by saying that there is a causal relation between these two events. This relation can be extended to a partial order on the set of events of an execution, called the causal order of the execution.

Definition 2.7 Let E be an execution. The relation ≺ on the events of the execution, called the causal order, is the smallest relation that satisfies the following:

(1) If e and f are different events of the same process and e occurs before f, then e ≺ f.

(2) If s is a send event and r the corresponding receive event, then s ≺ r.

(3) ≺ is transitive.


We write a ⪯ b to denote (a ≺ b ∨ a = b); ⪯ is a partial order. There may exist events a and b for which neither a ⪯ b nor b ⪯ a holds. Such events are said to be concurrent, notation a ∥ b. In figure 2.1, b ∥ f, d ∥ i, etc.

Causal order was first defined by Lamport [Lam78] and plays an important role in reasoning about distributed algorithms. The definition of ≺ implies the existence of a causality chain between causally related events. By this we mean that a ≺ b implies the existence of a sequence a = e0, e1, ..., ek = b, in which each pair of consecutive events either occurs (in this order, with no other event between them) in the process where they take place, or is a corresponding send/receive pair. In figure 2.1, a causality chain between event a and event l is the sequence a, f, g, h, k, l.
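The causal order can be computed mechanically. The following sketch (my own; the function name and event names are assumptions) builds ≺ from definition 2.7 as a transitive closure, and tests two events for concurrency:

from itertools import combinations

def causal_pairs(proc_events, msg_pairs):
    # Rule (1): order events of the same process; rule (2): order
    # corresponding send/receive pairs; rule (3): transitive closure.
    prec = set(msg_pairs)
    for seq in proc_events.values():
        for i, j in combinations(range(len(seq)), 2):
            prec.add((seq[i], seq[j]))
    changed = True
    while changed:
        changed = False
        for a, b in list(prec):
            for c, d in list(prec):
                if b == c and (a, d) not in prec:
                    prec.add((a, d))
                    changed = True
    return prec

# Process p performs events a then s (a send); q performs c then r,
# where r is the receive corresponding to s:
prec = causal_pairs({'p': ['a', 's'], 'q': ['c', 'r']}, {('s', 'r')})
print(('a', 'r') in prec)    # True: the causality chain a, s, r
concurrent = lambda x, y: (x, y) not in prec and (y, x) not in prec
print(concurrent('a', 'c'))  # True: neither a ⪯ c nor c ⪯ a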

6. Equivalence of executions. In this subsection it is shown that the events of an execution can be reordered in any order consistent with the causal order, without affecting the result of the execution. The reordering of the events gives rise to a different sequence of configurations, but the resulting execution is regarded as equivalent to the original execution.

7. Additional assumptions and complexity. The definitions made so far in this chapter are sufficient to set the scene for the remaining chapters; the defined model serves as a framework for the presentation and verification of algorithms, as well as for impossibility proofs for distributed problems. This section discusses some additional assumptions and terminology, which are also common in the literature on distributed algorithms.

8. Properties of the channels. The model can be refined by representing the contents of each channel separately in the configuration, that is, by replacing the set M by a collection of sets Mpq, one for each (unidirectional) channel pq. As we have postulated that each message implicitly defines its destination, this modification does not alter the important properties of the model. Next, some commonly made assumptions about the correspondence between send and receive events are discussed.

(a) Reliability. A channel is said to be reliable when every message that is sent in the channel is received exactly once (provided that the destination is able to receive the message). Unless stated otherwise, it is always assumed in this report that the channels are reliable. This assumption in fact adds a (weak) fairness condition: indeed, after a message has been sent, the receipt of this message (in a suitable state of the destination) is applicable from then on.

A channel that is not reliable may exhibit communication failures, which can be of several types, e.g. loss, garbling, duplication, and creation. These failures can be represented by transitions in the model of definition 2.6, but these transitions do not correspond to state changes of a process.

The loss of a message occurs when the message is sent but never received; it can be modelled by a transition that removes the message from M. The garbling of a message occurs when the message received is different from the message sent; it can be modelled by a transition that changes one message of M. The duplication of a message occurs when the message is received more often than it is sent; it is modelled by a transition that copies a message of M. The creation of a message occurs when a message is received that has never been sent; it is modelled by a transition that inserts a message in M.

(b) The FIFO property. A channel is said to be FIFO if it respects the order of the messages sent through it. That is, if p sends two messages m1 and m2 to process q and the sending of m1 occurs earlier in p than the sending of m2, then the receipt of m1 occurs earlier in q than the receipt of m2.

FIFO channels can be represented in the model of definition 2.6 by replacing the collection M by a set of queues, one for each channel: a send event appends a message to the tail of the corresponding queue, and a receive event deletes a message from its head. When FIFO channels are assumed, a new type of communication failure arises, namely the reordering of messages in a channel; it can be modelled by a transition that exchanges two messages in a queue.

A weaker assumption was proposed by Ahuja [Ahu90]: a flush channel is a channel that respects the order only of those messages for which this is specified by the sender. Stronger assumptions can also be defined. Schiper et al. [SES89] defined causally ordered message delivery as follows: if p1 and p2 send messages m1 and m2 to process q in events e1 and e2, and it is the case that e1 ≺ e2, then q receives m1 before m2. A hierarchy of delivery assumptions, consisting of asynchronism, causally ordered delivery, FIFO, and synchronous communication, was discussed by Charron-Bost et al. [CBMT92].

(c) Channel capacity. The capacity of a channel is the number of messages that can be in transit in the channel at the same time. The channel is full in each configuration in which it actually contains a number of messages equal to its capacity. A send event is applicable only if the channel is not full.

Definition 2.6 models channels with unbounded capacity, i.e., channels that are never full. In this report it will always be assumed that the capacity of the channels is unbounded.
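Properties (a)-(c) can be combined in one small sketch (my own illustration, not code from this report's program): a reliable FIFO channel with an optional capacity bound:

from collections import deque

class FifoChannel:
    # A reliable channel (every message sent is received exactly once)
    # that respects FIFO order; `capacity` bounds the number of messages
    # in transit, with capacity=None modelling the unbounded channels
    # assumed in this report.
    def __init__(self, capacity=None):
        self.capacity = capacity
        self.in_transit = deque()

    def full(self):
        return self.capacity is not None and len(self.in_transit) == self.capacity

    def send(self, m):
        if self.full():
            raise RuntimeError('send not applicable: channel is full')
        self.in_transit.append(m)         # append at the tail

    def receive(self):
        return self.in_transit.popleft()  # deliver from the head: FIFO

ch = FifoChannel(capacity=2)
ch.send('m1'); ch.send('m2')
print(ch.receive(), ch.receive())  # m1 m2 -- order preserved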

9. Real-time assumptions. An essential property of the model presented is, of course, its distributedness: the complete independence of events in different processes, as expressed in theorem 2.1. This property is lost when a global time frame and the ability of processes to observe physical time (a physical clock device) are assumed. Indeed, when some real time elapses, this time elapses in all processes, and this will show up on the clock of each process.

Real-time clocks can be incorporated by equipping each process with a real-time clock variable; the elapse of real time is modelled by a transition that puts forward the clock of each process. Usually, a bound on the message transmission time (the time between sending and receiving a message) is assumed in conjunction with the availability of real-time clocks. This bound can also be included in the general model of transition systems.

10. Process knowledge. Initial process knowledge is the term used to refer to information about the distributed system that is represented in the initial states of the processes. If an algorithm is said to rely on such information, it is assumed that the relevant information is correctly stored in the processes prior to the start of the execution of the system. Examples of such knowledge include the following.

(1) Topological information. Information about the topology includes the number of processes, the diameter of the network graph, and the topology of the graph. The network is said to have a sense of direction if a consistent edge-labelling with directions in the graph is known to the processes.

(2) Process identity. In many algorithms it is required that the processes have unique names (identities), and that each process knows its own name initially. The processes are then supposed to contain a variable that is initialized to this name. Further assumptions can be made regarding the set from which the names are chosen, such as that the names are linearly ordered or that they are integers.

(3) Neighbor identities. If processes are distinguished by unique names, it is possible to assume that each process initially knows the names of its neighbors. This assumption, referred to as neighbor knowledge, is useful for purposes of message addressing: the name of the destination is given when sending a message (direct addressing). A stronger assumption is that each process knows the entire collection of process names.

11. The complexity of distributed algorithms. The most important property of a distributed algorithm is its correctness: it must satisfy the requirements posed by the problem that the algorithm is to solve. To compare different algorithms for the same problem it is useful to measure the consumption of resources by an algorithm; the lower this consumption, the 'better' the algorithm. The resource consumption of distributed algorithms can be measured in several ways.

(1) Message complexity. This is the total number of messages exchanged by the algorithm.


(2) Bit complexity. As the set M usually contains different messages, an amount of information must be transmitted in every message to identify it within M. For an algorithm that uses a small set M, each message can be identified using only a small number of bits, while algorithms using many different messages require more bits in each message. As 'long' messages are more expensive to transmit than 'short' messages, one may also count the total number of bits contained in messages. Most of the algorithms considered here use messages that contain O(log N) bits (where N is the number of processes), so their bit complexity exceeds their message complexity by a logarithmic factor. In most cases only the message complexity of an algorithm will be analyzed; the bit complexity will be computed only for algorithms using very long or very short messages.

(3) Time complexity. As this model of distributed algorithms does not contain a notion of time, it is not obvious how the time complexity of distributed algorithms can be defined, and different definitions are found in the literature. The definition used here is based on an idealized timing of the events of a computation, according to the following assumptions:

(a) the time for processing an event is zero time units;

(b) the transmission time (i.e. the time between sending and receiving a message) is at most one time unit.

The time complexity of an algorithm is the time consumed by a computation under these assumptions. Note that the assumptions are made only for the purpose of defining the time complexity of the algorithm; the correctness of an asynchronous algorithm must be proved independently of these assumptions.

(4) Space complexity. The space complexity of an algorithm equals the amount of memory needed in a process to execute it. The space of a process is the logarithm of the number of states of that process. As the operation of distributed algorithms is non-deterministic, an algorithm may give rise to several computations, for which these measures may not be equal. Therefore a distinction between worst-case and average-case complexity is made. The worst-case measure is the highest complexity of any computation of the algorithm. The average case, as implied, averages over all possible computations, but in order to do so a probability distribution over all computations must be defined.

2.3 Architecture

The complexity of the tasks performed by the subsystems (nodes) of a distributed system may require that these subsystems be designed in a structured way. Software implementing this kind of distributed network is usually structured into interdependent modules, each performing a specific function and relying on services offered by other modules. The modules are called layers or levels in the context of network implementation; see figure 2.2. Defining the number of layers and the interfaces is important when designing a network. For a complete network, the architecture also includes the accompanying definitions of all interfaces and protocols; this is important so that products of different companies are compatible.

2.4 Language support

Running a distributed application, whether in an actual physically distributed environment or as a simulation in software on a single computer, requires the distributed algorithm to be coded in a programming language. Here I describe the three main constructs needed in a distributed programming language; the language I am using in this report is Python. A language of this kind must provide the means to express parallelism, process interaction, and non-determinism. Parallelism is required so that the nodes of a distributed system can execute their parts of the program concurrently. Communication between the nodes must also be supported by the programming language, so that nodes can interact with each other by means of exchanging messages. Non-determinism allows a node to receive a message from different nodes. A sketch combining the three constructs follows the list below.



Figure 2.2: A layered network architecture

1. Parallelism. Parallelism is usually expressed by defining several processes, each being a sequential entity with its own state space. A language may either offer the possibility of statically defining a collection of processes, or allow the dynamic creation and termination of processes. It is also possible to express parallelism by means of parallel statements, or in a functional programming language.

2. Communication. Communication between processes is inherent to distributed algorithms: if processes do not communicate, each process operates in isolation from the other processes and should be studied in isolation, not as part of a distributed system. In languages that provide message passing, 'send' and 'receive' operations are available. Communication takes place by the execution of the send operation in one process (therefore called the sender). The arguments of the send operation are the receiver's address and additional data forming the content of the message. The additional data becomes available to the receiver when the receive statement is executed, which implements the synchronization.

Messages can be sent point-to-point, i.e., from one sender to one receiver, or broadcast, in which case the same message is received by all receivers. The term multicast is also used, to refer to messages that are sent to a collection of (not necessarily all) processes.

An alternative to message passing is the use of shared memory for communication: one process writes a value to a variable, and another process reads the value.

3. Non-determinism. At many points in its execution a process may be able to continue in various different ways. A receive operation is often non-deterministic, because it allows the receipt of messages from different senders. Additional ways to express non-determinism are based on guarded commands. A guarded command in its most general form is a list of statements, each preceded by a boolean expression (its guard). The process may continue its execution with any of the statements for which the corresponding guard evaluates to true. A guard may contain a receive operation, in which case it evaluates to true if there is a message available to be received.
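The following sketch (my own; node names and messages are assumptions) shows the three constructs in Python, the language used in this report: threads for parallelism, queues for point-to-point message passing, and a blocking receive whose arrival order is non-deterministic:

import threading, queue

inboxes = {'p': queue.Queue(), 'q': queue.Queue(), 'r': queue.Queue()}

def send(dest, sender, data):
    # Point-to-point message passing: receiver's address plus data.
    inboxes[dest].put((sender, data))

def node_r():
    # Non-determinism: r accepts messages from either sender,
    # in whatever order they happen to arrive.
    for _ in range(2):
        sender, data = inboxes['r'].get()   # blocking receive
        print('r got', data, 'from', sender)

# Parallelism: each node is a thread running concurrently.
threads = [threading.Thread(target=node_r),
           threading.Thread(target=send, args=('r', 'p', 'hello')),
           threading.Thread(target=send, args=('r', 'q', 'world'))]
for t in threads: t.start()
for t in threads: t.join()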

In general, four steps are involved in solving a computational problem in parallel. The first step is to understand the nature of the computations in the specific application domain. The second step involves designing a parallel algorithm, or parallelizing an existing sequential algorithm. The third step is to map the parallel algorithm onto a suitable parallel computer architecture, and the last step involves writing the parallel program using an applicable parallel programming approach.


Chapter 3

Basic algorithms and related works

As a first approach to the graph center problem, three basic algorithms for general networks will be discussed in this chapter. These algorithms were carefully chosen for two reasons. First, they are asynchronous and apply to arbitrary networks, whereas existing algorithms for distributed systems usually cover only particular types of networks and topologies. Second, because of their simplicity, we can compute and test them on small networks for better understanding. An additional reason for choosing the leader election algorithm is that it terminates by itself. At the end of this chapter some other related works will be discussed.

3.1 Test connectivity

The problem that we treat in this section is the problem of discovery, by each node in N, of the identifications of all the other nodes to which it is connected by a path in G. The relevance of this problem becomes apparent when we consider the practical situations in which portions of G may fail, possibly disconnecting the graph and thereby making unreachable from each other a pair of nodes that could previously communicate over a path of a finite number of edges. The ability to discover the identifications of the nodes that still share a connected component of the system, in an environment that is prone to such changes, may be crucial in many cases. The algorithm that we present in this section is not really suited to the case in which G changes dynamically; the treatment of such cases requires techniques that are altogether absent from this report, where we take G to be fixed and connected.

The algorithm is called A_Test_Connectivity, and its essence is the following. First of all, it may be started by any of the nodes in N, either spontaneously (if the node is in N_0) or upon receipt of the first message (otherwise). In either case, what a node n_i does to initiate its participation in the algorithm is to broadcast its identification, call it id_i, to its neighbors. As we will see, this very simple procedure, coupled with the assumption that the edges in G are FIFO, suffices to ensure that every node in N obtains the identifications of all the other nodes in G.

The set of variables that node n_i employs to participate in this algorithm includes parent_i^j, count_i^j, reached_i^j for n_j ∈ N, and initiated_i:

parent_i^j, initialized to nil, indicates the node in Neigh_i from which the first copy of id_j has been received;

count_i^j, initialized to zero, stores the number of times id_j has been received;

reached_i^j, initialized to false, indicates whether id_j has been received at least once;

initiated_i, initialized to false, indicates whether n_i ∈ N_0.

This algorithm is based on the assumption that G's edges are FIFO. The wave that node n_j propagates forward with its identification reaches every other node n_i, which joins the algorithm either spontaneously (initiated_i = true) or upon receipt of its first message (initiated_i = false). By the actions below, and because of the FIFO property of the edges, n_i only sends id_j back towards n_j, along the path obtained by successively following the parent pointers, after it has received id_j from all of its neighbors. Because this is valid for all n_j ∈ N, every node must by this time know the identifications of all nodes in G.


Algorithm A_Test_Connectivity:

Variables:
  parent_i^k = nil for all n_k ∈ N;
  reached_i^k = false for all n_k ∈ N;
  count_i^k = 0 for all n_k ∈ N;
  initiated_i = false.

Input: msg_i = nil.
Action if n_i ∈ N_0:
  initiated_i := true;
  reached_i^i := true;
  Send id_i to all n_j ∈ Neigh_i.

Input: msg_i = id_k such that origin_i(msg_i) = (n_i, n_j) for some n_k ∈ N.
Action:
  if not initiated_i then
  begin
    initiated_i := true;
    reached_i^i := true;
    Send id_i to all n_l ∈ Neigh_i
  end;
  count_i^k := count_i^k + 1;
  if not reached_i^k then
  begin
    reached_i^k := true;
    parent_i^k := n_j;
    Send id_k to every n_l ∈ Neigh_i such that n_l ≠ parent_i^k
  end;
  if count_i^k = |Neigh_i| then
    if parent_i^k ≠ nil then
      Send id_k to parent_i^k.


The message complexity of algorithm A_Test_Connectivity is O(nm) (to be precise, each edge carries exactly n messages in each direction, so the total number of messages is 2nm). Because the lengths of messages depend upon n, it is in this case appropriate to compute the algorithm's bit complexity as well. If we assume that every node's identification can be expressed in ⌊log n⌋ bits, then the bit complexity of this algorithm is O(nm log n).
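To make the algorithm concrete, here is a minimal centralized simulation of A_Test_Connectivity (my own sketch, not the program listing of appendix A.1): per-edge FIFO queues model the channels, and a random global order of deliveries models asynchronism.

import random
from collections import defaultdict

def test_connectivity(adj, initiators):
    # `adj` maps node -> list of neighbours; `initiators` is N_0.
    nodes = list(adj)
    initiated = {i: False for i in nodes}
    reached = defaultdict(bool)         # reached[i, k]
    count = defaultdict(int)            # count[i, k]
    parent = {}                         # parent[i, k]
    chan = defaultdict(list)            # FIFO queue per directed edge (j, i)
    known = {i: {i} for i in nodes}     # identifications node i has seen

    def start(i):
        initiated[i] = True
        reached[i, i] = True
        for j in adj[i]:
            chan[i, j].append(i)        # send id_i to all neighbours

    for i in initiators:
        start(i)
    while any(chan.values()):
        j, i = random.choice([e for e in chan if chan[e]])
        k = chan[j, i].pop(0)           # receive id_k from neighbour j
        if not initiated[i]:
            start(i)
        known[i].add(k)
        count[i, k] += 1
        if not reached[i, k]:
            reached[i, k] = True
            parent[i, k] = j
            for l in adj[i]:
                if l != j:
                    chan[i, l].append(k)       # forward id_k away from parent
        if count[i, k] == len(adj[i]) and (i, k) in parent:
            chan[i, parent[i, k]].append(k)    # echo id_k back to parent

    return known

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a path 0-1-2-3
print(test_connectivity(adj, initiators=[0]))

On this path every node ends up knowing {0, 1, 2, 3}, whatever delivery order the scheduler picks.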

3.2 All-pair shortest path

Another basic problem, considered in this section, is that of determining the shortest distances in G between all pairs of nodes. Distances between two nodes are in this section taken to be measured in numbers of edges. At the end, a node is informed not only of its distance to all other nodes, but also of which of its neighbors lies on the corresponding shortest path. The availability of this information at all nodes provides a means of routing messages from every node to every other node along shortest paths. When G has one node for every processor of some distributed-memory system, and its edges reflect the interprocessor connections in that system, this information allows shortest-path routing to be done.

An asynchronous algorithm, A_Compute_Distances, is presented here. It is widely used, despite having been displaced by more efficient algorithms of theoretical interest. In addition to its popularity, a good reason to present it in detail is its simplicity.

The variables this algorithm employs include dist_i^j, first_i^j, set_i, level_i^j, state_i, and initiated_i; id_i denotes n_i's identification:

dist_i^j denotes the shortest distance from n_i to n_j ∈ N, initially equal to n (unless j = i, in which case the initial value is zero);

first_i^j denotes the node in Neigh_i on the corresponding shortest path to n_j ≠ n_i, initially equal to nil;

set_i denotes the set of identifications to be sent out to neighbors at each step; initially it contains n_i's identification only;

level_i^j indicates which set of node identifications n_i has received from n_j ∈ Neigh_i. Specifically, level_i^j = d, for some d such that 0 ≤ d < n, if and only if n_i has received from n_j the identifications of those nodes which are d edges away from n_j. Initially equal to −1;

state_i has the following meaning: n_i has received the identifications of all nodes that are d edges away from it, for some d such that 0 ≤ d < n, if and only if state_i = d. Initially equal to zero;

initiated_i, initially set to false, is used to indicate whether n_i ∈ N_0.

Algorithm A_Compute_Distances:

Variables:
  dist_i^i = 0;
  dist_i^k = n for all n_k ∈ N such that k ≠ i;
  first_i^k = nil for all n_k ∈ N such that k ≠ i;
  set_i = {id_i};
  level_i^k = −1 for all n_k ∈ Neigh_i;
  initiated_i = false.

Input: msg_i = nil.
Action if n_i ∈ N_0:
  initiated_i := true;
  Send set_i to all n_j ∈ Neigh_i.

Input: msg_i = set_j such that origin_i(msg_i) = (n_i, n_j).
Action:
  if not initiated_i then
  begin
    initiated_i := true;
    Send set_i to all n_k ∈ Neigh_i
  end;
  if state_i < n − 1 then
  begin
    level_i^j := level_i^j + 1;
    for all id_k ∈ set_j such that dist_i^k > level_i^j + 1 do
    begin
      dist_i^k := level_i^j + 1;
      first_i^k := n_j
    end;
    if state_i ≤ level_i^j for all n_j ∈ Neigh_i then
    begin
      state_i := state_i + 1;
      set_i := { id_k : n_k ∈ N and dist_i^k = state_i };
      Send set_i to all n_k ∈ Neigh_i
    end
  end.

In Algorithm A_Compute_Distances, N_0 = N. If initiated_i = true, then the second action is only executed if state_i < n − 1. The point to notice is that this is in accord with the intended semantics of state_i, because if state_i = n − 1 then n_i has already received the identifications of all nodes in N, and is then essentially done with its participation in the algorithm.

Another important point to be discussed right away with respect to Algorithm A_Compute_Distances is that the FIFO property of edges is in this case essential for the semantics of the level variables to be maintained. In the second action, the distance from n_i to n_k is updated to level_i^j + 1 upon receipt of id_k in a set from a neighbour n_j of n_i only because that set is taken to contain the identifications of nodes whose distance to n_j is level_i^j. This cannot be taken for granted, though, unless (n_i, n_j) is a FIFO edge.

The complexities of Algorithm A_Compute_Distances can also be obtained right away. By its actions, what node n_i does is to send its own identification to all of its neighbours; then the identifications of all of its neighbours get sent, then the identifications of all nodes that are two edges away from it, and so on. Thus n_i sends n messages to each of its neighbours, and the total number of messages employed is then 2nm, yielding a message complexity of O(nm) and a bit complexity of O(nm log n) if node identifications can be represented in ⌊log n⌋ bits.
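To make the flow of identifications concrete, the following is a minimal Python sketch that simulates the same computation round by round inside a single process. It is a didactic stand-in for the asynchronous algorithm, not the thesis code; the names dist, first and sets mirror the variables defined above.

import collections  # not strictly needed; shown for clarity of intent

def compute_distances(neighbours):
    """neighbours: dict mapping node id -> list of neighbouring node ids."""
    n = len(neighbours)
    dist = {i: {j: (0 if i == j else n) for j in neighbours} for i in neighbours}
    first = {i: {j: None for j in neighbours} for i in neighbours}
    sets = {i: {i} for i in neighbours}          # set_i: ids sent this round
    for state in range(n - 1):                   # round d discovers distance d+1
        received = {i: set() for i in neighbours}
        for i in neighbours:                     # send set_i to every neighbour
            for j in neighbours[i]:
                received[j] |= {(k, i) for k in sets[i]}
        for i in neighbours:                     # relaxation: the second action
            for k, via in received[i]:
                if dist[i][k] > state + 1:
                    dist[i][k] = state + 1
                    first[i][k] = via
        sets = {i: {k for k in neighbours if dist[i][k] == state + 1}
                for i in neighbours}
    return dist, first

# a path 0-1-2-3: node 0 is 3 edges from node 3, reached via neighbour 1
dist, first = compute_distances({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
assert dist[0][3] == 3 and first[0][3] == 1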

3.3 Leader election

In this section the problem of election, also called leader finding, will be discussed. The election problem was first posed by LeLann [G.L77], who also proposed the first solution. The problem is to start from a configuration where all processes are in the same state, and arrive at a configuration where exactly one process is in the state leader and all other processes are in the state lost. The process in the state leader at the end of the computation is called the leader and is said to be elected by the algorithm.

An election among the processes must be held if a centralized algorithm is to be executed and there is no a priori candidate to serve as the initiator of this algorithm. For example, this could be the case for an initialization procedure that must be executed at start-up or after a crash of the system. Because the set of active processes may not be known in advance, it is not possible to assign the role of leader to one process once and for all.

Definition 3.1 An election algorithm is an algorithm that satisfies the followingproperties.

(1) Each process has the same local algorithm.

(2) The algorithm is decentralized, i.e., a computation can be initialized by anarbitrary non-empty subset of the processes.

(3) The algorithm reaches a terminal configuration in each computation, andin each reachable terminal configuration there is exactly one process in thestate leader and all the other processes are in the state lost.

The last property is sometimes weakened to require only that exactlyone process is in the state leader. It is then the case that the elected processis aware that it has won the election, but the losers are not (yet) aware oftheir loss.


3.3.1 Assumptions made in this section

The election problem has been studied in this section under assumptionsthat we now review.

(1) The system is fully asynchronous. It has been assumed that the pro-cesses have no access to a common clock and that the message trans-mission times can be arbitrarily long or short.

(2) Each process is identified by a unique name, its identity, which is known to the process initially. For simplicity it has been assumed that the identity of process p is just p. The identities are drawn from a totally ordered set P, i.e., a relation ≤ on identities is available. The number of bits that represent an identity is w.

Theorem 3.1 If A is a centralized wave algorithm using m messages per wave,the algorithm Ex(A) elects a leader using at most nm messages.

Proof see section B.0.2

Lemma 3.1 Let C be a wave with one initiator p, and, for each non-initiator q, let father_q be the neighbour of q from which q received a message in its first event. Then the graph T = (P, E_T), with E_T = {qr : q ≠ p ∧ r = father_q}, is a spanning tree directed towards p.

Proof see section B.0.3.

The event f in the third clause of Definition 6.1 can be chosen to be asend event for all q other than the process where the decide event takesplace.

Lemma 3.2 Let C be a wave and d_p ∈ C a decide event in process p. Then

∀q ≠ p : ∃f ∈ C_q : (f ≼ d_p ∧ f is a send event).

Proof see section B.0.4.


3.3.2 Extinction and a fast algorithm

The set of variables that process p employs to participate in this algorithm includes caw_p, rec_p, father_p, lrec_p, win_p and state_p.

caw_p, initialized to None, indicates the currently active wave.

rec_p, initialized to 0, counts the ⟨tok⟩ messages received by p for the currently active wave.

father_p, initialized to None, indicates p's father in the currently active wave.

lrec_p, initialized to 0, counts the ⟨ldr⟩ messages received by p.

win_p, initialized to None, records the identity of the winner once it is known.

state_p, initialized to sleep, indicates the status of p: if p is the elected leader it becomes leader, otherwise lost.

Algorithm A_Leader_elect:

Variables:
  caw_p = None;
  rec_p = 0;
  father_p = None;
  lrec_p = 0;
  win_p = None;
  initiator_p = true.

Input:
  msg_p = nil.
Action:
  if initiator_p then
    caw_p := id_p;
    Send ⟨tok, id_p⟩ to all n_j ∈ Neigh_p.

While lrec_p < #Neigh_p do:

  Input:
    msg_p = ⟨ldr, r⟩ received from neighbour n_q.
  Action:
    if lrec_p = 0 then
      Send ⟨ldr, r⟩ to all n_k ∈ Neigh_p;
    lrec_p := lrec_p + 1;
    win_p := r.

  Input:
    msg_p = ⟨tok, r⟩ received from neighbour n_q.
  Action:
    if caw_p = None or r < caw_p then
      caw_p := r;
      rec_p := 0;
      father_p := n_q;
      Send ⟨tok, r⟩ to all n_s ∈ Neigh_p, n_s ≠ n_q;
    if r = caw_p then
      rec_p := rec_p + 1;
      if rec_p = #Neigh_p then
        if caw_p = id_p then Send ⟨ldr, id_p⟩ to all n_s ∈ Neigh_p
        else Send ⟨tok, caw_p⟩ to father_p;
    (if r > caw_p the message is ignored.)

Upon termination of the loop:
  if win_p = id_p then state_p := leader else state_p := lost.

An algorithm for leader election can be obtained from an arbitrary centralized wave algorithm: each initiator starts a separate wave, and the messages of the wave initiated by process p must be tagged with p in order to distinguish them from the messages of other waves. The algorithm ensures that, no matter how many waves are started, only one wave will run to a decision, namely the wave of the smallest initiator. All other waves will be aborted before a decision can take place.

For a wave algorithm A, the election algorithm Ex(A) is as follows. Each process is active in at most one wave at a time; this wave is its currently active wave, denoted caw, with initial value undefined. Initiators of the election act as if they initiate a wave and set caw to their own identity. If a message of some wave, say the wave initiated by q, arrives at p, then p processes the message as follows. If q > caw_p, the message is simply ignored, effectively causing q's wave to fail. If q = caw_p, the message is treated exactly according to the wave algorithm. If q < caw_p or caw_p is undefined, p joins the execution of q's wave by resetting its variables to their initial values and setting caw_p = q. When the wave initiated by q executes a decide event, q is elected.

3.4 Summary of papers

Some existing algorithms and related documents are summarized in the tables that follow.


Algorithms for leader election (columns: |n|, Dyn, Asyn):

[MWV00] Leader election algorithms for mobile ad hoc networks. |n|: no; Dyn: yes; Asyn: no. Each connected component is originally a leader-oriented DAG; the algorithms maintain a unique leader for every component.

[NOb] Uniform leader election protocols for radio networks (a). |n|: no; Dyn: no; Asyn: no. Uniform (b), single-hop (c), single-channel (d); when two or more stations transmit on a channel in the same time slot, the corresponding packets collide and are garbled beyond recognition (e).

[NOa] Randomized leader election protocols for ad-hoc networks. |n|: yes; Dyn: no; Asyn: no. A fast anonymous probabilistic algorithm; elects a leader in an n-station, single-channel ad-hoc network.

[NO00] Randomized leader election protocols in radio networks with no collision detection. |n|: no; Dyn: no; Asyn: no. Energy-efficient randomized leader election protocols for single-hop, single-channel radio networks that do not have collision detection capabilities.

Notes:
(a) A radio network is a distributed system with no central arbiter, consisting of n radio transceivers, henceforth referred to as stations.
(b) A leader election protocol is said to be uniform if in each time slot every station transmits with the same probability.
(c) A radio network is said to be single-hop when all the stations are within transmission range of each other.
(d) The stations communicate over a unique radio frequency channel known to all the stations.
(e) In a radio network with collision detection, the status of a radio channel in a time slot is null if no station transmitted in the current time slot, single if exactly one station transmitted, and collision if two or more stations transmitted in the current time slot.


Routing protocols for multi-hop wireless ad hoc networks:

[Per97] Destination-Sequenced Distance Vector (DSDV): a hop-by-hop distance vector routing protocol requiring each node to periodically broadcast routing updates; it guarantees loop-freedom.

[GB99] Ad-hoc On-demand Distance Vector routing (AODV): a combination of DSR and DSDV; it uses the on-demand mechanisms of route discovery and route maintenance from DSR, plus the hop-by-hop routing, sequence numbers and periodic beacons of DSDV.

[JM96] Dynamic Source Routing (DSR): uses source routing rather than hop-by-hop routing, with each packet to be routed carrying the complete route in its header. The advantage of source routing is that intermediate nodes do not need to maintain up-to-date routing information in order to route the packets they forward, since the packets themselves already contain all the routing decisions.

[PC97] Temporally-Ordered Routing Algorithm (TORA): discovers routes on demand, provides multiple routes to a destination, establishes routes quickly, and minimizes communication overhead by localizing the algorithmic reaction to topological changes when possible (shortest-path routing is of secondary importance, and longer routes are often used to avoid the overhead of discovering newer routes).


Other documents:

Netchange algorithm (Tajibnapis [Taj77]): computes routing tables that are optimal according to the 'minimum-hop' measure, and maintains information that allows the tables to be updated with only a partial recomputation after the failure or repair of a channel. The algorithm can handle the failure, repair or addition of channels, but it is assumed that a node is notified when an adjacent channel fails or recovers; the failure and recovery of nodes is not considered separately, as it is assumed that the failure and recovery of a node is observed by its neighbours as the failure of the connecting channel. The algorithm maintains in each node u a table Nb_u[v], giving for each destination v a neighbour of u to which packets for v are forwarded. The algorithm does not terminate after a finite number of steps in all cases, because the failure or repair of channels may call for recomputation indefinitely.

Locationing in distributed ad-hoc wireless sensor networks (Chris Savarese and Jan M. Rabaey [SR]): the algorithms presented rely on range measurements between pairs of nodes and the a priori coordinates of sparsely located anchor nodes. Clusters of nodes surrounding anchor nodes cooperatively establish confident position estimates through assumptions, checks, and iterative refinements. Once established, these positions are propagated to more distant nodes, allowing the entire network to create an accurate map of itself.

Algorithm visualization for distributed environments (Yoram Moses [MPT98]): since in asynchronous distributed systems there is no way of knowing 'the real' execution, the VADE system is designed to be consistent with the execution of the distributed algorithm: the algorithm runs on the server's machines while the visualization is executed on a web page on the client's machine.

Self-stabilizing algorithms for finding centers and medians of trees [KPBG94]: tree graphs; no initialization is assumed, and the underlying network topology is permitted to change, provided it remains connected and acyclic. Self-stabilizing: due to a transient failure, a process may temporarily have incorrect knowledge about the identification of its neighbours, but it eventually identifies its neighbours correctly and the system converges to the desired state.


Chapter 4

A new algorithm for the graph center problem

Locating centers of graphs has a wide variety of important applications, because placing a common resource at a center of a graph minimizes the cost of sharing the resource with other locations.

In this chapter I introduce my own algorithm for the graph center problem (for details about the center problem and its definition, see page 3). This is a new algorithm based on the three algorithms discussed in the previous chapter. It is an asynchronous algorithm and applies to arbitrary networks. The main feature of this algorithm is its layered structure; its purpose is to find the center node in a distributed environment.

4.1 Assumptions

A distributed system consists of n nodes, denoted by N_0, N_1, N_2, ..., N_{n−1}. Nodes are identical, and each has a unique name. Each node has three different processors (CPUs), whose programs are composed of atomic steps. These three processors correspond to the three different layers of the algorithm; see figure 4.1 for the node hardware structure. The three layers are, namely, test connectivity, compute eccentricity and compute center. Each processor has its own buffer for message passing. The number of nodes is assumed to be known. Each node keeps information about its neighbours, and nodes can only communicate with their neighbours directly. By neighbours, we mean nodes within a limited distance of the node. The cost is the number of hops (or edges) needed to reach the destination. The links between nodes are reliable, and the links and buffers are FIFO. Messages may suffer indefinite delay in transit.

Figure 4.1: Node hardware architecture and naming method. (Each node contains three buffer/processor pairs: buffer 0/processor 0, buffer 1/processor 1 and buffer 2/processor 2.)
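As a rough sketch of this structure in modern Python (the listings in Appendix A instead use plain lists as buffers; the names here are illustrative, not the thesis's own):

import queue
import threading

def layer_worker(node, layer):
    """Placeholder per-layer loop; the real logic is in the Appendix A listings."""
    while True:
        msg = node.buffers[layer].get()     # block until a message arrives
        if msg is None:                     # sentinel used to shut the layer down
            break

class Node:
    """Sketch of the three-layer node of figure 4.1."""
    def __init__(self, node_id, neighbours):
        self.id = node_id
        self.neighbours = neighbours        # the nodes reachable in one hop
        # one buffer per layer: 0 = test connectivity, 1 = routing, 2 = center
        self.buffers = [queue.Queue() for _ in range(3)]
        # one processor (thread) per layer, each reading only its own buffer
        self.processors = [threading.Thread(target=layer_worker, args=(self, k))
                           for k in range(3)]

    def deliver(self, layer, msg):
        """Messages pass only between processors of the same layer."""
        self.buffers[layer].put(msg)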

4.2 Layered structure

This algorithm is designed to find the node which minimizes the maximum shortest distance to all the other nodes (called a MinMaxMin problem). Due to the complexity of the tasks performed by the distributed system, designing a single monolithic algorithm would be very complicated; an algorithm designed in a structured way is required. The process flow graph for this algorithm is shown in figure 4.2.

Since this algorithm is designed in a distributed way, each node performs the same algorithm, and the tasks rely on the interaction, or message passing, of the nodes. However, message passing only happens among processors of the same type; in the sense of the algorithm, message passing only happens within the same layer. Figure 4.3 shows the message passing between two nodes. Here I call the second layer the routing layer, because it comes from the all-pair shortest path problem, and routing is a secondary benefit obtained after solving the center problem.


Figure 4.2: Process flow graph. (Generate an arbitrary graph (graph generate function); construct the node instances (the init method of the class); test connectivity (test connectivity algorithm), returning the connected subgraph if the graph is not connected; compute eccentricity (modified all-pair shortest path algorithm), which also yields the table of the first neighbour on the corresponding shortest path; compute center (modified leader election algorithm); finally, a data set for simulation is produced.)


Figure 4.3: Message passing between layers between nodes. (Node 0 and node 1 each have a center layer, a routing layer and a test layer; messages pass only between the corresponding layers of the two nodes.)

4.3 Message structure

There exist four types of messages in this algorithm, and they are all structured in the same way; see figure 4.4.

Figure 4.4: The tuple message structure. (Fields: destination id, type, data, sender id.)

As their names suggest, the destination id part is the address the message is going to, and the type part indicates the type of the message; the information for the real computing is stored in the data part. After the message reaches its destination, it is checked, and this triggers the destination node to execute different actions.
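For illustration, here is a hedged sketch of the checking step; the layout follows figure 4.4, and the handlers table is a placeholder for the receiving node's actions, not a name used in the thesis code.

def on_message(node, msg):
    """Check a received tuple (figure 4.4) and act according to its type."""
    destination_id, (msg_type, data), sender_id = msg
    if destination_id != node.id:
        return                            # not addressed to this node
    action = node.handlers[msg_type]      # e.g. {'tok': ..., 'ldr': ...}
    action(sender_id, data)               # trigger the corresponding action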

4.4 How does it work?

Why is the algorithm designed like this, and how does it work? In this section I first introduce the naming method and then the functions of the different layers.

1. Naming method. As we know, distributed systems communicate by message passing. The name of a node in this kind of system is a unique identification, so it can also act as an address. In this algorithm there are both nodes and processors. All of them could have their own names, but is this necessary? Actually not, for the following reasons.

On the one hand, although messages are processed by processors, there is no message passing within a node, only message passing across nodes, so the first part of the address should be the name of the node. On the other hand, the processors within the same node are not logically distributed, because they share the same network topology.

We then have to decide how a message, after arriving at a node, finds the right processor. Remember that each processor has its own buffer; all we need to do is send the message to the buffer corresponding to that processor.

For ease of computing, I named the buffers and processors (they form one entity) in sequential order, and named the node the same as its first processor, the test processor; see figure 4.1. With this in mind we can look back at the message structure. Of course there are always better ways of doing things.

2. Layer function. When a message gets to a processor, the next step is for it to be processed; but before this the message needs to be created, that is, its data must be available. This is why the algorithm is layered and the dependencies are acyclic: each layer depends on the lower layer to provide the information it needs to continue its computation, and otherwise it sits doing nothing.

As is shown in figure 4.5, the test layer is started first; it tests whether the network is connected, and if not it returns the connected part of the network. Since there may be several partitions, I let it return only the partition which contains node 0. When this is done, the second layer is started, and computes the eccentricity of each node. This information is then passed on to the next layer to compute the center.

Figure 4.5: Message passing between layers inside the node. (The test layer provides the information of whether the network is connected to the routing layer; the routing layer provides the MinMax distance to the center layer above it.)

4.5 Modification of algorithms

Now, as an example of how the message structure is used, I explain how I modified the algorithms described in chapter 3.

1. Leader election. The main idea of that algorithm is to assign unique ids from an ordered set to the nodes in the network and then elect as leader the node with the minimum id. But our aim is to find the center of a graph: we still give each node a unique id, but this time the problem is to compare the MaxMin distances. As I said in the message structure section, the actual computation information is stored in the data part.

In the leader election algorithm there exist two different types of messages, tok and ldr. Both carry node identifications, because the algorithm finds the node with the smallest id. If I want to find the center instead, I need to find the node with the smallest eccentricity. So I first give each node a new variable d_i, initially set to 0, indicating the eccentricity of the node. I then change those parts where the computation makes a decision (at the times a ldr message is sent). The details of the changes are as follows:

(1) The initiator action

  if initiator_p then
    caw_p := id_p;
    Send ⟨tok, id_p⟩ to all n_j ∈ Neigh_p

is modified to

  if initiator_p then
    caw_p := d_p;
    Send ⟨tok, d_p⟩ to all n_j ∈ Neigh_p.

Reason: this makes the nodes broadcast the information which contains the value of the eccentricity.

(2) The line

  if caw_p = id_p : send ⟨ldr, id_p⟩ to all n_s ∈ Neigh_p

is changed to

  if caw_p = d_p : send ⟨ldr, id_p⟩ to all n_s ∈ Neigh_p.

Reason: this is where the computation makes its decision, since it is about to send the ldr message; the change makes the node decide according to the eccentricity.

(3) The line

  if win_p = id_p : state_p = leader

is changed to

  if win_p = d_p : state_p = leader.

Reason: the variable win contains the winner's information, so this is the decisive comparison. After the change the algorithm returns the smallest eccentricity; each node compares this value with its own eccentricity and, if they are equal, it wins.

2. Shortest path. I changed this algorithm to return the maximum shortest distance (the eccentricity) in addition to the shortest distance matrix. This is a computation done locally, so I will not discuss it in detail here.
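Concretely, once a node's shortest-distance dictionary is complete, the local step amounts to taking its maximum. Using node 0's row of the distance matrix shown in section 4.9:

def eccentricity(dist):
    """dist maps every node id to this node's shortest distance to it."""
    return max(dist.values())

# node 0's row of the all-pair shortest distance matrix in section 4.9:
assert eccentricity({0: 0, 9: 1, 3: 1, 12: 2, 15: 3}) == 3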

4.6 Technique

The language I used for this project is Python, a high-level object-oriented language, and the programming was done under Linux. Listings of the main algorithms can be found in the appendix. Python is easy to use and very flexible; most useful information can be found on its official website www.python.org. Next I introduce the main issues in turning the algorithms into programs. For more examples, see Appendix A.

1. Using classes to construct nodes. Python is an object-oriented language, which makes it possible to create new kinds of objects; a class is the model that does this job.

2. Using threads for parallel computing. Python provides a threading module, which allows several flows of control to run together to solve the problem. This makes it possible to simulate the behaviour of a distributed system on one computer.

3. Using tuples as the message structure. I mentioned the message structure in a former section; see figure 4.4. This can be done in Python using tuples: just write the message as (destination.id, ('type', data), sender.id).
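For example (the values are illustrative):

msg = (3, ('tok', 0), 0)     # to node 3, a message of type 'tok' carrying 0, from node 0
destination_id, (msg_type, data), sender_id = msg
assert destination_id == 3 and msg_type == 'tok' and sender_id == 0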

4. Using a list as a message queue. A list is another example: the list methods make it very easy to use a list as a queue, where the first element added is the first element retrieved (FIFO). To add an item at the tail of the queue, use append(); to retrieve the item at the head, use pop(0). This is shown in figure 4.6.
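A minimal illustration (in current Python, collections.deque would be the more efficient choice for popping from the front, but a plain list matches the listings in Appendix A):

buffer = []                      # each processor's buffer is a plain list
buffer.append(('tok', 1, 0))     # new messages join at the tail...
buffer.append(('ldr', 0, 1))
oldest = buffer.pop(0)           # ...and the oldest is checked first (FIFO)
assert oldest == ('tok', 1, 0)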


Figure 4.6: Using a list as a message queue. (New messages are appended at one end of the waiting queue; the message to be checked is popped from the other, old, end.)

4.7 Generating large graphs

As you may have noticed, the network used in this program is generated automatically by the program. This is not part of the algorithm, but it is there to make the program a complete implementation; see the graph generation part of the listing in section A.4.

The idea is that I first generate the network as an empty set. When constructing an instance of the Node class, I just choose its neighbours (a subset) from the existing set, so that the new instance is constructed with its neighbour information. This subset may be empty, and that is what we want. Figure 4.7 was generated by this method.
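A simplified sketch of this incremental construction (it mirrors the generation part of the listing in section A.4, but builds an adjacency dictionary directly instead of generating source text; an empty neighbour set is allowed, so the resulting graph may be disconnected):

import random

def generate_graph(k):
    """Adjacency lists for k nodes; node i links to a random (possibly empty)
    subset of the nodes created before it."""
    neighbours = {}
    for i in range(k):
        picked = [v for v in range(i) if random.random() < 0.5]
        neighbours[i] = list(picked)
        for v in picked:              # links are bidirectional
            neighbours[v].append(i)
    return neighbours

print(generate_graph(6))   # e.g. {0: [1, 3], 1: [0], 2: [], ...}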

4.8 How to use the code?

This program is a complete simulation of the graph center algorithm introduced in this chapter. Executing the program produces four dot files together with the textual result. In this section I simply introduce how to use it; the output of the program can be found in the section that follows.

1. The latest version of Python is 2.3. You can download it from its official website www.python.org and install it according to the instructions.


Figure 4.7: Example network with a center. (Node 0 is labelled Centre; nodes 3, 6, ..., 60 are labelled remote.)



(a) If using Windows, open the file with IDLE (Python GUI), then under Run choose Run Script; the result will pop up in another window. This program has a dot¹ output part which does not work under Windows, so the last line of the program needs to be switched off before running.

(b) If using Linux, just type the file name at the console; the program will run and give the result of the computation, with a dot file generated at the end. Following the command lines shown, you can open the dot file, which displays the graph the program is currently working on. The file is automatically generated, and the graph is random, so it will be rewritten by running the program again. A program with curses output for the all-pair shortest path and a program generating a data set can be found in the next section.

(c) Changing the parameters. If you would like to change the number of nodes, you can change the parameter k, but I should mention that in the first two algorithms no automatic termination detection is available at this moment. So if you use a very large number, you may see the network reported as not connected; this does not mean it really is disconnected, but that not enough iterations have been allowed. You then need to set the parameter m to a larger number to get the right answer. The same applies to the second part of the algorithm.

¹GraphViz provides a collection of tools for manipulating graph structures and generating graph layouts. Example applications: software documentation (pretty diagrams automatically generated by doxygen and dot; from the GraphViz source documentation), and a WWW graph server. For a WWW application of GraphViz, please see http://www.graphviz.org/


4.9 Sample results

The graphs generated by the dot part of the code are shown in figure 4.8.

Figure 4.8: Process of the program. Top left: result of the connectivity test (node 6 is False, all others True); top right: the connected partition containing node 0; bottom left: the computed eccentricities (maxminpath = 3 for nodes 0, 9 and 15, and 2 for nodes 3 and 12); bottom right: the computed center (nodes 3 and 12 are Centre, the others remote).


The textual result corresponding to these graphs is generated at the same time, as follows:

n0=Node([])

n1=Node([n0,])

n0.add_link(n1)

n2=Node([])

n3=Node([n0,n1,])

n0.add_link(n3)

n1.add_link(n3)

n4=Node([n1,])

n1.add_link(n4)

n5=Node([n4,])

n4.add_link(n5)

nodes=[n0,n1,n2,n3,n4,n5]

node0.reached {0: T, 3: T, 6: F, 9: T, 12: T, 15: T}

node3.reached {0: T, 3: T, 6: F, 9: T, 12: T, 15: T}

node6.reached {0: F, 3: F, 6: T, 9: F, 12: F, 15: F}

node9.reached {0: T, 3: T, 6: F, 9: T, 12: T, 15: T}

node12.reached {0: T, 3: T, 6: F, 9: T, 12: T, 15: T}

node15.reached {0: T, 3: T, 6: F, 9: T, 12: T, 15: T}

nodes [0, 3, 6, 9, 12, 15]

nodes [0, 3, 9,12, 15]

The all-pair shortest distance matrix

0 {0: 0, 9: 1, 3: 1, 12: 2, 15: 3}

3 {0: 1, 9: 1, 3: 0, 12: 1, 15: 2}

9 {0: 1, 9: 0, 3: 1, 12: 2, 15: 3}

12 {0: 2, 9: 2, 3: 1, 12: 0, 15: 1}

15 {0: 3, 9: 3, 3: 2, 12: 1, 15: 0}

Table of first neighbor corresponding to shortest path

_________________________________________________

0 | {0: None, 9: 9, 3: 3, 12: 3, 15: 3 }

|


1 | {0: 0, 9: None, 3: 3, 12: 3, 15: 3 }

|

2 | {0: 0, 9: 9, 3: None, 12: 12, 15: 12 }

|

3 | {0: 3, 9: 3, 3: 3, 12: None, 15: 15 }

|

4 | {0: 12, 9: 12, 3: 12, 12: 12, 15: None}

0 is remote 3 is Centre

9 is remote 12 is Centre

15 is remote

run "dot -Tps bf1.dot | gv -" to see connectivity

run "dot -Tps bf2.dot | gv -" to see partition contains n0

run "dot -Tps bf3.dot | gv -" to see maxminpath

run "dot -Tps bf4.dot | gv -" to see center

A sample data set for the graphical interface is shown in figure 4.9.

4.10 Estimation and future work

This algorithm solves the graph center problem in a distributed environment. When building such systems, besides asynchrony we have to cope with two other main issues: termination and failure occurrence.

Failure occurrences cannot be predicted. The net effect of asynchrony and failures is to create an uncertainty about the state of the application (as perceived by a process) that can make it very difficult or even impossible to determine a system view that can be validly shared by all non-faulty processes. Mastering this uncertainty is one of the main problems that designers of asynchronous systems have to solve.

Detecting termination of a distributed algorithm: in the algorithms considered in chapter 3, there are situations where the computation can naturally be viewed as terminated; for example, the leader election comes to an end when the leader is elected. Here we describe a method for detecting termination.


We consider situations where, during execution of the algorithm, each processor is able to monitor its own computations and decide whether a certain 'local termination condition' holds. If the local termination condition holds at some processor, then no messages can be transmitted by that processor. Furthermore, once true, the local termination condition remains true until a message from some other processor is received.

Termination has occurred at some time t if:

(a) The local termination condition holds at all processors at time t.

(b) No message is in transit along any communication link at time t.

Termination occurs at time t if t is the smallest time for which the above conditions (a) and (b) hold. Our objective is to detect termination within finite time after it occurs. Notice that if termination has occurred at some time t, then the same is true for every subsequent time t′ > t, since no messages will be transmitted after time t and the local termination conditions will remain true at all processors.
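As an illustration only (this is not part of the implemented program), a centralized observer of the simulation could check conditions (a) and (b) on a snapshot, assuming each node exposes a locally_terminated flag and a message buffer:

def terminated(nodes):
    """Conditions (a) and (b) on a snapshot of the simulated system:
    (a) every node's local termination condition holds;
    (b) no message is in transit, i.e. every buffer is empty."""
    locally_done = all(node.locally_terminated for node in nodes)
    nothing_in_transit = all(not node.buffer for node in nodes)
    return locally_done and nothing_in_transit

In a genuinely distributed setting the snapshot itself must be obtained with care, for instance by a wave-based snapshot algorithm, since no processor can observe all buffers at once.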


0 | [(0.00079798698425292969, 'change', ['caw', 0]), ((0.00085890293121337891, 'send', 1), ('tok', 0, 0)), (0.013808012008666992, 'check'), ((0.01385200023651123, 'savemsg'), ('tok', 1, 1)), (0.01391899585723877, 'ignore', ('tok', 1, 1)), (0.21377801895141602, 'check'), ((0.21382296085357666, 'savemsg'), ('tok', 0, 1)), (0.2138969898223877, 'change', ['rec', 1]), ((0.21396398544311523, 'send', 1), ('ldr', 0, 0)), (0.23376691341400146, 'check'), ((0.23381400108337402, 'savemsg'), ('ldr', 0, 1)), ((0.23388791084289551, 'send', 1), ('ldr', 0, 0)), (0.23394596576690674, 'change', ['lrec', 1]), (0.23398900032043457, 'change', ['win', 0]), (0.23405098915100098, 'change', ['state', 'Leader'])] 1 | [(0.0013449192047119141, 'change', ['caw', 0]), ((0.0014050006866455078, 'send', 0), ('tok', 1, 1)), ((0.001459956169128418, 'send', 2), ('tok', 1, 1)), ((0.0015159845352172852, 'send', 3), ('tok', 1, 1)), ((0.0015699863433837891, 'send', 4), ('tok', 1, 1)), (0.014109015464782715, 'check'), ((0.014149904251098633, 'savemsg'), ('tok', 0, 0)), (0.014213919639587402, 'change', ['caw', 0]), (0.014256954193115234, 'change', ['rec', 4]), (0.014297008514404297, 'change', ['father', 0]), ((0.014360904693603516, 'send', 2), ('tok', 0, 1)), ((0.014420986175537109, 'send', 3), ('tok', 0, 1)), ((0.014482975006103516, 'send', 4), ('tok', 0, 1)), (0.014547944068908691, 'change', ['rec', 4]), (0.033867001533508301, 'check'), ((0.033913016319274902, 'savemsg'), ('tok', 2, 2)), (0.033980011940002441, 'ignore', ('tok', 2, 2)), (0.053878903388977051, 'check'), ((0.053924918174743652, 'savemsg'), ('tok', 4, 4)), (0.053990960121154785, 'ignore', ('tok', 4, 4)), (0.073873996734619141, 'check'), ((0.073920011520385742, 'savemsg'), ('tok', 3, 3)), (0.073985934257507324, 'ignore', ('tok', 3, 3)), (0.11387395858764648, 'check'), ((0.11391901969909668, 'savemsg'), ('tok', 0, 3)), (0.1139909029006958, 'change', ['rec', 4]), (0.13386595249176025, 'check'), ((0.13391292095184326, 'savemsg'), ('tok', 0, 2)), (0.13398492336273193, 'change', ['rec', 4]), (0.1938709020614624, 'check'), ((0.19391500949859619, 'savemsg'), ('tok', 0, 4)), (0.19398891925811768, 'change', ['rec', 4]), ((0.19405090808868408, 'send', 0), ('tok', 0, 1)), (0.21417498588562012, 'check'), ((0.21422100067138672, 'savemsg'), ('ldr', 0, 0)), ((0.21429693698883057, 'send', 0), ('ldr', 0, 1)), (0.21435296535491943, 'change', ['lrec', 4]), (0.21439599990844727, 'change', ['win', 0]), ((0.21444392204284668, 'send', 2), ('ldr', 0, 1)), (0.21449697017669678, 'change', ['lrec', 4]), (0.21454095840454102, 'change', ['win', 0]), ((0.21458995342254639, 'send', 3), ('ldr', 0, 1)), (0.21464300155639648, 'change', ['lrec', 4]), (0.21468591690063477, 'change', ['win', 0]), ((0.21473395824432373, 'send', 4), ('ldr', 0, 1)), (0.21478700637817383, 'change', ['lrec', 4]), (0.21483099460601807, 'change', ['win', 0])] 2 | [(0.0020499229431152344, 'change', ['caw', 0]), ((0.0021100044250488281, 'send', 1), ('tok', 2, 2)), ((0.0021649599075317383, 'send', 7), ('tok', 2, 2)), (0.014743924140930176, 'check'), ((0.014789938926696777, 'savemsg'), ('tok', 1, 1)), (0.014853954315185547, 'change', ['caw', 0]), (0.014894008636474609, 'change', ['rec', 2]), (0.014932990074157715, 'change', ['father', 1]), ((0.014997959136962891, 'send', 7), ('tok', 1, 2)),

Figure 4.9: Sample data set


Appendix A

Program listings

A.1 Test connectivity

#!/usr/local/bin/python

from threading import *
from time import sleep
from sys import stdin,stderr,exit
from copy import copy
from random import choice

class Node:

  def __init__(s,neig):
    s.initiated=False
    s.t=Thread(target=s.a_test_connectivity)
    # threads are named 'Thread-1','Thread-2',...: use the number as the node id
    s.id='N'+str(int(s.t.getName()[7:])-1)
    s.neig=neig
    s.msgq=[]

  def getid(s):
    return s.id

  def send(node,msg):
    node.msgq.append(msg)

  def __repr__(s):
    return s.id

  def add_link(s,node):
    s.neig.append(node)

  def a_test_connectivity(s):
    m=100
    while m:
      m-=1
      sleep(0.01)
      if not s.msgq:
        s.initiated=True
        s.reached[s.id]=True
        for node in s.neig:
          Node.send(node,(s.id,s.id))
      else:
        nj,nk=s.msgq.pop(0)
        if not s.initiated:
          s.initiated=True
          s.reached[s.id]=True
          for node in s.neig:
            Node.send(node,(s.id,s.id))
        s.count[nk]+=1
        if not s.reached[nk]:
          s.reached[nk]=True
          s.parent[nk]=nj
          for node in s.neig:
            if not node.id==nk:
              Node.send(node,(s.id,nk))
        if s.count[nk]==len(s.neig):
          if s.parent[nk]:
            if node.id==s.parent[nk]:
              Node.send(node,(s.id,nk))

  def start(s):
    s.parent=dict([(x.id,None) for x in nodes])
    s.count=dict([(x.id,0) for x in nodes])
    s.reached=dict([(x.id,False) for x in nodes])
    s.t.start()

# (the construction of n0,...,n6 was omitted from the original listing)
nodes=[n0,n1,n2,n3,n4,n5,n6]
No=[n0]
for node in nodes: node.start()

while 1:
  alive=0
  for node in nodes: alive+=node.t.isAlive()
  print >>stderr,'%d threads alive' %alive
  if alive==0: break
  sleep(1)

print 'n0.reached',n0.reached

A.2 Shortest path

#!/usr/local/bin/python

from threading import *
from time import sleep
from sys import stdin,stderr,exit
from copy import copy
from set import Set  # Set class from the author's local module

class Node:

  def __init__(s,neighbours=[]):
    s.t=Thread(target=s.a_compute_distances)
    s.name='N'+str(int(s.t.getName()[7:])-1)
    s.set=Set([s.name])
    s.state=0
    s.initiated=False
    s.neighbours=copy(neighbours)
    s.msgq=[]

  def getName(s):
    return s.name

  def __repr__(s):
    return s.name

  def send(node,msg): # send messages
    node.msgq.append(msg)

  def add_link(s,node): # add link to node
    s.neighbours.append(node)

  def a_compute_distances(s):
    m=100000
    n=len(nodes)
    while m:
      m-=1
      sleep(0.001)
      if not s.msgq:
        if s in No:
          s.initiated=True
          for node in s.neighbours:
            Node.send(node,(s.name,(s.set)))
      else:
        nj,setj=s.msgq.pop(0)
        if not s.initiated:
          s.initiated=True
          for node in s.neighbours:
            Node.send(node,(s.name,(s.set)))
        if s.state<n-1:
          s.level[nj]+=1
          for nk in setj:
            if s.dist[nk]>s.level[nj]+1:
              s.dist[nk]=s.level[nj]+1
              s.first[nk]=nj
          x=True
          for nei in s.neighbours:
            x=x and s.state<=s.level[nei.name]
          if x:
            s.state+=1
            s.set=Set([])
            for nk in nodes:
              if s.dist[nk.name]==s.state:
                s.set.insert(nk.name)
            s.bfr=copy(s.set)
            for nb in s.neighbours:
              Node.send(nb,(s.name,(s.set)))

  def start(s):
    n=len(nodes)
    s.dist=dict([(x.name,n) for x in nodes])
    s.dist[s.name]=0
    s.first=dict([(x.name,None) for x in nodes])
    del s.first[s.name]
    s.level=dict([(x.name,-1) for x in s.neighbours])
    s.t.start()

n0=Node([])
n1=Node([n0])
n2=Node([n1])
n3=Node([n1])
n4=Node([n1,n3])
n5=Node([n4])
n6=Node([n5])
n7=Node([n2])
n0.add_link(n1)
n1.add_link(n2)
n1.add_link(n3)
n1.add_link(n4)
n2.add_link(n7)
n3.add_link(n4)
n4.add_link(n5)
n5.add_link(n6)
nodes=[n0,n1,n2,n3,n4,n5,n6,n7]
No=[n0,n1,n2,n3,n4,n5]
for node in nodes: node.start()

while 1:
  alive=0
  for node in nodes: alive+=node.t.isAlive()
  print >>stderr,'%d threads alive' %alive
  if alive==0: break
  sleep(1)

print '   ',
for node in nodes: print '%s ' %node.name,
print
print '   ',33*'-'
for node in nodes:
  print node.name,'|',
  for k,v in node.dist.items(): print '%3d' %int(v),
  print


A.3 Leader election

#!/usr/local/bin/python

from threading import *
from time import sleep
from sys import stdin,stderr,exit
from copy import copy
from math import log, floor
from random import randrange, choice
from time import time

class Node:

  def __init__(s,neigh,d,initiator=True):
    s.caw=None
    s.rec=0
    s.father=None
    s.lrec=0
    s.win=None
    s.d=d
    s.initiator=initiator
    s.state=['state','sleep']
    s.neigh=neigh
    s.th=Thread(target=s.extinct_echo_center)
    s.id=int(s.th.getName()[7:])-1
    s.num=0
    s.rmsg=None
    s.buffer=[]

  def send(s,d,msg):
    d.buffer.append(msg)

  def check(s):
    s.rmsg=s.buffer.pop(0)
    return s.rmsg

  def start(s):
    s.th.start()

  def getName(s):
    return s.id

  def __repr__(s):
    return str(s.id)

  def add_link(s,z):
    s.neigh.append(z)

  def extinct_echo_center(s):
    addr=dict([(node.id,node) for node in nodes])
    s.num=len(s.neigh)
    if s.initiator:
      s.caw=s.d
      for q in s.neigh:
        s.send(q,('tok',s.d,s.id))
    while s.lrec<s.num:
      sleep(0.01+choice((0.001,0.002,0.003)))
      if s.buffer:
        name,r,qid=s.check()
        if name=='ldr':
          if s.lrec==0:
            for q in s.neigh:
              s.send(q,('ldr',r,s.id))
          s.lrec+=1
          s.win=r
        else:
          if r<s.caw:
            s.caw=r
            s.rec=0
            s.father=qid
            for p in s.neigh:
              if not p.id==qid:
                s.send(p,('tok',r,s.id))
          if r==s.caw:
            s.rec+=1
            if s.rec==s.num:
              if s.caw==s.d:  # this node's value won the election
                for p in s.neigh:
                  s.send(p,('ldr',s.d,s.id))
              else: s.send(addr[s.father],('tok',s.caw,s.id))
    if s.win==s.d:
      s.state='Centre'
    else: s.state='remote'

n0=Node([],4)
n1=Node([n0],3)
n2=Node([n1],4)
n3=Node([n1],3)
n4=Node([n1,n3],3)
n5=Node([n4],4)
n6=Node([n5],5)
n7=Node([n2],5)

n0.add_link(n1)
n1.add_link(n2)
n1.add_link(n3)
n1.add_link(n4)
n2.add_link(n7)
n3.add_link(n4)
n4.add_link(n5)
n5.add_link(n6)
nodes=[n0,n1,n2,n3,n4,n5,n6,n7]

for node in nodes: node.start()
t0=time()
while 1:
  alive=0
  for node in nodes: alive+=node.th.isAlive()
  print >>stderr,'%d threads alive' %alive
  if alive==0: break
  sleep(1)

for node in nodes:
  print node.id,'is',str(node.state)

A.4 The full code for my center algorithm

#!/usr/local/bin/python
# three algorithms combined together

from threading import *
from time import *
from sys import stdin,stderr,exit
from copy import copy
from set import Set  # Set class from the author's local module
from random import randrange, choice
from math import log, floor

t0=time()

class Node:

  def __init__(s,neigh,initiator=True):
    # center layer (extinction election): thread th, buffer1
    s.th=Thread(target=s.extinct_echo_center)
    s.caw=None
    s.rec=0
    s.father=None
    s.lrec=0
    s.win=None
    s.initiator=initiator
    s.status='sleep'
    s.id=int(s.th.getName()[7:])-1  # three threads per node, so ids are 0,3,6,...
    s.num=0
    s.buffer1=[]
    s.rmsg1=None
    # routing layer (all-pair shortest path): thread t, buffer
    s.t=Thread(target=s.a_compute_distances)
    s.set=Set([s.id])
    s.state=0
    s.initiated=False
    s.buffer=[]
    s.rmsg=None
    s.neigh=neigh
    s.d=None
    # test-connectivity layer: thread thh, buffer2
    s.initiated1=False
    s.thh=Thread(target=s.a_test_connectivity)
    s.buffer2=[]

  def start2(s):  # start the test-connectivity layer
    s.parent=dict([(x.id,None) for x in nodes])
    s.count=dict([(x.id,0) for x in nodes])
    s.reached=dict([(x.id,False) for x in nodes])
    s.thh.start()

  def start(s):   # start the routing layer
    s.dist=dict([(x.id,n) for x in nodes])
    s.dist[s.id]=0
    s.d=n
    s.first=dict([(x.id,None) for x in nodes])
    s.level=dict([(x.id,-1) for x in s.neigh])
    s.t.start()

  def start1(s):  # start the center layer
    s.th.start()

  def getid(s):
    return s.id

  def __repr__(s):
    return str(s.id)

  def check(s):
    s.rmsg=s.buffer.pop(0)
    return s.rmsg

  def check1(s):
    s.rmsg1=s.buffer1.pop(0)
    return s.rmsg1

  def send(s,d,msg):
    d.buffer.append(msg)

  def send1(s,d,msg):
    d.buffer1.append(msg)

  def send2(node,msg):
    node.buffer2.append(msg)

  def add_link(s,z):
    s.neigh.append(z)

  def initiate(s):
    if not s.buffer:
      s.initiated=True
      for node in s.neigh:
        s.send(node,((s.set),s.id))

  def a_test_connectivity(s):
    m=150
    while m:
      m-=1
      sleep(0.01)
      if not s.buffer2:
        s.initiated1=True
        s.reached[s.id]=True
        for node in s.neigh:
          Node.send2(node,(s.id,s.id))
      else:
        nj,nk=s.buffer2.pop(0)
        if not s.initiated1:
          s.initiated1=True
          s.reached[s.id]=True
          for node in s.neigh:
            Node.send2(node,(s.id,s.id))
        s.count[nk]+=1
        if not s.reached[nk]:
          s.reached[nk]=True
          s.parent[nk]=nj
          for node in s.neigh:
            if not node.id==nk:
              Node.send2(node,(s.id,nk))
        if s.count[nk]==len(s.neigh):
          if s.parent[nk]:
            if node.id==s.parent[nk]:
              Node.send2(node,(s.id,nk))

  def a_compute_distances(s):
    m=80
    while m:
      m-=1
      sleep(0.001)
      s.initiate()
      if s.buffer:
        setj,nj=s.check()
        if not s.initiated:
          s.initiated=True
          for node in s.neigh: s.send(node,((s.set),s.id))
        if s.state<n-1:
          s.level[nj]+=1
          for nk in setj:
            if s.dist[nk]>s.level[nj]+1:
              s.dist[nk]=s.level[nj]+1
              # eccentricity plus a small random fraction used as a tie-breaker
              s.d=max(s.dist.values())+choice((0.2,0.4,0.3,0.8,0.9,
                0.6,0.7,0.5,0.11,0.22,0.33,0.55,0.66,0.77))
              s.first[nk]=nj
          x=True
          for nb in s.neigh:
            x=x and s.state<=s.level[nb.id]
            if not x: break
          if x:
            s.state+=1
            s.set=Set([])
            for nk in nodes:
              if s.dist[nk.id]==s.state: s.set.insert(nk.id)
            for nb in s.neigh: s.send(nb,((s.set),s.id))

  def extinct_echo_center(s):
    addr=dict([(node.id,node) for node in nodes])
    s.num=len(s.neigh)
    if s.initiator:
      s.caw=s.d
      for q in s.neigh:
        s.send1(q,('tok',s.d,s.id))
    while s.lrec<s.num:
      sleep(0.01+choice((0.001,0.002,0.003)))
      if s.buffer1:
        name,r,qid=s.check1()
        if name=='ldr':
          if s.lrec==0:
            for q in s.neigh:
              s.send1(q,('ldr',r,s.id))
          s.lrec+=1
          s.win=r
        if name=='tok':
          if r<s.caw:
            s.caw=r
            s.rec=0
            s.father=qid
            for p in s.neigh:
              if not p.id==qid:
                s.send1(p,('tok',r,s.id))
          if r==s.caw:
            s.rec+=1
            if s.rec==s.num:
              if s.caw==s.d:
                for p in s.neigh:
                  s.send1(p,('ldr',s.d,s.id))
              else:
                #if s.father:
                s.send1(addr[s.father],('tok',s.caw,s.id))
    if s.win==s.d:
      s.status='Centre'
    else: s.status='remote'

# generate a random graph as python source text, then execute it
k=20
nodes=[]
i=0
x=''
x+='n0=Node([])\n'
for i in range(k):
  nodes.append(i)
  i+=1
  j=[choice((node,None)) for node in nodes]
  while None in j:
    j.remove(None)
  if j==[]:
    x+='n%d=Node([])\n' %i
  else:
    x+='n%d=Node([' %i
    for node in j:
      x+='n%d,' %node
    x+='])\n'
    for node in j:
      x+='n%d.add_link(' %node
      x+='n%d)\n' %i
x+='nodes=['
for v in nodes:
  x+='n%d,' %(v)
x+='n%d]' %k
print x
exec(x)

n=len(nodes)

def dot(nodes):
  "make a dot file for drawing the graph; use 'dot -Tps bf.dot | gv -'"
  f=open('bf.dot','w')
  f.write('/* automatically generated by a_center2.py */\n')
  f.write('graph G {\n')
  f.write('size="7,8"; ratio=compress; \n')
  f.write('node [size="0.01"]\n')
  f.write(' edge [color=blue];\n graph [fontcolor=green,fontsize=18];\n ')
  f.write(' center=true;\n nodesep=0.05;\n margin="0.1,0.1";\n')
  for v in nodes:
    f.write(' %d [shape=ellipse,color=%s,style=filled,label="%d %s"];\n' \
      %(v.id,'yellow',v.id,v.status))
    src=v.id
    for w in v.neigh:
      f.write(' %d--%d [color=%s];\n' %(src,w.id,'red'))
      w.neigh.remove(v) # we don't want to see two links
  f.write('}\n')
  f.close()
  print >>stderr,'bf.dot written - now run "dot -Tps bf.dot | gv -"'

for node in nodes: node.start2()
while 1:
  alive2=0
  for node in nodes: alive2+=node.thh.isAlive()
  # print >>stderr,'%d con threads alive'%alive2
  if alive2==0: break
  sleep(1)

for node in nodes:
  print 'node%d.reached' %node.id, node.reached
sleep(1)

for node in nodes: node.start()
while 1:
  alive=0
  for node in nodes:
    alive+=node.t.isAlive()
  # print >>stderr,'%d path threads alive'%alive
  if alive==0: break
  sleep(1)

print ''
print 'The distance matrix'
for node in nodes:
  print node.id,str(node.dist)
for node in nodes:
  print 'The shortest path from',node.id,'to all the others is',str(node.d)
print ''
print 'The table of first neighbor corresponding to the shortest path'
for node in nodes:
  print node.first
sleep(1)

for node in nodes: node.start1()
while 1:
  alive1=0
  for node in nodes:
    alive1+=node.th.isAlive()
  # print >>stderr,'%d center threads alive'%alive1
  if alive1==0: break

for node in nodes:
  print node.id,'is',str(node.status)

dot(nodes)


Appendix B

Proofs of some theorems mentioned in this report

Proof B.0.1 To avoid a case analysis of send, receive, and internal events, we represent each event by the uniform notation (c, x, y, d). Here c and d denote the process state before and after the event, x is the collection of messages received in the event, and y is the collection of messages sent in the event. Thus, an internal event (c, d) is denoted (c, ∅, ∅, d); a send event (c, m, d) is denoted (c, ∅, m, d); and a receive event (c, m, d) is denoted (c, m, ∅, d). In this notation, event e = (c, x, y, d) of process p is applicable in configuration γ = (c_p1, ..., c_pi, ..., c_pN, M) if c_p = c and x ⊆ M. In this case

e(γ) = (c_p1, ..., d, ..., (M \ x) ∪ y).

Now assume that e_p = (b_p, x_p, y_p, d_p) and e_q = (b_q, x_q, y_q, d_q) are applicable in

γ = (..., c_p, ..., c_q, ..., M),

that is, c_p = b_p, c_q = b_q, x_p ⊆ M and x_q ⊆ M. An important observation is that x_p and x_q are disjoint; the message in x_p (if any) has destination p, while the message in x_q (if any) has destination q.

Write γ_p = e_p(γ), and note that

γ_p = (..., d_p, ..., c_q, ..., (M \ x_p) ∪ y_p).


As x_q ⊆ M and x_q ∩ x_p = ∅, it follows that x_q ⊆ (M \ x_p) ∪ y_p, and hence e_q is applicable in γ_p. Write γ_pq = e_q(γ_p), and note that

γ_pq = (..., d_p, ..., d_q, ..., (((M \ x_p) ∪ y_p) \ x_q) ∪ y_q).

By a symmetric argument it can be shown that e_p is applicable in γ_q = e_q(γ). Write γ_qp = e_p(γ_q), and note that

γ_qp = (..., d_p, ..., d_q, ..., (((M \ x_q) ∪ y_q) \ x_p) ∪ y_p),

and hence γ_pq = γ_qp.

Let e_p and e_q be two events that occur consecutively in an execution, i.e., the execution contains the subsequence

..., γ, e_p(γ), e_q(e_p(γ)), ...

for some γ. The premise of the theorem applies to these events except in the following two cases.

(1) p = q; or

(2) ep is a send event, and eq is the corresponding receive event.

Proof B.0.2 Let p_0 be the smallest initiator. The wave initiated by p_0 is joined immediately by every process that receives a message of this wave, and every process completes this wave, because there is no wave with a smaller identity for which a process would abort the execution of the wave of p_0. Consequently, the wave of p_0 runs to completion, a decision takes place, and p_0 becomes leader.

If p is a non-initiator, no wave with identity p is ever initiated, hence p does not become leader. If p ≠ p_0 is an initiator, a wave with identity p will be started, but by Lemma 3.2 a decision in this wave is preceded by a send event for this wave by p_0. As p_0 never executes a send or internal event of the wave with identity p, such a decision does not take place, and p is not elected.

At most n waves are started, and each wave uses at most m messages, which brings the overall complexity to nm messages.


It is a more delicate question to estimate the time complexity of Ex(A). In many cases it can be shown to be of the same order of magnitude as the time complexity of A (where t is the time complexity of the wave algorithm), because within t time units after initiator p starts its wave, either p's wave decides or another wave is started.

Proof B.0.3 As the number of nodes of T exceeds the number of edges by one, it suffices to show that T contains no cycle. This follows because, with e_q the first event in q, qr ∈ E_T implies e_r ≼ e_q, and ≼ is a partial order.

Proof B.0.4 As C is a wave, there exists an f ∈ C_q that causally precedes d_p; choose f to be the last event of C_q that precedes d_p. To show that f is a send event, observe that the definition of causality (Definition B.1, i.e. Definition 2.20) implies that there exists a sequence (causality chain)

f = e_0, e_1, ..., e_k = d_p

such that for each i < k, e_i and e_{i+1} are either consecutive events in the same process or a corresponding send-receive pair. As f is the last event in q that precedes d_p, e_1 occurs in a process different from q, hence f is a send event.
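The causality chain used above can also be computed. The following is a minimal sketch (illustrative names only, not the report's code) that decides whether one event causally precedes another by searching for such a chain through process-consecutive pairs and send-receive pairs.

def causally_precedes(e, f, proc_order, msg_pairs):
    # proc_order: pairs (a, b) of consecutive events in one process
    # msg_pairs : pairs (send event, matching receive event)
    edges = proc_order | msg_pairs
    frontier, seen = {e}, {e}
    while frontier:   # breadth-first search along causality chains
        frontier = {b for (a, b) in edges if a in frontier} - seen
        seen |= frontier
    return f in seen

# s = send in process q, r = matching receive in p, d = later event in p
print(causally_precedes('s', 'd', {('r', 'd')}, {('s', 'r')}))  # True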
