decentralising a service-oriented architecture › download › pdf › 81887123.pdf ·...

Peer-to-Peer Netw. Appl. (2010) 3:323–350DOI 10.1007/s12083-009-0062-6

Decentralising a service-oriented architecture

Jan Sacha · Bartosz Biskupski · Dominik Dahlem ·Raymond Cunningham · René Meier · Jim Dowling ·Mads Haahr

Received: 7 January 2009 / Accepted: 22 September 2009 / Published online: 8 October 2009© The Author(s) 2009. This article is published with open access at Springerlink.com

Abstract Service-oriented computing is becoming anincreasingly popular paradigm for modelling and build-ing distributed systems in open and heterogeneousenvironments. However, proposed service-oriented ar-chitectures are typically based on centralised compo-nents, such as service registries or service brokers, thatintroduce reliability, management, and performanceissues. This paper describes an approach to fully de-centralise a service-oriented architecture using a self-organising peer-to-peer network maintained by serviceproviders and consumers. The design is based on a gra-dient peer-to-peer topology, which allows the system toreplicate a service registry using a limited number of themost stable and best performing peers. The paper eval-uates the proposed approach through extensive sim-ulation experiments and shows that the decentralisedregistry and the underlying peer-to-peer infrastructurescale to a large number of peers and can successfullymanage high peer churn rates.

Keywords Gradient topology ·Service-oriented architecture · Super-peer election ·Utility · Aggregation

J. Sacha (B)Vrije Universiteit, Amsterdam,The Netherlandse-mail: [email protected]

B. Biskupski · D. Dahlem · R. Cunningham ·R. Meier · M. HaahrTrinity College, Dublin, Ireland

J. DowlingSwedish Institute of Computer Science, Kista, Sweden

1 Introduction

Service-Oriented Computing (SOC) is a paradigmwhere software applications are modelled as collectionsof loosely-coupled, interacting services that communi-cate using standardised interfaces, data formats, andaccess protocols. The main advantage of SOC is that itenables interoperability between different software ap-plications running on a variety of platforms and frame-works, potentially across administrative boundaries[1, 2]. Moreover, SOC facilitates software reuse andautomatic composition and fosters rapid, low-cost de-velopment of distributed applications in decentralisedand heterogeneous environments.

A Service Oriented Architecture (SOA) usually con-sists of three elements: service providers that publishand maintain services, service consumers that use ser-vices, and a service registry that allows service discov-ery by prospective consumers [1, 3]. In many proposedSOAs, the service registry is a centralised component,known to both publishers and consumers, and is oftenbased on the Universal Description Discovery and In-tegration (UDDI) protocol.1 Moreover, many existingSOAs rely on other centralised facilities that provide,for example, support for business transactions, serviceratings or service certification [3].

However, each centralised component in a SOA con-stitutes a single point of failure that introduces securityand reliability risks, and may limit a system’s scalabilityand performance.

This paper describes an approach to decentralisea service-oriented architecture using a self-organising

1http://uddi.xml.org/

http://uddi.xml.org/

324 Peer-to-Peer Netw. Appl. (2010) 3:323–350

Peer-to-Peer (P2P) infrastructure maintained by ser-vice providers and consumers. A P2P infrastructure isan application-level overlay network, built on top ofthe Internet, where nodes share resources and provideservices to each other. The main advantages of P2Psystems are their very high robustness and scalability,due to inherent decentralisation and redundancy, andthe ability to utilise large amounts of resources avail-able on machines connected to the Internet. While theservice provision and consumption in a SOA are inher-ently decentralised, as they are usually based on directinteractions between service providers and consumers,a P2P infrastructure enables the distribution of a serviceregistry, and potentially other SOA facilities, acrosssites available in the system.

However, the construction of P2P applications posesa number of challenges. Measurements on deployedP2P systems show that the distributions of peer charac-teristics, such as peer session time, available bandwidthor storage space, are highly skewed, and often heavy-tailed or scale-free [4–7]. A relatively small fractionof peers possess a significant share of the system re-sources, and a large fraction of peers suffer from pooravailability or poor performance. The usage of theselow stability or low performance peers for providingsystem services (e.g., routing messages on behalf ofother peers) can lead to a poor performance of theentire system [8]. Furthermore, many distributed al-gorithms, such as decentralised keyword search [9],become very expensive as the system grows in size dueto the required communication overhead.

As a consequence, P2P system designers attempt tofind a trade-off between the robustness of fully decen-tralised P2P systems and the performance advantageand manageability of partially centralised systems. Acommon approach is to introduce a two-level hierarchyof peers, where so called super-peers maintain systemdata and provide core functionality, while ordinarypeers act as clients to the super-peers.

In this paper, a service registry is distributed betweenservice providers and consumers in the system using agradient-based peer-to-peer topology. The applicationof a gradient topology allows the system to place theSOA registry on a limited number of the most reliableand best performing peers in order to improve both thestability of the service registry and the cost of searchingusing this registry. Furthermore, the gradient topologyallows peers to update and optimise the set of registryreplicas as the system and its environment evolve, andto limit the number of replica migrations in order toreduce the associated overhead. Analogously, the gra-dient topology can be used to decentralise additional

facilities in a SOA, such as a transaction service or acertificate repository.

The proposed approach has been evaluated in anumber of usage scenarios through extensive simula-tion experiments. Obtained results show that the de-centralised registry, and the underlying algorithms thatmaintain the gradient topology, are scalable and re-silient to high peer churn rates and random failures.

The remainder of this paper is organised as follows.Section 2 describes the design of a gradient P2P topol-ogy and shows how this topology is used to supporta decentralised service registry. Section 3 contains anextensive evaluation of a decentralised service registrybuilt top of a gradient topology. Section 4 describes thereview of related work, and Section 5 concludes thepaper.

2 Gradient topology

The gradient topology is a P2P overlay topology, wherethe position of each peer in the system is determined bythe peer’s utility. The highest utility peers are clusteredin the centre of the topology (the so called core) whilepeers with lower utility are found at gradually increas-ing distance from the centre. Peer utility is a metric thatreflects the ability of a peer to contribute resources andmaintain system infrastructural services, such as a SOAregistry.

The gradient topology has two fundamental proper-ties. Firstly, all peers in the system with utility abovea given value are connected and form a gradient sub-topology. Such high utility peers can be exploited by thesystem for providing services to other peers or hostingsystem data. Secondly, the structure of the topologyenables an efficient search algorithm, called gradientsearch, that routes messages from low utility peerstowards high utility peers and allows peers to discoverservices or data in the system. These two propertiescontribute to an efficient implementation of a decen-tralised SOA registry.

The SOA registry is distributed between a numberof peers in the system for reliability and performancereasons. Hence, there are two types of peers: super-peers that host registry replicas, and ordinary peersthat do not maintain any replicas. A utility thresh-old is defined as a criteria for the registry replicaplacement, i.e., all peers with utility above a selectedthreshold host replicas of the registry. Finally, gra-dient search is used by ordinary peers to discoverhigh utility peers that maintain the registry. Figure 1shows a sample P2P gradient topology, where the

Peer-to-Peer Netw. Appl. (2010) 3:323–350 325

GradientSearch

Service Registry Replicas Ordinary Peers

GradientSearch

A

B

C

Replica Placement Threshold

Fig. 1 Registry replication and discovery in the gradient topol-ogy. Peers A, B, and C access registry replicas, hosted by peers inthe core, using gradient search

service registry is located at the core peers determinedby the replica placement threshold.

The following subsections describe in more detail themain components of the gradient topology: utility met-rics that capture individual peer capabilities; a neigh-bour selection algorithm that generates the gradienttopology; a super-peer election algorithm for registryreplica placement; an aggregation algorithm, requiredby the super-peer election, that approximates globalsystem properties; a gradient search heuristic that en-ables the discovery of registry replicas; and finally, theregistry replica synchronisation algorithms.

2.1 Characterising peers

In order to determine peers with the most desiredcharacteristics for the maintenance of a decentralisedservice, such as the SOA registry, a metric is definedthat describes the utility of peers in the system. Peerutility, denoted U(p) for peer p, is a function of localpeer properties, such as the processing performance,storage space, bandwidth, and availability. Most ofthese parameters can be measured, or obtained fromthe operating system, by each peer in a straight-forwardway. In the case of dynamically changing parameters, apeer can calculate a running average.

Network characteristics, such as bandwidth, latency,and firewall status, are more challenging to estimatedue to the decentralised and complex nature of wide-area networks. Moreover, many network properties,including bandwidth and latency, are properties of pairsof peers, i.e., connections between two peers, ratherthan individual peers. Nevertheless, a peer can estimatethe average latency and bandwidth of all its connectionsover time and use the average value as a general in-dication of its network connectivity and overall utilityfor the system. Furthermore, it has been shown that thebottleneck bandwidth of a connection between a peerand another machine on the Internet is often deter-mined by the upstream bandwidth of the peer’s directlink to the Internet [10]. Thus, available bandwidth canbe treated as a property of single peers.

Peer stability is amongst the most important peercharacteristics, since in typical P2P systems the sessiontimes vary by orders of magnitude between peers, andonly a relatively small fraction of peers stay in thesystem for a long time [8]. One way of measuring thepeer stability is to estimate the expected peer sessionduration using the history of previous peer sessiontimes. Stutzbach et al. [7] show that subsequent sess-ion times of a peer are highly correlated and the dura-tion of a previous peer session is a good estimate for thefollowing session duration. However, the informationabout previous peer session durations may not alwaysbe available, for example for new peers that are joiningthe system for the first time. Another approach is toestimate the remaining peer session time using the cur-rent peer uptime. Stutzbach et al. [7] show that currentuptime is on average a good indicator of remaining up-time, although it exhibits high variance. For example, insystems where the peer session times follow the power-law (or Pareto) distribution, the expected remainingsession time of a peer is proportional to the currentpeer uptime. Similar properties can be derived for othersession time distributions, such as the Weibull or log-normal distributions, used in P2P system modelling.

Formally, if the peer session times in a system followthe Pareto distribution, the probability that a peer ses-sion duration, X, is greater than some value x is givenby P(X > x) = (m

x )k, where m is the minimum sessionduration and k is a system constant such that k > 1.The expected peer session duration is E(X) = μ = k·m

k−1 .However, if a peer’s current uptime is u, where u > m,the expected session duration follows the Pareto distri-bution with the minimum value of u, i.e., P(X > x) =( u

x )k, and hence, the expected session duration time isk·uk−1 . From this we can derive the expected remaininguptime as k·u

k−1 − u = uk−1 .


2.2 Utility metric properties

The choice of the utility metric has a strong impacton the gradient topology. A utility metric where peersfrequently changes their utility values puts more stresson the neighbour selection algorithm and may desta-bilise the topology. It may also cause frequent switchesbetween super-peers and ordinary peers, which may beexpensive and undesired.

However, if peer utility grows or decreases monoton-ically, peers can cross the super-peer election thresholdonly once, assuming a constant threshold. Additionally,if the utility changes are predictable, each peer is ableto accurately estimate its own utility and the utility ofits neighbours at any given time.

For example, if peer p defines its utility as the ex-pected session duration, Ses(p), and estimates it basedon the history of its previous sessions, utility U(p) isconstant during each peer session. When p is electeda super-peer, it is not demoted to a client unless thesuper-peer election threshold increases above U(p).

If the utility of p is defined as p’s current uptime,denoted Up(p), peer utility increases monotonicallywith time. Again, when p is elected a super-peer, it isnot demoted unless the election threshold rises aboveUp(p). More importantly, the utility function is fullypredictable. Any peer q, at any time t, can compute theutility of p, given q has a knowledge of p’s birth time,i.e., the time tp when peer p entered the system. Peerutility is simply equal to

U(p) = t − tp. (1)

Clocks do not need to be synchronised between peers,and q can estimate the birth time of p using its ownclock. At time t, when q receives the current uptimeUp(p) from p, it assumes that tp = t − Up(p).

For capacity metrics, such as the storage, band-width, or processing capacity, there are two generalapproaches to define peer utility. One approach is tocalculate peer utility based on the currently availablepeer capacity. However, this has the drawback that peerutility may change over time, and these changes maybe unpredictable to other peers. A better approach isto define peer utility based on the total peer capacity,which is usually static. Such utility functions are ad-dressed later in the super-peer election Section 2.4.

Finally, certain algorithms described in this articleassume that peer utility values are unique, i.e., U(p) �=U(q) for any peers p �= q. This property may not holdfor some utility definition, particularly if peer utilityis based on hardware parameters such as CPU clockspeed and amount of RAM. If the utility function is

significantly coarse-grained, the construction of a gra-dient topology may become impossible. In order toaddress this problem, each peer can add a relativelysmall random number to its utility value to break thesymmetry with other peers.

Table 1 summarises the utility metric properties.

2.3 Generating a gradient peer-to-peer topology

In P2P systems, each peer is connected to a limitednumber of neighbours and the system topology is deter-mined by the neighbourhood relation between peers.

There are two general approaches to modellingand implementing the neighbourhood relation betweenpeers. In the first approach, a peer stores addressesof its neighbours, which allows the peer to send mes-sages directly to each neighbour, and the neighbour-hood relation is asymmetric. This strategy is relativelystraightforward to implement, but it has the drawbackthat peers may store stale addresses of peers that haveleft the system. This is especially likely in the presenceof heavy churn in the system. Moreover, such danglingreferences can be disseminated between peers unless anadditional mechanism is imposed that eliminates themfrom the system, such as timestamps [11, 12].

In the second approach, the neighbourhood relationbetween peers is symmetric. This can be simply im-plemented by maintaining a direct, duplex connection(e.g., TCP) between each pair of neighbouring peers.If a peer is not able to maintain connections with allits neighbours, for example due to the operating systemlimits, neighbouring peers store the addresses of eachother. This has the advantage that peers can notify eachother when changing their neighbourhoods or leavingthe system, which helps to keep the neighbourhoodsets up to date. Furthermore, outdated neighbour en-tries are not propagated between peers in the system,as each peer verifies a reference received from otherpeers by establishing a direct connection with eachnew neighbour. In the case of neighbours crashing,or leaving without notice, broken connections can bedetected either by the operating system (e.g., usingTCP keep alive protocol) or through periodic pollingof neighbours at the application level. In the remaining

Table 1 Utility metric properties

Utility metric Constant Monotonic Predictable

Total capacity Yes Constant YesAvailable capacity No No NoSession length Yes Constant YesUptime No Increasing Yes


part of this paper, it is assumed that the neighbourhoodrelation between peers is symmetric.

The gradient topology is generated by a periodicneighbour selection algorithm executed at every peer.Periodic neighbour selection algorithms generally per-form better than reactive algorithms in heavy churnconditions, as they have bounded communication cost.It has been observed that in systems with reactiveneighbour exchange, peers generate bursts of messagesin response to local failures, which congest local con-nections and result in a chain-reaction of other peerssending more messages, which may lead to a majorsystem failure [8].

The structure of the algorithm, shown in Fig. 2, issimilar to the T-Man framework [13], however, dueto the different neighbourhood models, the two algo-rithms are not directly comparable.

The algorithm relies on a preference function definedfor each peer p over its neighbourhood set Sp, such thatmaxSp is the most preferred neighbour for p and minSp

is the least preferred neighbour for p. Peer p attemptsto connect to a new neighbour when the size of Sp isbelow the desired neighbourhood set size s∗, and a peerdisconnects a neighbour when the size of Sp is above s∗.

New neighbours are obtained through gossippingwith high preference neighbours, maxSp in particular,which is based on the assumption that high preferenceneighbours of peer p are logically close to each otherin the gradient structure. However, greedy selection ofmaxSp for gossipping has the drawback that p is likelyto obtain the same neighbour candidates from maxSp

in subsequent rounds of the algorithm. The algorithmcan potentially achieve better performance if p selects

if jSpj> s then1discon

*

*

nect(min(Sp))2end3else4

n max(Sp)5S0 (Sn nSp)nfpg6if jSpj< s then7

connect(max(S0))8end9else10

if max(S0)> min(Sp) then11disconnect(min(Sp))12connect(max(S0))13

end14

end15

end16

Fig. 2 Neighbour selection at peer p

neighbours for gossipping probabilistically with a biastowards higher preference peers.

In the gradient topology, a peer p maintains twoindependent neighbourhood sets: a similarity set Sp

and a random set Rp. The similarity set clusters peerswith similar utility characteristics and generates thegradient structure of the topology, while the random setdecreases the peer’s clustering coefficient, significantlyreducing the probability of the network partitioning aswell as decreasing the network diameter. Random setsare also used by the aggregation algorithm describedbelow.

For static and predictable utility metrics, each peeris able to accurately estimate its neighbours’ utility.In case of non-predictable utility metrics, each peer pneeds to maintain a cache that contains the most recentutility value, U p(q), for each neighbour q. Every entryU p(q) in the cache is associated with a timestamp cre-ated by q when the utility of q is calculated. Neighbour-ing peers exchange and merge their caches every timetheir neighbour selection algorithms exchange mes-sages, preserving the most recent entries in the caches.Clocks do not need to be synchronised between peerssince all utility values for a peer q are timestampedby q.

For the random set, the preference function is uni-formly random, i.e., the relationship between any twopeers is determined using a pseudo-random numbergenerator each time two peers are compared. Thetopology generated by such a preference function hassmall-world properties, including very low diameter,extremely low probability of partitioning, and higherclustering coefficient compared to random graphs. Sim-ilar topologies can be generated by other randomisedgossip-based neighbour exchange algorithms, such asthose described in [11, 12].

For the similarity-based set, the preference functionis based on the utility metric U. Peers aim at selectingneighbours with similar but slightly higher utility. For-mally, peer p prefers neighbour a over neighbour b , i.e.,a > b , if and only if

U p(a) > U(p) and U p(b) < U(p) (2)

or∣∣U p(a) − U(p)

∣∣ <

∣∣U p(b) − U(p)

∣∣ (3)

for U p(a), U p(b) > U(p) and U p(a), U p(b) < U(p).Moreover, peer p selects potential entries to Sp fromboth Sq and Rq of a neighbour q.

A simpler strategy, where peers prefer neigh-bours with the closest possible utility, i.e., a > b if∣∣U p(a) − U(p)

∣∣ <

∣∣U p(b) − U(p)

∣∣, does not work well


in systems with skewed utility distributions, as it mayproduce disconnected topologies consisting of clustersof similar utility peers. For example, in systems withheavy-tailed utility distributions, peers do not connectto the few highest utility peers, as they have closerlower-utility neighbours. This problem is alleviated ifpeers connect to similar, but preferably higher utility,neighbours.

The random set, Rp, never reaches a stable state, aspeers constantly add and remove random neighbours.This is desired, since random connections provide ameans for the exploration of the system. However,for the similarity sets, Sp, instability or thrashing ofconnections are harmful as reconfiguring of neighbourconnections increases system overhead. Such connec-tion thrashing may occur when p selects q as the bestavailable neighbour, while q consistently disconnectsp as a non-desired neighbour. In order to avoid suchcases, each peer distinguishes between connections ini-tialised by itself and connections initialised by otherpeers. In the absence of failure, a peer closes only thoseconnections that it has initialised. By doing so, peersagree on which connections can be closed, improvingtopology stability.

The performance of the algorithm can be furtherimproved by introducing “age bias” [14]. With this tech-nique, a peer p does not initiate gossip exchange withlow-uptime neighbours, because such neighbours havenot had enough time to optimise their neighbourhoodsets according to the preference function, and thereforeare not likely to provide good neighbours for p.

The described neighbour selection algorithm contin-uously strives to cluster peers with similar utility. How-ever, due to the system scale and dynamism, only thehighest utility peers, with sufficiently long life span andhigh amount of resources, are able to discover globallysimilar neighbours, while lower utility peers, due totheir instability, have mostly random neighbours. As aconsequence, a stable core of the highest utility peersemerges in the system, where the connections betweenpeers are stable, and the core is surrounded by a swarmof lower utility peers, where the topology structureis more dynamic and ad-hoc. As shown later in theevaluation section, the neighbour selection algorithmgenerates a gradient topology in a number of differentP2P system configurations.

2.4 Electing super-peers

The super-peer election algorithm, executed locally byeach peer in the system, classifies each peer as eithera super-peer hosting a registry replica or an ordinarypeer that hosts no replicas. The algorithm has the

property that it elects super-peers with globally high-est utility, and it maintains this highest utility set asthe system evolves. Furthermore, the algorithm limitsthe frequency of switches between ordinary peers andsuper-peers in order to reduce the associated overhead.

The election algorithm is based on adaptive utilitythresholds. Peers periodically calculate a super-peerelection threshold, compare it with their own utility,and become super-peers if their utility is above thethreshold. Eventually, all peers with utility above thecurrent threshold become super-peers.

The top-K threshold is defined as a utility value,tK, such that the K highest utility peers in the systemhave their utility above or equal to tK, while all otherpeers have utilities below tK. Given the cumulative peerutility distribution in the system, D, where

D(u) =∣∣∣

{

p | U(p) ≥ u}∣∣∣ (4)

the top-K threshold is described by the equation

D(tK) = K. (5)

In large-scale dynamic P2P systems, the utility dis-tribution function is not known a priori by peers, as itis a dynamic system property, however, peers can usedecentralised aggregation techniques, described in thenext section, to continuously approximate the utilitydistribution by generating utility histograms. The cumu-lative utility histogram, H, consisting of B bins of widthλ can be represented as a B-dimensional vector suchthat

H(i) =∣∣∣

{

p | U(p) ≥ i · λ}∣∣∣ (6)

for i ∈ {1, ..., B}. The histogram is a discrete approxi-mation of the utility distribution function in B pointsin the sense that H(i) = D(i · λ) for i ∈ {1, ..., B}. Thetop-K threshold can be then estimated using a utilityhistogram with the following formula

tK = D−1(K) ≈ λ · arg max1≤i≤B

(

H(i) ≥ K)

(7)

where the accuracy of the threshold approximationincreases with the number of bins in the histogram.The approximation accuracy can be further improvedif bin widths in the histogram are non-uniform and areadjusted in such a way that bins closest to the thresholdare narrow while bins farther from the threshold aregradually wider.

A top-K threshold allows a precise restriction onthe number of super-peers in a dynamic system, wherepeers are continuously joining and leaving, since it hasthe property that exactly K peers in the system areabove this threshold. Similarly, a proportional threshold


is defined as a utility value, tQ, such that a fixed fractionQ of peers in the system have utility values greater thanor equal to tQ and all other peers have utility lower thantQ. In a system with N peers, this is described by thefollowing equation

D(tQ) = Q · N. (8)

The proportional threshold can be approximated usinga utility histogram, since

tQ = D−1(Q · N) ≈ λ · arg max1≤i≤B

(

H(i) ≥ Q · N)

. (9)

where the utility histogram, H, and the number ofpeers in the system, N, are again determined using theaggregation algorithm.

As the system grows or shrinks in size, the pro-portional threshold increases or decreases the numberof super-peers in the system and the ratio of super-peers to ordinary peers remains constant. As such itis more adaptive than the top-K threshold algorithm.However, setting an appropriate number K, or ratioQ, of super-peers in the system using the top-K thresh-old or proportional thresholds requires domain-specificor application-specific knowledge about system behav-iour. A self-managing approach is preferable where thesize of the super-peer set adapts to the current demandor load in the system.

It can be assumed that each peer p has some totalcapacity C(p), which determines the maximum num-ber of client requests that this peer can handle at atime if elected super-peer, and each peer has a cur-rent load, L(p), which represents the number of clientrequests currently being processed by peer p, whereL(p) < C(p). One approach is to define peer utilityas a function of the peer’s available capacity (i.e.,C(p) − L(p)) and to elect super-peers with maximumavailable capacity. However, this has the drawbackthat the utility of super-peers decreases as they receiverequests, and increases as they fall below the super-peer election threshold and stop serving requests, whichmay generate fluctuations of high utility peers in thecore. Depending on the application, frequent switchesbetween ordinary peers and super-peers may introducesignificant overhead, and may destabilise the overlay.

A better approach is to define the peer utility as afunction of the total peer capacity, C(p), and to adjustthe super-peer election threshold based on the loadin the system. This way, peer utility, and hence thesystem topology, remains stable, while the super-peerset grows and shrinks as the total system load increasesand decreases.

The utilisation of peer p is the ratio of peer’s currentload to the peer’s maximum capacity, L(p)

C(p). For a set

SP of super-peers in the system, the average super-peerutilisation is given by∑

p∈SP L(p)∑

p∈SP C(p). (10)

In order to maintain the average super-peer utilisationat a fixed level, W, where 0 ≤ W ≤ 1, and to adapt thenumber of super-peers to the current load, the adaptivethreshold tW is defined such that

∑

p L(p)∑

U(p)>tWC(p)

= W. (11)

Peers can estimate the adaptive threshold by approx-imating the average peer load in the system, L, thetotal number of peers in the system, N, and the capacityhistogram, Hc, defined as

Hc(i) =∑

U(p)≥i·λC(p) (12)

where i ∈ {1, ..., B}. The total system load is given thenby N · L, and the adaptive threshold can be estimatedusing the following formula

tW ≈ λ · arg max1≤i≤B

(

Hc(i) ≥ N · LW

)

. (13)

In a dynamic system, the super-peer election thresh-old constantly changes over time due to peer arrivalsand departures, utility changes of individual peers, sta-tistical error in the approximation of system proper-ties, and system load variability. Hence, peers needto periodically recompute the threshold and their ownutility in order to update the super-peer set. However,frequent switches between super-peers and ordinarypeers increase the system overhead, for example dueto data migration and synchronisation between super-peers. In order to avoid peers frequently switching rolesbetween super-peer and ordinary peer, the system usestwo separate thresholds for the super-peer election, anupper threshold, tu, and a lower threshold, tl, wheretu > tl (see Fig. 3). An ordinary peer becomes a super-peer when its utility rises above tu, while a peer stops tobe super-peer when its utility drops below tl. This way,the system exhibits the property of hysteresis, as peersbetween the higher and lower utility thresholds do notswitch their status, and the minimum utility changerequired for a peer to switch its status is � = tu − tl.Figure 4 shows the skeleton of the super-peer electionalgorithm.

2.5 Estimating system properties

The aggregation algorithm, described in this section,allows peers to estimate global system properties


UpperThreshold

UtilityGradient

UtilityGradient

OrdinaryPeers Super-Peers

Delta

LowerThreshold

Fig. 3 Super-peer election with two utility thresholds on thegradient topology

required for the calculation of the super-peer electionthresholds. The algorithm approximates the currentnumber of peers in the system, N, the maximum peerutility in the system, Max, the average peer load inthe system, L, a cumulative utility histogram, H, anda cumulative capacity histogram, Hc. Depending onthe super-peer election method, peers may only needa subset of these system properties.

The aggregation algorithm is based on periodic gos-sipping. Each peer p maintains its own estimates of N,Max, L, H, and Hc, denoted Np, Maxp, Lp, Hp, andHc

p, respectively, and stores a set, Tp, that contains thecurrently executing aggregation instances.

Each peer runs an active and a passive thread, wherethe active thread initiates one gossip exchange pertime step and the passive thread responds to all gossiprequests received from neighbours. On average, a peer

while true do1if super-peer then2

threshold calculateLowerT hreshold()3if U(p) < threshold then4

becomeOrdinaryPeer()5end6

end7else8

threshold calculateU pperT hreshold()9if U(p) > threshold then10

becomeSuperPeer()11end12

end13

end14

Fig. 4 Super-peer election algorithm at peer p

sends and receives two aggregation messages per timestep. When initiating a gossip exchange at each timestep, peer p selects a random neighbour, q, and sendsTp to q. Peer q responds immediately by sending Tq top. Upon receiving their sets, both peers merge them us-ing an update() operation described later. The generalstructure of the algorithm is based on Jelasity’s push-pull epidemic aggregation [15].

The aggregation algorithm can be intuitively ex-plained using the concept of aggregation instances. Anaggregation instance is a computation that generates anew approximation of N, Max, L, H, and Hc for allpeers in the system. Aggregation instances may overlapin time and each instance is associated with a uniqueidentifier id. Potentially any peer can start a new ag-gregation instance by generating a new id and creatinga new entry in Tp. As the new entry is propagatedthroughout the system, other peers join the instanceby creating corresponding entries with the same id.Thus, each entry stored by a peer corresponds to oneaggregation instance that this peer is participating in.Eventually, the instance is propagated to all peers inthe system. Every instance also has a finite time-to-live, and when an instance ends, all participating peersremove the corresponding entries and generate newapproximations of N, Max, L, H, and Hc.

Formally, each entry, Tp, in Tp of peer p is a tupleconsisting of eight values,

(id, ttl, w, m, l, λ, h, hc) (14)

where id is the unique aggregation instance identifier,ttl is the time-to-live for the instance, w is the weightof the tuple (used to estimate N), m is the currentestimation of Max, l is the current estimation of L, λ isthe histogram width used in this aggregation instance,while h and hc are two B-dimensional vectors used inthe estimation of H and Hc, respectively.

At each time step, each peer p starts a new aggre-gation instance with probability Ps by creating a localtuple(

id, TT L, 1, U(p), L(p), λ, Ip, Icp

)

(15)

where id is chosen randomly, TT L is a system constant,Ip is a utility histogram containing one peer p

Ip(i) ={

0 i f U(p) < i · λ

1 i f U(p) ≥ i · λ(16)

and Icp is a capacity histogram initialised by p

Icp(i) =

{

0 i f U(p) < i · λ

C(p) i f U(p) ≥ i · λ. (17)


The bin width λ is set to Maxp

B , where B is the numberof bins in the histograms H and Hc. Probability Ps iscalculated as 1

Np·F , where F is a system constant thatregulates the frequency of peers’ starting aggregationinstances. In a stable state, with a steady number ofpeers in the system, a new aggregation instance is cre-ated on average with frequency 1

F . Furthermore, sincean aggregation instance lasts TT L time steps, a peerparticipates on average in less than TT L

F aggregationinstances, and hence, stores less than TT L

F tuples.As the initial tuple is disseminated by gossipping,

peers join the new aggregation instance. It can beshown that in push-pull epidemic protocols, the dis-semination speed is super-exponential, and with a veryhigh probability, every peer in the system joins anaggregation instance within just several time steps [15].

The tuple merge procedure, update(Tp, Tq), consistsof the following steps. First, for each individual tupleTq = (id, ttlq, wq, mq, lq, λq, hq, hc

q) ∈ Tq received byp from q, if Tp does not contain a local tuple identifiedby id, and ttlq ≥ TT L

2 , peer p creates a local tuple(

id, ttlq, 0, U(p), L(p), λq, Ip, Icp

)

(18)

and adds it to Tp. This way, peer p joins a new aggrega-tion instance id and introduces its own values of U(p),C(p), and L(p) to the computation. However, if ttlq <TT L

2 , peer p should not join the aggregation, as thereis not enough time before the end of the aggregationinstance to disseminate the information about p andto calculate accurate aggregates. This usually happensif p has just joined the P2P overlay and receives anaggregation message that belongs to an already runningaggregation instance. In this case, the update operationis aborted by p.

In the next step, for each tuple Tq = (id, ttlq, wq, mq,

lq, λq, hq, hcq) ∈ Tq, peer p replaces its own tuple Tp =

(id, ttl p, wp, mp, l p, λp, hp, hcp) ∈ Tp with a new tuple

Tn = (id, ttln, wn, mn, ln, λn, hn, hcn) such that

ttln = ttl p + ttlq

2− 1, wn = wp + wq

2, ln = l p + lq

2

(19)

mn = max(mp, mq), λn = λp = λq, and hn and hcn are

new histograms such that

hn(i) = hp(i) + hq(i)2

, hcn(i) = hc

p(i) + hcq(i)

2(20)

for each i ∈ {1, ..., B}. Thus, peer p merges its localtuples with the tuples received from q, contributing tothe aggregate calculation.

Finally, for each tuple Tp = (id, ttl p, wp, mp, l p,

λp, hp, hcp) ∈ Tp, such that ttl p ≤ 0, peer p removes

Tp from Tp and updates the current estimates in thefollowing way: Np = 1

wp, Maxp = mp, Lp = l p, λ = λp,

and for each i ∈ {1, ..., B}

Hp(i) = hp(i)wp

, Hcp(i) = hc

p(i)

wp. (21)

The algorithm has the following invariant. For eachaggregation instance id, the weights of all tuples in thesystem associated with id sum up to 1, with 1

westimating

the number of peers participating in this aggregationinstance.

Peers joining the P2P overlay obtain the currentvalues of N, Max, L, H, and Hc from one of theirinitial neighbours. Peers leaving, if they do not crash,perform a leave procedure that reduces the aggregationerror caused by peer departures, where they send allcurrently stored tuples to a randomly chosen neigh-bour. The receiving neighbour adds the weights of thereceived tuples to its own tuples in order to preserve theweight invariant. Similarly as when joining an aggrega-tion instance, peers do not perform the leave procedurefor tuples with the time-to-live value below TT L

2 , asthere is not enough time left in the aggregation instanceto propagate the weight from these tuples betweenpeers and to obtain accurate aggregation results.

It can be shown, as in [15], that the values Np, Maxp,Lp, Hp, and Hc

p generated by the algorithm at the endof an aggregation instance at each peer p approximatethe true system properties N, Max, L, H, and Hc, withthe average error, or variance, decreasing exponentiallywith TT L.

In order to calculate the super-peer election thresh-olds, peers need to complete two aggregation instances,which requires 2 · TT L time steps. In the first instance,peers estimate the maximum peer utility (Max) anddetermine the histogram bin width (λ = Max

B ). In thefollowing instance, peers generate utility histograms(H or Hc), estimate the system size (N) and load(L), and calculate appropriate thresholds, as defined inSection 2.4.

2.6 Discovering high utility peers

The gradient structure of the topology allows an effi-cient search heuristic, called gradient search, that en-ables the discovery of high utility peers in the system.Gradient search is a multi-hop message passing algo-rithm, that routes messages from potentially any peerin the system to high utility peers in the core, i.e., peerswith utility above the super-peer election threshold.

In gradient search, a peer p greedily forwards eachmessage that it currently holds to its highest utility


neighbour, i.e., to a neighbour q whose utility is equalto

maxx∈Sp∪Rp

(

U p(x))

. (22)

Thus, messages are forwarded along the utility gradi-ent, as in hill climbing and similar techniques.

Local maxima should not occur in an idealised gra-dient topology, however, every P2P system is underconstant churn and the gradient topology may undergolocal perturbations from the idealised structure. In or-der to prevent message looping in the presence of suchlocal maxima, a list of visited peers is appended to eachsearch message, and a constraint is imposed that forbidsmessage forwarding to previously visited peers.

The algorithm exploits the information containedin the topology for routing messages and achieves asignificantly better performance than general-purposesearch techniques for unstructured P2P networks, suchas flooding or random walking, that require the com-munication with a large number of peers in the system[16]. Gradient search also reduces message loss rate bypreferentially forwarding messages to high utility, andhence more stable, peers.

However, greedy message routing to the highestutility neighbours has the drawback that messages arealways forwarded along the same paths, unless thetopology changes, which may lead to a significant im-balance between high utility peers in the core. This isespecially probable in the presence of “heavy hitters”,i.e., peers generating large amounts of traffic, as com-monly seen in P2P systems [4]. Load balancing can beimproved in the gradient topology by randomising therouting, for example, if a peer, p, selects the next-hopdestination, q, for a message with probability, Pp(q),given by the Boltzmann exploration formula [17]

Pp(q) = e(U p(q)/Temp)

∑

i∈Sp∪Rpe(U p(i)/Temp)

(23)

where Temp is a parameter of the algorithm calledthe temperature that determines the “greediness” ofthe algorithm. Setting Temp close to zero causes thealgorithm to be more greedy and deterministic, as ingradient search, while if Temp grows to infinity, allneighbours are selected with equal probability as in ran-dom walking. Thus, the temperature enables a trade-off between exploitative (and deterministic) routing ofmessages towards the core, and random explorationthat spreads the load more equally between peers.The impact of the temperature on the performance ofBoltzmann search has been studied in [16].

2.7 Supporting the decentralised registry service

The registry stores information about services availablein the system. For each registered service, it stores arecord that consists of the service address, text descrip-tion, interface, attributes, etc. The registry allows eachpeer to register a new service, update a service record,delete a record, and search for records that satisfycertain criteria. Each record can be updated or deletedonly by its owner, that is the peer that created it.

For fault-tolerance and performance reasons, theregistry service is replicated between a limited numberof high-utility super-peers. Each peer periodically runsthe aggregation algorithm, calculates the super-peerelection thresholds, and potentially become a super-peer if needed.

It is assumed that the average size of a service recordis relatively small (order of kilobytes), and hence,each super-peer has enough storage space to host afull registry replica, i.e., a copy of all service records.Due to this replication scheme, every super-peer canindependently handle any search query without com-municating with other super-peers. This is important,since complex search, for example based on attributes,keywords, or range queries, is known to be expensivein distributed systems [9, 18]. It is also assumed thatsearch operations are significantly more frequent thanupdate operations, and hence, the registry is optimisedfor handling search.

In order to perform a search on the registry, apeer generates a query and routes it using gradientsearch to the closest super-peer. If the super-peer isheavily-loaded, it may forward the query to anothersuper-peer which has enough capacity to handle it. Thesuper-peer processes the query and returns the searchresults directly to the originating peer. Optionally,clients may cache super-peer addresses and contactsuper-peers directly in order to reduce the routing over-head.

In order to create, delete, or update a record in theregistry, a peer generates an update request and routesit to the closest super-peer using gradient search. Theupdate is then gradually disseminated to all super-peersusing a probabilistic gossip protocol. Every record inthe registry is associated with a timestamp of the mostrecent update operation on this record. The timestampsare issued by the records’ owners. Super-peers peri-odically gossip with each other and synchronise theirregistry replicas, as in [19]. Each super-peer periodi-cally initiates a replica synchronisation with a randomlychosen super-peer neighbour, and exchanges with thisneighbour all updates that it has received since the lasttime the two super-peers gossipped with each other.


Conflicts between concurrent updates are resolvedbased on the update timestamps. Every record canbe updated only by its owner, and it is assumedthat the owner is responsible for assigning consistenttimestamps for its own update operations. Moreover,super-peers do not need to maintain a membership listof all replicas in the system. Due to the properties ofthe gradient topology, all super-peers are located withina connected component, and hence, every super-peereventually receives every update.

Super-peers are elected using a load-based utilitythreshold. Each peer defines its capacity as the maxi-mum number of queries it can handle at one time. Theload at a peer is defined as the number of queries thepeer is currently processing. The super-peer electionthreshold is calculated in such a way that the super-peers have sufficient capacity to handle all queries is-sued in the system. When the load in the system grows,new replicas are automatically created.

2.8 Supporting additional SOA facilities

Apart from the service registry, which needs to bepresent in a service-oriented architecture, many SOAsrely on other infrastructural facilities, such as businesstransaction services, or ranking systems, that are of-ten implemented in a centralised fashion. This sectionshows an approach to decentralise such facilities usingthe gradient topology.

Assuming two applications, A and B, where eachapplication has different peer utility requirements thatcan be encapsulated in two utility functions, U A andUB, respectively, each application defines its utilitythreshold, tA and tB, and the goal of the system isto elect and exploit super-peers p such that eitherU A(p) > tA or UB(p) > tB.

A naive approach is to generate two independentgradient overlays, using the two utility functions andthe algorithms described in the previous sections. How-ever, this would double the system overhead. A betterapproach is to combine the two utility functions intoone general utility function U and to generate onegradient overlay shared by both applications. A conve-nient way of defining such a common utility function is

U(p) = max(

U A(p), UB(p))

. (24)

This has the advantage that both, peers with high valueof U A and peers with high value of UB, have high utilityU , and hence are located in the core and can be discov-ered using gradient search. The only change requiredin the routing algorithm is that a search message, oncedelivered to a high utility peer p in the core, may haveto be forwarded to a different peer in the core, since

p either has a high value U A or UB. This last step,however, with a high probability can be achieved in onehop, since peers in the core are well-connected.

The super-peer election thresholds, tA and tB, areestimated using the same aggregation algorithm, wherethe histograms for both U A and UB are generatedthrough the same aggregation instance in order toreduce the algorithm overhead. However, a potentialproblem may appear if the two utility functions, U A andUB, have significantly different value ranges, since thecomposed utility U may be dominated by one of theutility functions. For example, if U A has values withinrange [0..1] and UB has values in range [1..100], then Uis essentially equal to UB, and searching for peers withhigh U A becomes inefficient.

One way to mitigate this problem is to define thetwo utility functions in such a way that both havethe same value ranges, e.g., [0..1]. However, thisrequires system-wide knowledge about peers. Simpletransformations or projections onto a fixed interval,for example using a sigmoid function, do not fix theproblem, since if one function has higher values thanthe other function, the same relation holds when thetransformation has been applied. A better approachis to scale one of the two utility functions using thecurrent values of the super-peer election thresholds,for example in the following way

U(p) = max

(

U A(p),tA

tBUB(p)

)

. (25)

This has the advantage that the core of the gradienttopology, determined by the threshold tA, containspeers with U A above tA and peers with UB above tB,since if U(p) > tA for a peer p then either U A(p) > tA

or UB(p) > tB.Similarly, in the general case, where a gradient topol-

ogy supports more than two applications, all utilityfunctions are scaled by their respective thresholds

U(p) = max

(

U A(p),tA

tBUB(p),

tA

tCUC(p), . . .

)

.

(26)

This way, all peers required by the higher-level ap-plications (i.e., each peer p such that U A(p) > tA orUB(p) > tB or UC(p) > tC and so on) have utility U(p)

above tA, and can be elected super-peers using thesingle utility threshold tA.

Figure 5 shows a sample gradient topology that sup-ports two different applications, A and B. Ordinary


GradientSearch

GradientSearch

Ordinary Peers

Replica Placement Threshold

Application A Super-Peers

Application B Super-Peers

X

Z

Y

Fig. 5 Super-peer election and discovery in a gradient topologysupporting two different applications A and B

peers perform gradient search to discover application Bsuper-peers. Peers X and Y locate an “A-type” super-peer in the core and their request is forwarded to a “B-type” super-peer. Peer Z discovers a “B-type” super-peer directly.

2.9 Peer bootstrap

Bootstrap is a process in which a peer obtains an ini-tial configuration in order to join the system. In P2Psystems, this primarily involves obtaining addresses ofinitial neighbours. Once a peer connects to at leastone neighbour, it can receive from this neighbour theaddresses of other peers in the system as well as otherinitialisation data, such as the current values of aggre-gates.

However, initial neighbour discovery is challengingin wide-area networks, such as the Internet, since abroadcast facility is not widely available. In particu-lar, the IP multicast protocol has not been commonlyadopted by Internet service providers due to design anddeployment difficulties [20]. Most existing P2P systemsrely on centralised bootstrap servers that maintain listsof peer addresses.

This section describes a bootstrap procedure thatconsists of two stages. In the first stage, a peer attemptsto obtain initial neighbour addresses from a local cachesaved during the previous session, for example on alocal disk. This can be very effective; Stutzbach et al.[7] analyse statistical properties of peer session timesin a number of deployed P2P systems and show that

if a peer caches the addresses of several high-uptimeneighbours, there is a high probability that some ofthese high-uptime neighbours will be on-line duringthe peer’s subsequent session. Furthermore, such abootstrap strategy is fully decentralised, as it does notrequire any fixed infrastructure, and it scales with thesystem size.

However, if all addresses in the cache are unavailableor the cache is empty, for example if the peer is joiningthe system for the first time, the peer needs to have analternative bootstrap mechanism. In the second stage,peers obtain initial neighbour addresses from a boot-strap node. The IP addresses of the bootstrap nodes areeither hard-coded in the application, or preferably, areobtained by resolving well known domain names. Thislatter approach allows greater flexibility, as bootstrapnodes can be added or removed over the course ofthe system’s lifetime. Moreover, the domain name mayresolve to a number of bootstrap node addresses, forexample selected using a round-robin strategy, in orderto balance the load between bootstrap nodes.

Each bootstrap node is independent and maintainsits own cache containing peer addresses. The cache sizeand the update strategy are critical in a P2P system,as the bootstrap process may have a strong impact onthe system topology, particularly in the case of highchurn rates. If the cache is too small, subsequentlyjoining peers have similar initial neighbours, and inconsequence, the system topology may become highlyclustered or even disconnected. On the other hand, alarge cache is more difficult to keep up to date and maycontain addresses of peers that have already left thesystem.

A simple cache update strategy is to add the ad-dresses of currently bootstrapped peers and to removeaddresses in a FIFO order. However, this strategy hasthe drawback that it generates a topology where join-ing peers are highly connected with each other, whichagain leads to a highly-clustered topology and sys-tem partitioning. A better approach is to continuously“crawl” the P2P network and “harvest” available peeraddresses. In this case, the bootstrap node periodicallyselects a random peer from the cache, obtains the peer’scurrent neighbours, adds their addresses to the cache,and removes the oldest entries in the cache. This hasthe advantage that the addresses stored in the cache areclose to a random sample from all peers in the system.

3 Evaluation

Evaluation is especially important when designing anovel P2P topology, such as the gradient topology,


since P2P systems usually exhibit complex, dynamicbehaviour that is difficult to predict a priori. Theoret-ical system analysis is difficult, and often infeasible inpractice, due to the system complexity. At the sametime, a full implementation and deployment of a P2Psystem on a realistic scale requires extremely largeamounts of resources, such as machines and users, thatare prohibitive in most circumstances. Consequently,the approach followed in this paper is simulation.

However, designing P2P simulations is also challeng-ing. The designer has to decide upon numerous systemassumptions and parameters, where the appropriatechoices or parameter values are non-trivial to deter-mine. Furthermore, dependencies between differentelements of a complex system are often non-linear, anda relatively small change of one parameter may resultin a dramatic change in the system behaviour.

Moreover, due to the large scale and complexity, P2Psystems are not amenable to visualisation techniques,as a display millions of peers, connections, and mes-sages is not human-readable. P2P simulations must con-tinuously collect and aggregate statistical informationabout the system in order to, detect topology partitions,identify bottlenecks, measure global system properties,etc. Such frequent and extensive measurements areoften computationally expensive, which adds furtherchallenges to analysing P2P systems.

3.1 Evaluation goals

In order to evaluate the gradient topology and its usagein the SOA, the behaviour of the three main algorithmsare studied: the neighbour selection algorithm, super-peer election (i.e., registry replica placement), and re-quest routing.

The neighbour selection algorithm is evaluatedthrough an analysis of the generated topology, wherethe analysed properties include the average peer de-gree (i.e., number of neighbours), clustering coefficient,average path length in the topology, and the averagepercentage of globally optimal neighbours in a peer’sneighbourhood set. The super-peer election algorithm,and indirectly the aggregation algorithm, are evaluatedin a simulation run by measuring the average differ-ence between the desired and the observed numbersof super-peers in the system, the average number ofswitches between super-peers and ordinary peers, andthe total capacity, utilisation and load of super-peers.Finally, the performance of the routing algorithms onthe gradient topology is studied by measuring the av-erage request hop count and average failure rate (i.e.,percentage of request messages that are lost) in a simu-lation run.

The algorithms are run in a number of differentexperiments that examine the impact of relevant systemparameters on the system performance, such as thenumber of peers, churn rate, average load, and super-peer thresholds. The evaluation shows that the gradi-ent topology scales to a large number of peers and isresilient to high peer churn rates.

For the interested reader, a further, more compre-hensive evaluation of the gradient topology can befound in [21]. In particular, [21] compares a numberof state-of-the-art super-peer election techniques, andshows that the aggregation-based election used in thispaper generates higher-quality super-peer sets, accord-ing to a number of different metrics, at a similar cost,compared to the other known super-peer election algo-rithms.

3.2 System model

The gradient topology has been evaluated in a discreteevent simulator. The system consists of a set of peers,connections between peers, and messages passed be-tween peers. It is assumed that all peers are mutuallyreachable and any pair of peers can potentially establisha connection. The neighbourhood model is symmetric,as discussed earlier in Section 2.3. The maximum num-ber of neighbours for a peer at any given time is limitedto 26, however, as shown later, peers rarely approachthis limit, as the desired number of peer neighbours isset to 13 (7 for the random set and 6 for the similar set).

The P2P network is under constant churn, with peersession times determined probabilistically, following aPareto distribution. While the paper describes a peerleave procedure, it is hard to estimate how many peersin a real-world system would perform the procedurewhen leaving. For that reason, the worst case scenariois assumed in the experiments, where no peers performthe leave procedure. Joining peers are bootstrappedby a centralised server, which provides addresses ofinitial neighbours. The server obtains these addressesby “crawling” the P2P network and maintaining a FIFObuffer with 1,000 entries. The bootstrap server is alsoused for initiating aggregation instances.

The peer churn rate in the experiments is carefullycalculated. In a number of independent measurements[4, 5], median peer session time has been reported asbeing between 1 minute and 1 hour. A good summaryof median session durations in P2P system is given in[8]. In a more recent report [7], mean session timesrange from about 30 min in Gnutella, through ap-proximately 20 min in Kademlia, to about 2–5 min inBitTorrent. In order to be consistent with these real-world measurements, the mean peer session time in the


experiments in this paper is set to 10 min. Assuminga time step of 6 seconds, this corresponds to a meansession time of 100 time steps and a churn rate of 0.7%peers per time step (0.11% peers per second).

While session time distributions are highly-skewedin existing P2P systems, there is no general consensuswhether these distributions are heavy-tailed and whichmathematical model best fits the empirically observedpeer session times. Sen and Wong [4] observe a heavy-tailed distribution of the peer session time, however,Chu et al. [22] suggest a log-quadratic peer session timedistribution, while Stutzbach and Rejaie [7] suggest theWeibull distribution. Moreover, Stutzbach and Rejaiediscovered that the best power-law fit for the peersession times in a number of BitTorrent overlays hasan exponent whose value is between 2.1 and 2.7, andtherefore the distributions are not heavy-tailed. In theexperiments in this paper, the peer session times are setaccording to the Pareto distribution with a median of10 min and exponent 2.0 (which is border case betweenheavy-tailed and non-heavy-tailed distributions).

3.3 Service registry simulation

The service registry is maintained by super-peerselected using the adaptive thresholds. The capacityvalue C(p) determines the maximum number of re-quests a peer p can simultaneously handle if electedsuper-peer and hosting a registry replica. The loadat peer p, denoted L(p), is defined as the numberof requests currently being processed at peer p. Thecapacity values are assigned to peers according to thePareto distribution with exponent of 2 and averagevalue of 1, which models peer resource heterogeneityin the system. Moreover, peer utility is defined as

U(p) = C(p) · log(

Up(p))

(27)

where the capacity is weighted by the peer’s currentuptime in order to promote stable peers. As discussedin Section 2.2, this utility metric is fully predictable.

At every step, each peer p in the system emitsa search request with probability Preq(p). ProbabilityPreq(p) follows the Pareto distribution between peerswith exponent 2 and average value Preq = 0.01. Peersthat generate more traffic correspond to the so called“heavy hitters” in the P2P system.

Request routing is performed in two stages. First,a newly generated request is routed using Boltzmannsearch with low temperature T = 0.5 steeply to the coreuntil it is delivered to a super-peer. In the second stage,the request is forwarded between super-peers in thecore until it is delivered to super-peer s that has enoughfree capacity to handle the request (i.e., C(s) − L(s) ≥

1). Once super-peer s accepts the requests and startshandling it, its load is increased by one. When therequest processing finishes, the load at the super-peeris reduced by one.

Forwarding between super-peers is probabilistic. Asuper-peer p forwards the request to one of its neigh-bours, q, such that U p(q) > t, where t is the currentsuper-peer election threshold, with probability P

′p(q)

proportional to q’s capacity

P′p(q) = C(q)

∑

U p(x)>t C(x). (28)

The bias towards high capacity neighbours improvesthe load balancing property of the routing algorithm.If no neighbour q exists such that U p(q) > t, the re-quest is routed randomly. Every request has a time-to-live value, initialised to TT Lreq = 30, and decrementedeach time a request is forwarded between peers. Thus,a request message can be lost when its time-to-livevalue drops to zero or when the peer that is currentlytransmitting it leaves the system.

3.4 Maintenance cost

At every time step, each peer executes the neighbourselection, aggregation, super-peer election, and mes-sage routing algorithms. A peer sends on average 4neighbour selection messages per time step (a requestand response for Sp and similarly a request and re-sponse for Rp) and less than 4 aggregation messages pertime step (2 request messages and 2 response messages,since for F = 25 and TT L = 50 a peer participateson average in less than 2 aggregation instances, asexplained in Section 2.5). The election algorithm doesnot generate any messages. It can be shown that thesize of both the neighbour selection and aggregationmessages is below 1KB, and therefore, for the basictopology maintenance, a peer sends less than 8KB ofdata per time step. Given a time step of 6 seconds,this corresponds to an average traffic rate of 1.25KB/s.Moreover, this cost is independent of the system sizeand the churn rate, since the aggregation and neighbourselection algorithms are executed at a fixed periodicityand always generate the same number of messages pertime step. However, the cost associated with gradientsearch depends on the rate of requests and the size ofrequest messages, and hence is application-specific.

3.5 Topology structure

This section evaluates the neighbour selection algo-rithm by analysing the generated overlay topology.The evaluation is based on a set of experiments. Each


experiment begins with a network consisting of a singlepeer, and the network size is increased exponentiallyby adding a fixed percentage of peers at each time stepuntil the size grows to 100, 000 peers. At the followingtime steps, the system is under continuous peer churn,however, the rate of arrivals is equal to the rate ofdepartures and the system size remains constant.

The following notation and metrics are used. Thesystem topology T is a graph (V, E), where V is the setof peers in the system, and E is the set of edges betweenpeers determined by the neighbourhood sets: (p, q) ∈E if q ∈ Sp ∪ Rp. The graph is undirected, since theneighbourhood relation is symmetric and if q ∈ Sp ∪Rp then p ∈ Sq ∪ Rq. Similarly, sub-topologies TS =(V, ES) and TR = (V, ER) are defined based on thesimilarity and random neighbourhood sets, Sp and Rp,accordingly, where (p, q) ∈ ES if q ∈ Sp, and (p, q) ∈ER if q ∈ Rp.

Figure 6 shows the average peer degree distributionin four systems with 100, 000 peers and different churnrates, where each plotted point represents the totalnumber of peers in the system with a given neighbour-hood size. The graph has been obtained by runningfour experiments with different churn rates, each for2, 000 time steps, generating peer degree distributionsevery 40 time steps, and averaging the sample sets atthe end of each experiment in order to reduce thestatistical noise. The same procedure has been appliedto generate all the remaining graphs in this subsection.

The obtained degree distributions resemble a normaldistribution, where majority of peers have approxi-mately 13 neighbours, as desired. Moreover, the distrib-utions are nearly identical for all churn rates, suggestinggood resilience of the neighbour selection algorithm topeer churn.

0

5000

10000

15000

20000

0 5 10 15 20 25

Num

ber

of p

eers

Degree

No churnMedian session 20minMedian session 10minMedian session 5min

Fig. 6 Peer degree distribution in four systems with differentchurn rates

Vr is defined as a subset of peers in the system, Vr ⊂V, that contains r highest utility peers. Formally,

Vr = {

p ∈ V | U(p) ≥ U(pr)}

(29)

where pr is the rth highest utility peer in the system.In order to investigate the correlation between peerdegree and peer utility, the average peer degree iscalculated for a number of Vr sets in T, TS and TR.Figure 7 shows the results of this experiment. The plotsare nearly flat, indicating that the average number ofneighbours is independent from the peer utility, and inparticular, the highest utility peers are not overloadedby excessive connections from lower utility peers. Theslight increase in the degree of the 12 highest utilitypeers is caused by the fact that these peers cannot findany higher utility neighbours, and hence, connect tolower utility peers, generating a locally higher averagedegree.

Similarly, Fig. 8 shows the clustering coefficient intopologies T, TS and TR for a number of Vr set withincreasing utility rank r. In the TS topology, the coeffi-cient gradually grows as peer utility increases, almostreaching the value of 0.8 for r = 1, which indicatesthat the highest utility peers in the system are highlyconnected with each other and constitute a “core” inthe network. At the same time, the coefficient is nearlyconstant in TR, since the preference function for therandom sets is independent of peer utility.

Given global knowledge about the system in a P2Psimulator, the optimal neighbourhood set S∗

p for eachpeer p can be determined using the neighbour pref-erence function defined by formulas (2) and (3) inSection 2.3. Formally, S∗

p is a subset of all peers inthe system, S∗

p ⊂ V, such that min(S∗p) ≥ max(V\S∗

p),where the ≥ relation is defined in formulas (2) and (3).

0

5

10

15

20

1 10 100 1000 10000 100000

Ave

rage

deg

ree

Utility rank

TotalSimilar

Random

Fig. 7 Average sizes of neighbourhood sets for highest utilitypeers


0

0.2

0.4

0.6

0.8

1

1 10 100 1000 10000 100000

Clu

ster

ing

coef

ficie

nt

Utility rank

TotalSimilar

Random

Fig. 8 Clustering coefficients of highest utility peers in sub-topologies determined by similarity and random neighbourhoodsets

The quality of a peer neighbourhood set Sp can be thenestimated using Opt(p) metric defined as the portion ofoptimal entries in Sp

Opt(p) = |Sp ∩ S∗p|

|S∗p|

. (30)

Consequently, Optavg(Vr) is the average value ofOpt(p) for the r highest utility peers in the system.

Figure 9 shows the value of Optavg(Vr) as a functionof the utility rank r in four experiments with differentchurn rates. The graph shows that while the averagevalue of Opt() in the entire system is very low, as it isrelatively unlikely for peers to discover their globallyoptimal neighbours in a large-scale dynamic system,Opt(p) grows with peer utility and reaches its max-imum value of 1 for the highest utility peers. Thisconfirms that the highest utility peers are stable enough

0

0.2

0.4

0.6

0.8

1

1 10 100 1000 10000 100000

Opt

imal

nei

ghbo

urs

Utility rank

No churnMedian time 20minMedian time 10minMedian time 5min

Fig. 9 Average fraction of globally optimal neighbours for peersof different utility ranking

and have long enough session times to fully optimisetheir neighbourhood sets. Thus, the topology consistsof a stable “core” of the highest utility peers, whichmaintain the topology structure, and lower utility peersthat are subject to heavy churn and have a reducedability to optimise their neighbourhood sets.

Furthermore, the highest utility peers manage tomaintain close-to-optimal neighbourhood sets in all ex-periments with median peer session times ranging frominfinity (no churn) to 5min.

In order to get more insight into the structure of thegradient topology, the subsequent experiments mea-sure the average path length between highest utilitypeers in the system. D(p, q) is defined as the short-est path length between peers p and q in the systemtopology T, and analogously, DS(p, q) and DR(p, q)

are defined for TS and TR. The average path lengthin T, denoted Apl(V), is the average value of D(p, q)

over all possible pairs of peers (p, q) in the system

Apl(V) =∑

p,q∈V D(p, q)

|V|2 . (31)

Given a utility rank r, the average path length betweenthe r highest utility peers is Apl(Vr). Furthermore,AplS and AplR are again defined as the average pathlengths in TS and TR, respectively.

The average path length Apl(V) can be calculatedusing the Dijkstra shortest path algorithm at O(|V|2d)

cost, where d is the average peer degree in V. How-ever, in the system described in this paper, with |V| =100, 000 and d = 13, this would require performing over100, 000, 000, 000 basic operations. This cost can bereduced by selecting a random subset V ′ from V andapproximating Apl(V) with

Apl′(V) =

∑

p∈V ′∑

q∈V D(p, q)

|V ′| · |V| . (32)

Such approximation requires running the Dijkstra al-gorithm for |V ′| peers, and hence, incurs the compu-tational cost of O(|V ′||V|d) operations. In practice,|V ′| = 100 generates accurate results.

In the unlikely case where two peers p and q are notconnected in the system topology, the distance D(p, q)

is not defined and the (p, q) pair is omitted in the calcu-lation of Apl′. The number of such pairs is extremelylow in the reported experiments and such pairs onlyoccur when a peer becomes isolated and needs to bere-bootstrapped. With the exception of isolated peers,topology partitions were never observed in any of theexperiments described in this paper.

Figure 10 shows the average path length between thehighest utility peers in the system. Each point plottedin the graph represents the value of Apl

′(Vr) for a


0

1

2

3

4

5

6

7

8

1 10 100 1000 10000 100000

Ave

rage

pat

h le

ngth

Utility Rank

No churnMedian time 20minMedian time 10min

Median time 5min

Fig. 10 Average path length between peers of different utilityranking

given utility rank r. For all churn rates, the average pathlength gradually converges to zero when decreasing r.This confirms the emergence of a gradient structure inthe system topology, where high utility peers, deter-mined by a utility threshold, are closely connected.

3.6 Aggregation

This section evaluates the accuracy of the aggregationalgorithm. The following notation and metrics are used.Variables Np,t, Hp,t and Hc

p,t denote the current esti-mations at peer p of the current system size, N, utilityhistogram, Ht, and capacity histogram Hc

t , respectively,at time step t. The average relative error in the systemsize approximation, calculated over all time steps andpeers in the system, is defined as

ErrN = 1

Time

Time∑

t=1

1

N

∑

p

|Np,t − N|N

. (33)

where Time is the experiment duration. Similarly, theaverage error in utility histogram estimation, ErrH , isdefined as

ErrH = 1

Time

Time∑

t=1

1

N

∑

p

d(Ht, Hp,t) (34)

where d is a histogram distance function defined as

d(Ht, Hp,t) = 1

B

B−1∑

i=0

|Ht(i) − Hp,t(i)|Ht(i)

. (35)

Analogously, ErrHc is defined as the average error inthe capacity histogram estimation.

Figure 11 shows the values for ErrN , ErrH and ErrHc

in two sets of experiments. In the first set, labelled

1e-06

1e-05

0.0001

0.001

0.01

0.1

1

N H H^c

App

roxi

mat

ion

erro

r

ChurnNo-Churn

Fig. 11 Influence of churn on aggregation error

“Churn”, nodes are allowed to join and leave the over-lay, as explained in Section 3.2. In the second set ofexperiments, labelled “No-Churn”, the population ofnodes is static.

In the absence of churn, the aggregation algorithmproduces almost perfectly accurate system propertyapproximations, with the average error below 0.001%.This behaviour is consistent with the theoretical andexperimental analysis described in [15]. In the pres-ence churn, the observed error is non-negligible, andis approximately equal to 3% for N and 10% for thehistograms. The next section evaluates the influence ofchurn and aggregation error on the super-peer election.

There are two parameters that control the cost andaccuracy of aggregation, which are the frequency of in-stance initiation, F, and an instance time-to-live, TT L.Additionally, the histogram resolution, B, impacts onthe accuracy of utility distribution approximation.

When F is decreased, peers perform aggregationmore frequently, and have more up-to-date estimationsof the system properties. However, in the describedexperiments, the system size and the probability distri-butions of peer utility and capacity are constant, andhence, running aggregation more often does not affectthe results. At the same time, when F is decreased, theaverage message size increases, since peers participatein a higher number of aggregation instances.

Similarly, when the TT L parameter is increased,aggregation instances last longer, peers store more lo-cal tuples, and aggregation messages become larger.However, as shown in Fig. 12, if the TT L parameteris too low (e.g., equal to 30), the aggregation instancesare too short to average out the tuples stored by peersand the results have a high error. Conversely, whenaggregation instances run longer, they suffer more from


0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

N H H^c

App

roxi

mat

ion

erro

r

TTL=30TTL=60

TTL=120TTL=240

Fig. 12 Aggregation error versus instance TTL

churn (more tuples are lost during an instance) and thequality of results gradually deteriorates. It appears thatoptimum performance is achieved for TT L ≈ 50, andthis value for TT L is used in the experiments describedin this article.

Finally, the accuracy of utility distribution approx-imation can be improved by increasing the histogramresolution, B. Clearly, the message size grows linearlywith the number of histogram bins. The actual accuracyimprovement depends on the shape of the distributionfunction and the histogram interpolation method. Inthis article, linear interpolation is used and histogramshave 100 bins. This way, aggregation messages have be-low 1kB, and would fit well into UDP packets, assumingthis protocol was used in the implementation.

3.7 Super-peer election

The following section evaluates the super-peer electionalgorithm for the registry replica placement. A num-ber of experiments is performed. In each experiment,the P2P network initially consists of one peer and isgradually expanded until it grows to N peers, as inthe previous section. Super-peers are elected using twoproportional thresholds, an upper threshold tu and alower threshold tl, such that tu = tQ, where Q is thedesired super-peer ratio in the system, and tl = tQ+�,where � determines the distance between upper andlower thresholds. At any time t, Mt denotes the cur-rent number of super-peers in the system, Mt

N is thecurrent super-peer ratio, and Errt = |Mt − QN|, calledalgorithm error, is the difference between the electedand the desired numbers of super-peers in the system,which reflects the election algorithm accuracy. Simi-larly, RErrt = Errr

QN is the relative election error.

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

5 10 15 20 25 30

Sup

er-p

eer

ratio

Median session time

Q=0.01Q=0.03

Q=0.1

Fig. 13 Average super-peer ratio observed in the system

As the network grows to N peers, the system isrun for 2, 000 time steps and results are aggregated.M denotes the average number of super-peers in thesystem over all time steps, Err is the average error, andRErr is the relative algorithm error.

The first set of experiments investigate the impactof churn on the super-peer election. Figure 13 showsthe average super-peer ratio M

N in systems with 50, 000peers, where Q is set to 0.01, 0.03 and 0.1, and themedian peer session duration is ranging between 5minand 30min. Figure 14 shows the average error RErrin the same experiment. As expected, the accuracy ofthe election algorithm degrades when the churn rateincreases (i.e., for shorter peer sessions), since churnaffects the aggregation algorithm, causing larger errorin the generated aggregates and in the threshold calcu-lation. However, in all cases, the observed super-peer

0

0.02

0.04

0.06

0.08

0.1

5 10 15 20 25 30

Rel

ativ

e er

ror

Median session time

Q=0.01Q=0.03

Q=0.1

Fig. 14 Average error in the number of super-peers elected as afunction of churn rate


0

5000

10000

15000

20000

100,00080,00060,00040,00020,000

Req

uest

s

Number of peers

LoadCapacity for W=0.9

Capacity for W=0.75Capacity for W=0.5

Fig. 15 Total load and super-peer capacity in systems with variedsizes and adaptive thresholds

ratio is close to Q, and the average error is boundedduring simulation at 5%, which shows that peers’ localestimations of the super-peer election thresholds areclose to the desired values.

Moreover, it should be noted that the calculatedpeer session durations and the churn rates are basedon the assumption that the time step is 6 seconds long.The system can achieve better churn tolerance eitherwhen peers run periodic algorithms more frequentlyand exchange more messages (i.e., the time step isshortened).

In the next experiments, super-peers are electedusing adaptive thresholds tW , where W is the desiredsuper-peer utilisation, with the upper threshold tu = tW

and lower threshold tl = tW−�. Figures 15 and 16 showthe total system load, super-peer capacity, and the av-erage super-peer utilisation in a number of experimentswith the system size N ranging from 10,000 to 100,000

0

0.2

0.4

0.6

0.8

1

100,00080,00060,00040,00020,000

Sup

er-p

eer

utili

satio

n

Number of peers

W=0.9W=0.75

W=0.5

Fig. 16 Average super-peer utilisation as a function of systemsize

and W set to 0.9, 0.75, and 0.5. The median peer sessionlength is fixed at 10min and � = 0.01. It can be seenthat the systems exhibit stable behaviour, with the totalsuper-peer capacity growing linearly with the systemsize and proportionally to the system load, while theaverage super-peer utilisation remains at a constantlevel, relatively close to W.

3.8 Dynamic peer utility

In the previous sections, it has been assumed that peercapacity is constant. However, this assumption may notalways be realistic, as resources such as storage capac-ity, network bandwidth, processor cycles and memory,can be consumed by external applications, reducing thecapacity perceived by the peer. Furthermore, a peermay be unable to determine its capacity precisely, andmay have to rely on local measurements or heuristicsthat incur an estimation error.

In the following experiments, each peer p has aconstant maximum capacity value, C∗(p), and a currentcapacity value, C(p), determined at each time step byformula C(p) = C∗(p) · (1 − ε), where ε is randomlychosen between 0 and εmax. Thus, the ε parameter canbe seen either as the peer capacity estimation error orthe interference of external applications.

Each experiment is set up with three parameters: thecapacity change amplitude, εmax, labelled “Epsilon” onthe graphs, the desired super-peer utilisation, W, andthe difference between the upper and lower thresholds,�. In order to prevent super-peers close to the elec-tion threshold from frequently switching their status toordinary peers and conversely, super-peers are electedusing two utility thresholds, where again tu = tW andtl = tW−�.

0

0.02

0.04

0.06

0.08

0.1

0 0.05 0.1 0.15 0.2

Sup

er-p

eer

chan

ges

Delta

Epsilon=0.0Epsilon=0.1Epsilon=0.2

Fig. 17 Relative number of super-peer changes per time step asa function of �


Figure 17 shows the average number of super-peerchanges as a function of �. The experiment demon-strates that the number of changes sharply decreasesas � is increased. However, it does not converge tozero, but rather to a constant positive value. This iscaused by the fact that some super-peers always leavethe system, due to churn, and ordinary peers mustcontinuously switch to super-peers in order to maintainenough capacity in the core.

Hence, super-peer changes are due to two reasons.First, as super-peers leave the system, ordinary peersneed to replace them and switch their status to super-peers. The number of such switches can be simplydetermined by counting super-peers leaving the system,and is labelled “Churn” in the graphs. Secondly, boththe utility of individual peers and the utility thresholdconstantly fluctuate, due to changes in the system load,peer departures and arrivals, errors in the aggregationalgorithm, etc., which causes peers with utility close tothe election threshold to occasionally change their sta-tus. The latter category of changes is labelled “Thresh-old” in the graphs.

Figure 18 shows the number of super-peer changesdivided between the two categories in a system withεmax = 0.1. The experiment demonstrates that the num-ber of super-peer changes caused by utility and thresh-old fluctuations can be reduced to a negligible level byusing an appropriate �.

Figure 19 shows the impact of � on the super-peerelection error. As expected, the error grows togetherwith �, since a larger gap between tu and tl relaxes theconstraints on the number of super-peers in the system.The less precise restriction of the number of super-peers in the system is the price for the reduction of thesuper-peer switches.

0

0.01

0.02

0.03

0.04

0.05

0 0.05 0.1 0.15 0.2

Sup

er-p

eer

chan

ges

Delta

TotalChurn

Threshold

Fig. 18 Relative number of super-peer changes due to super-peer departures and threshold fluctuations

0

0.05

0.1

0.15

0.2

0.25

0.3

0 0.005 0.01 0.015 0.02

Rel

ativ

e er

ror

Delta

Epsilon=0.0Epsilon=0.1Epsilon=0.2

Fig. 19 Relative error in the number of super-peers elected inthe system as a function of �

3.9 Routing performance

The following section evaluates the routing algorithmused by peers to access SOA registry replicas. As de-scribed previously, requests are generated by peers withaverage probability Preq and are routed to availableregistry replicas hosted by super-peers. In a number ofexperiments, two properties are measured: the averagerequest hop count and the average request failure rate.Furthermore, these two parameters, hop counts andfailure rates, are calculated for requests routed betweenordinary peers, before delivered to a super-peer in thecore (labelled “Outside core” on the graphs), and forrequests routed in the core, when searching for a super-peer with available capacity (labelled “Inside core” onthe graphs).

Figure 20 shows the average request hop count as afunction of median peer session time. The hop count

0

2

4

6

8

10

12

14

5 10 15 20 25 30

Hop

s

Median session time

W=0.9W=0.75W=0.5

Fig. 20 Average request hop count as a function of churn rate


increases when peer sessions are shorter, which indi-cates that the topology structure degrades when churnrate increases, reducing the routing performance. Fur-thermore, the request hop count is significantly higherfor W = 0.9 than for W = 0.5 and W = 0.75, which isdue to two reasons. First, in systems with higher super-peer utilisation W, fewer super-peers are elected andhence it is harder for ordinary peers to discover thesuper-peers. Secondly, in systems with higher super-peer utilisation, requests are forwarded more timesbetween super-peers in the core, as it is less likely todiscover a super-peer with spare capacity.

This observation suggests that routing is generallymore efficient in systems with lower super-peer util-isation. However, with lower W, a larger number ofsuper-peers are elected, which increases the replicamaintenance cost, since more data needs to be migratedover the network in order to create and synchronise thereplicas. Consequently, the adaptive threshold enablesa trade-off between the replica discovery cost and thereplica maintenance cost.

Figure 21 shows the impact of churn on the requestfailure rate. As expected, the number of failures growswhen the churn rate is increased, and similarly as thehop count, the failure rate is significantly higher forW = 0.9 than for W = 0.75 and W = 0.5.

Figure 22 shows the average request hop count asa function of the system size. The results show thatthe hop count does not grow significantly within theinvestigated range of 10, 000–100, 000 peers, which canbe explained by the fact that the hop count dependsmainly on the super-peer ratio in the system, whichis constant with the system size, and is determined byW. Figure 23 shows the average request failure rate inthe same experiment. The results indicate good system

0

0.05

0.1

0.15

0.2

0.25

5 10 15 20 25 30

Fai

lure

rat

e

Median session time

W=0.9W=0.75

W=0.5

Fig. 21 Request failure rate as a function of peer churn rate

0

2

4

6

8

10

12

14

100,00080,00060,00040,00020,000

Hop

s

Number of peers

W=0.9W=0.75W=0.5

Fig. 22 Average request hop count as a function of system size

scalability, as both the hop count and failure rate areconstant with the number of peers in the system.

3.10 Impact of Boltzmann temperature

The following set of experiments investigates the im-pact of the Boltzmann temperature Temp on the per-formance of request routing and the distribution ofrequests between super-peers. Figure 24 shows the av-erage request hop count outside the core (before arequest is delivered to a super-peer) as a function ofthe temperature Temp. The Temp = 0 case representsgradient search. The hop count grows steadily withthe temperature, and the best routing performance isachieved with the lowest temperature. This justifies theusage of greedy routing (i.e., gradient routing) outsidethe core.

0

0.05

0.1

0.15

0.2

0.25

100,00080,00060,00040,00020,000

Fai

lure

rat

e

Number of peers

W=0.9W=0.75

W=0.5

Fig. 23 Request failure rate as a function of system size


0

2

4

6

8

10

12

0 0.5 1 1.5 2

Hop

s

Boltzmann temperature

Threshold=0.9Threshold=0.75Threshold=0.5

Fig. 24 Average number of request hops outside core as afunction of Boltzmann temperature for three different super-peerelection thresholds

Figure 25 shows the average number of request hopsinside the core as a function of Temp. Unlike in theprevious experiment, better performance is achievedfor higher temperatures. It should be noted that whilerequests are forwarded between super-peers using for-mula (28), which is independent of Temp, Boltzmanntemperature impacts on the delivery of requests tosuper-peers, and hence may affect routing in the core.This is confirmed by the experimental results; the aver-age request hop count in the core decreases when Tempgrows, which indicates that the load is distributed moreequally between super-peers for higher Temp.

Thus, the temperature parameter enables a trade-offbetween greedy routing, which delivers request quicklyto the core, and randomised routing that improves load

0

2

4

6

8

10

12

0 0.5 1 1.5 2

Hop

s


Threshold=0.9Threshold=0.75Threshold=0.5

Fig. 25 Average number of request hops inside core as a functionof Boltzmann temperature or three different super-peer electionthresholds

0

2

4

6

8

10

12

14

0 0.5 1 1.5 2

Hop

s


TotalGradient

Core

Fig. 26 Average number of request hops inside and outside coreas functions of Boltzmann temperature

balancing. This is further illustrated in Fig. 26, whichcompares the request hop count inside the core andoutside the core for W = 0.75 and 0 ≤ Temp ≤ 2 , andin Fig. 27, which shows the average request failure ratein the same experiment. Both figures suggest that theoptimal temperature for routing is close to 0.5.

3.11 Impact of system load

The final set of experiments investigates the perfor-mance of the system under variable load conditions.Figure 28 shows the relationship between the requestprobability Preq and the total number of super-peers inthe system. It can be seen that the super-peer set adaptsto the increasing load in the system. The super-peerratio initially grows slowly, as high capacity super-peers

0

0.05

0.1

0.15

0.2

0 0.5 1 1.5 2

Fai

lure

rat

e


TotalGradient

Core

Fig. 27 Average request failure rate inside and outside core asfunctions of Boltzmann temperature


0

0.2

0.4

0.6

0.8

1

0 0.01 0.02 0.03 0.04

Sup

er-p

eer

ratio

Request probability

W=0.75W=0.5

Fig. 28 Super-peers ratio as a function of request probability

are available, but the growth rate quickly increasesfor higher Preq, and eventually all peers in the systembecome super-peers.

Figure 29 shows the average super-peer capacity andthe total system load in the same experiment. The figuredemonstrates that the super-peer capacity scales lin-early with the load in the system, which is proportionalto Preq, until all peers in the system are fully utilised.

Figure 30 shows the average request hop count asa function of the request probability Preq. Remark-ably, the hop count is high for both low Preq andhigh Preq, while achieving its minimum for Preq ≈ 0.02.For low Preq, the high number of hops is caused bythe fact that very few (potentially zero) super-peersare elected when the system load is very low, andhence, it is hard for peers to discover the super-peers.On the contrary, when the load is very high, a large

0

5000

10000

15000

20000

25000

0 0.01 0.02 0.03 0.04

Req

uest

s

Request probability

LoadCapacity for Q=0.75

Capacity for Q=0.5

Fig. 29 System load and super-peer capacity as functions ofrequest probability

0

2

4

6

8

10

12

14

16

0 0.01 0.02 0.03 0.04

Hop

s

Request probability

TotalOutside core

Inside core

Fig. 30 Average request hop count as a function of system load

number of peers (potentially all peers in the system)are elected super-peers, and the task of load balancingbetween super-peers becomes very hard. In the lattercase, the performance of routing is mainly determinedby the load balancing algorithm. A similar effect canbe observed when measuring the request failure rate,as shown in Fig. 31. A high percentage of requestsare lost when Preq is very low or very high, whilethe lowest failure rate is reached for Preq close to0.015.

An important conclusion from these experiments isthat the system should always maintain a minimumnumber of super-peers, even in the presence of noload, in order to reduce the request hop count andfailure rate. This can be accomplished by combiningthe adaptive threshold with the top-K or proportionalthreshold.

0

0.1

0.2

0.3

0.4

0.5

0.6

0 0.01 0.02 0.03 0.04

Fai

lure

rat

e

Request probability

TotalOutside core

Inside core

Fig. 31 Request failure rate as a function of system load


4 Related work

An approach to web service discovery that uses a de-centralised search engine, based on a P2P network,is described in [23]. In this approach, services arecharacterised by keywords and positioned in a multi-dimensional space that is mapped onto a DHT and par-titioned between peers. A similar approach, describedin [24] and [25], partitions the P2P system into a setof ontological clusters, using a P2P topology basedon hypercubes, in order to efficiently support complexRDF-based search queries. However, both these ap-proaches are based on P2P networks that do not reflectpeer heterogeneity in the system, unlike the gradienttopology, and do not address the problem of high utilitypeer discovery in a decentralised P2P environment.

A number of general search techniques have beendeveloped for unstructured P2P systems (e.g., [26] and[27]), however, these techniques do not exploit anyinformation contained in the underlying P2P topology,and hence achieve lower search performance than thegradient heuristic that takes advantage of the gradienttopology structure [16]. Morselli et al. [28] proposed arouting algorithm for unstructured P2P networks thatis similar to gradient searching, however, they addressthe problem of routing between any pair of peers ratherthan searching for reliable peers or services.

In traditional super-peer topologies, the super-peersform their own overlay within the existing P2P system,while ordinary peers are connected to one or moresuper-peers. Kazaa [29], Gnutella [30], and Skype [31]are examples of such systems deployed over the Inter-net. Yang and Garcia-Molina [32] give general princi-ples for designing such super-peer networks. However,nearly all known P2P systems lack an efficient, decen-tralised super-peer election algorithm. Traditional elec-tion algorithms, such as the Bully algorithm [33], andother classical approaches based on group communica-tion [34], cannot be applied to large-scale P2P systems,as they usually require agreement and message passingbetween all peers in the system. In many P2P sys-tems, super-peers are selected manually, through someout-of-band or domain-specific mechanism. Often, thesuper-peer set is managed centrally, for example bythe global system administrator or designer, and oftenstatically configured (hard-coded) into the system. Inother cases, super-peers are elected locally using sim-ple heuristics. These approaches, both centralised anddecentralised, often select a suboptimal set of super-peers due to the lack of system-wide knowledge ofpeer characteristics [21]. This paper describes a moreelaborate approach, where the super-peer election is

based on the aggregation of system-wide peer utilitycharacteristics.

Xiao and Liu [35] propose a decentralised super-peermanagement architecture, similar to the one describedin this paper, that focuses on three fundamental ques-tions: what is the optimal super-peer ratio in the system;which peer should be promoted to super-peers; andhow to maintain an optimal super-peer set in a dynamicsystem. To this end, they introduce the peer capacityand session time metrics, similarly as in the gradienttopology, and they aim to elect super-peers with glob-ally highest capacity and stability in the system. How-ever, their approach uses relatively simple, localisedheuristics at each peer in order to estimate system-wide peer characteristics, in contrast to the aggregationalgorithms used in this paper. Furthermore, their ar-chitecture does not use double election thresholds thatreduce the number of swappings between super-peersand ordinary peers, and they do not address varyingload and peer capacity.

Montresor [36] proposes a self-organising protocolfor super-peer overlay generation that maintains a bi-nary distinction between super-peers and client peers.The algorithm attempts to elect a minimum-size super-peer set with sufficient capacity to handle all clientpeers in the system. This approach has been furtherextended in [37], where the super-peer election algo-rithm not only attempts to minimise the total numberof super-peers in the system, but also imposes a limiton the maximum latency between super-peers and theirclient peers. In contrast, the gradient topology intro-duces a continuous peer utility spectrum and a gradientstructure that enables super-peer election based onadaptive utility thresholds. Furthermore, the gradienttopology allows the partitioning of peers into a con-figurable hierarchy, where each level of the hierarchyconsists of peers whose utility values fall within thesame utility range, as in [14].

The task of data aggregation, or synopsis construc-tion, has been well-studied in the past in the areasof sensor networks [38, 39] and distributed databases[40, 41]. Most of the proposed algorithms rely on dis-semination trees, where the aggregated data is sent toa single node. However, in the architecture describedin this paper, all nodes need to estimate global systemproperties in order to decide on the super-peer election.

Kempe et al. [42] describe a push-based epidemicalgorithm for the computations of sums, averages, ran-dom samples, and quantiles, and provide a theoreticalanalysis of the algorithm. Their algorithm has beenused for the histogram and utility thresholds calcula-tion in [43]. However, Montresor et al. [15] introduce


a push-pull aggregation algorithm that offers betterperformance, compared to push-based approaches, insystems with high churn rates. This paper extends thepush-pull aggregation algorithm by enabling the calcu-lation of utility and capacity histograms and by adding apeer leave procedure that further improves the behav-iour of the algorithm in the face of peer churn.

The approach to decentralise a service-oriented ar-chitecture, described in this paper, has been initiallyproposed in [44]. The gradient search and Boltzmannsearch heuristics have been first proposed in [16], andthe super-peer election thresholds have been intro-duced in [43]. However, compared with [44], [16] and[43], the algorithms presented in this paper have beensubstantially elaborated and improved. In particular,the neighbour selection algorithm has been extended,the push-based aggregation algorithm has been re-placed by a push-pull algorithm, a new super-peerelection approach based on system load and adap-tive thresholds has been introduced, an approach tomultiple utility functions support has been added, abootstrap mechanism has been described, and mostimportantly, a substantially more elaborate evaluationhas been performed.

5 Conclusions

This paper describes an approach to fully decentralisea service-oriented architecture using a self-organisingpeer-to-peer network maintained by service providersand consumers. While the service provision and con-sumption are inherently decentralised, as they usuallyinvolve direct interactions between service providersand consumers, the P2P infrastructure enables the dis-tribution of a service registry, and potentially otherSOA facilities, across a number of sites available in thesystem.

The most interesting element of the presented ap-proach is the gradient topology, which pushes the stateof the art of super-peer election algorithms by usingaggregation techniques to estimate system-wide peerproperties. The gradient topology allows peers to con-trol and dynamically refine and optimise the super-peer set by adjusting the super-peer election threshold;this is, as the authors believe, an important propertyfor super-peer systems in dynamic environments. Fur-thermore, the approach allows the margin around thesuper-peer threshold to be configurable, which reducesthe impact of random utility fluctuations on super-peer

stability. This decreases the system overhead associatedwith creating or migrating super-peers.

The experimental evaluation of the gradient topol-ogy shows that a system consisting of 100, 000 peersmaintains the desired structure in the presence of heavychurn. Furthermore, peers successfully elect and updatea set of highest utility super-peers, maintaining a totalsuper-peer capacity proportional to the system load.The election algorithm can also reduce the frequencyof switches between super-peers and ordinary peers,in case of fluctuating peer utility, by applying upperand lower thresholds and relaxing the super-peer utilityrequirements. Finally, the presented routing algorithmsare robust to churn and scale to large numbers of peers,enabling efficient super-peer discovery. Load balancingcan be achieved by a Boltzmann heuristic at the cost ofrouting performance.

Acknowledgements The work described in this paper waspartly funded by the EU FP6 Digital Business Ecosystem project(DBE), Microsoft Research Cambridge, and the Irish ResearchCouncil for Science Engineering and Technology (IRCSET).

Open Access This article is distributed under the terms of theCreative Commons Attribution Noncommercial License whichpermits any noncommercial use, distribution, and reproductionin any medium, provided the original author(s) and source arecredited.

References

1. Huhns MN, Singh MP (2005) Service-oriented computing:key concepts and principles. IEEE Internet Computing9(1):75–81

2. Jammes F, Smit H (2005) Service-oriented paradigms in in-dustrial automation. IEEE Trans Ind Inf 1:62–70

3. Papazoglou MP, Georgakopoulos D (2003) Service-orientedcomputing. Commun ACM 46:24–28

4. Sen S, Wang J (2004) Analyzing peer-to-peer traffic acrosslarge networks. IEEE/ACM Trans Netw 12:219–232

5. Gummadi KP, Dunn RJ, Saroiu S, Gribble SD, Levy HM,Zahorjan J (2003) Measurement, modeling, and analysis ofa peer-to-peer file-sharing workload. In: Proceedings of sym-posium on operating systems principles, pp 314–329

6. Saroiu S, Gummadi PK, Gribble SD (2003) Measuring andanalyzing the characteristics of napster and gnutella hosts.Multimedia Syst 9(1):170–184

7. Stutzbach D, Rejaie R (2006) Understanding churn in peer-to-peer networks. In: Proceedings of the 6th ACM SIG-COMM conference on internet measurement. ACM, NewYork, pp 189–202

8. Rhea S, Geels D, Roscoe T, Kubiatowicz J (2004) Handlingchurn in a DHT. In: Proceedings of the USENIX annualtechnical conference. USENIX, El Cerrito, pp 127–140


9. Li J, Loo BT, Hellerstein JM, Kaashoek MF, Karger DR,Morris R (2003) On the feasibility of peer-to-peer webindexing and search. In: Proceedings of the 2nd internationalworkshop on peer-to-peer systems. Springer, New York,pp 207–215, LNCS 2735

10. Lakshminarayanan K, Padmanabhan VN (2003) Some find-ings on the network performance of broadband hosts. In: Pro-ceedings of the 3rd ACM SIGCOMM conference on internetmeasurement. ACM, New York, pp 45–50

11. Voulgaris S, Gavidia D, van Steen M (2005) CYCLON: inex-pensive membership management for unstructured P2P over-lays. J Netw Syst Manag 13(2):197–217

12. Jelasity M, Guerraoui R, Kermarrec A-M, van Steen M(2004) The peer sampling service: experimental evaluation ofunstructured gossip-based implementations. In: Middleware.Springer, New York, pp 79–98, LNCS 3231

13. Jelasity M, Babaoglu Ö (2006) T-man: gossip-based over-lay topology management. In: Proceedings of the 3rd inter-national workshop on engineering self-organising systems.Springer, New York, pp 1–15, LNCS 3910

14. Jelasity M, Kermarrec A-M (2006) Ordered slicing of verylarge-scale overlay networks. In: Montresor A, WierzbickiA, Shahmehri N (eds) Proceedings of the 6th IEEE inter-national conference on peer-to-peer computing. IEEE Com-puter Society, Piscataway, pp 117–124

15. Jelasity M, Montresor A, Babaoglu O (2005) Gossip-basedaggregation in large dynamic networks. ACM Trans ComputSyst 23:219–252

16. Sacha J, Dowling J, Cunningham R, Meier R (2006) Discov-ery of stable peers in a self-organising peer-to-peer gradi-ent topology. In: Proceedings of the 6th IFIP internationalconference on distributed applications and interoperable sys-tems. Springer, New York, pp 70–83, LNCS 4025

17. Sutton RS, Barto AG (1998) Reinforcement learning: an in-troduction. MIT, Cambridge

18. Patrick Reynolds AV (2003) Efficient peer-to-peer keywordsearching. In: Middleware, ser. LNCS, vol 2672. Springer,New York, pp 21–40

19. Demers A, Greene D, Hauser C, Irish W, Larson J, ShenkerS, Sturgis H, Swinehart D, Terry D (1987) Epidemic algo-rithms for replicated database maintenance. In: Proceedingsof the 6th ACM symposium on principles of distributed com-puting. ACM, New York, pp 1–12

20. Diot C, Levine BN, Lyles B, Kassem H, Balensiefen D (2000)Deployment issues for the IP multicast service and architec-ture. IEEE Netw 14(1):78–88

21. Sacha J (2009) Exploiting heterogeneity in peer-to-peer sys-tems using gradient topologies. Ph.D. dissertation, TrinityCollege Dublin

22. Chu J, Labonte K, Levine BN (2002) Availability and localitymeasurements of peer-to-peer file systems. In: Proceedingsof ITCom: scalability and traffic control in IP networks, vol4868, pp 310–321

23. Schmidt C, Parashar M (2004) A peer-to-peer approach toweb service discovery. World Wide Web 7(2):211–229

24. Schlosser M, Sintek M, Decker S, Nejdl W (2002) A scalableand ontology-based p2p infrastructure for semantic web ser-vices. In: Proceedings of the 2nd international conference onpeer-to-peer computing, pp 104–111

25. Nejdl W, Wolpers M, Siberski W, Schmitz C, Schlosser M,Brunkhorst I, Löser A (2003) Super-peer-based routing andclustering strategies for RDF-based peer-to-peer networks.In: Proceedings of the 12th international conference on worldwide web. ACM, New York, pp 536–543

26. Yang B, Garcia-Molina H (2002) Improving search in peer-to-peer networks. In: Proceedings of the 22nd international

conference on distributed computing systems. IEEE, Piscat-away, pp 5–14

27. Lv Q, Cao P, Cohen E, Li K, Shenker S (2002) Search andreplication in unstructured peer-to-peer networks. In: Pro-ceedings of the 16th international conference on supercom-puting. ACM, New York, pp 84–95

28. Morselli R, Bhattacharjee B, Srinivasan A, Marsh MA (2005)Efficient lookup on unstructured topologies. In: Proceedingsof 24th ACM symposium on principles of distributed comput-ing, pp 77–86

29. Leibowitz N, Ripeanu M, Wierzbicki A (2003) Deconstruct-ing the Kazaa network. In: Proceedings of the 3rd interna-tional workshop on internet applications. IEEE ComputerSociety, Piscataway, pp 112–120

30. Singla A, Rohrs C (2002) Ultrapeers: another step towardsgnutella scalability, version 1.0. Lime Wire LLC, Tech. Rep.

31. Guha S, Daswani N, Jain R (2006) An experimental study ofthe Skype peer-to-peer VoIP system. In: Proceedings of the5th international workshop on peer-to-peer systems, pp 1–6

32. Yang B, Garcia-Molina H (2003) Designing a super-peer net-work. In: Proceedings of the 19th international conferenceon data engineering. IEEE Computer Society, Bangalore,pp 49–60

33. Garcia-Molina H (1982) Elections in a distributed computingsystem. IEEE Trans Comput 31(1):48–59

34. van Renesse KPBR, Maffeis S (1996) Horus, a flexible groupcommunication system. Commun ACM 39(4):76–83

35. Xiao L, Zhuang Z, Liu Y (2005) Dynamic layer managementin superpeer architectures. IEEE Trans Parallel Distrib Syst16:1078–1091

36. Montresor A (2004) A robust protocol for building superpeeroverlay topologies. In: Proceedings of the 4th internationalconference on peer-to-peer computing. IEEE Computer So-ciety, Piscataway, pp 202–209

37. Jesi GP, Montresor A, Babaoglu Ö (2006) Proximity-awaresuperpeer overlay topologies. In: Keller A, Martin-Flatin J-P(eds) Proceedings of the 2nd IEEE international workshopon self-managed networks, systems, and services. Springer,New York, pp 43–57, LNCS 3996

38. Aggarwal CC, Yu PS (2006) A survey of synopsis construc-tion in data streams, ch. 9. Springer, New York

39. Nath S, Gibbons PB, Seshan S, Anderson ZR (2008) Synopsisdiffusion for robust aggregation in sensor networks. ACMTrans Sens Netw 4(2)

40. Arai B, Das G, Gunopulos D, Kalogeraki V (2007) Effi-cient approximate query processing in peer-to-peer networks.IEEE Trans Knowl Data Eng 19(7):919–933

41. Renesse RV, Birman KP, Vogels W (2003) Astrolabe: a ro-bust and scalable technology for distributed system monitor-ing, management, and data mining. ACM Trans Comput Syst21(2):164–206

42. Kempe D, Dobra A, Gehrke J (2003) Gossip-based compu-tation of aggregate information. In: Proceedings of the 44thIEEE symposium on foundations of computer science, pp482–491

43. Sacha J, Dowling J, Cunningham R, Meier R (2006) Us-ing aggregation for adaptive super-peer discovery on thegradient topology. In: Proceedings of the 2nd IEEE inter-national workshop on self-managed networks, systems &services (SelfMan). Springer, New York, pp 77–90, LNCS3996

44. Sacha J, Biskupski B, Dahlem D, Cunningham R, Dowling J,Meier R (2007) A service-oriented peer-to-peer architecturefor a digital ecosystem. In: Proceedings of the 1st IEEE inter-national conference on digital ecosystems and technologies.IEEE, Piscataway, pp 205–210


Jan Sacha is a postdoctoral researcher in the Computer SystemsGroup at Vrije Universiteit Amsterdam. He holds a Ph.D. degreefrom Trinity College Dublin and a M.Sc. degree from bothWarsaw University and Vrije Universiteit Amsterdam. His mainresearch interests include peer-to-peer systems, grid systems, andself-organising systems.

Bartosz Biskupski holds a Ph.D. in computer science from Trin-ity College Dublin in Ireland and M.Sc. in computer science fromVrije Universiteit Amsterdam in the Netherlands and WarsawUniversity in Poland. His research interests include peer-to-peersystems, media streaming and self-organisation in distributedsystems. He is currently starting up his own technology company.

Dominik Dahlem received a Diplom Engineer in ComputerScience from the University of Applied Sciences in Wiesbaden,Germany, and an M.Sc. by research from Trinity College Dublin,

Ireland. He is currently working on his Ph.D. at Trinity Col-lege Dublin. His research interests include multi-agent reinforce-ment learning, social network analysis, kriging metamodellingof computer experiments, and simulation technologies on high-performance computing infrastructures.

Raymond Cunningham is the founder of a startup company.Previously, he was a Research Fellow at the Department of Com-puter Science, Trinity College Dublin. He holds a B.A. degree inMathematics and M.Sc. and Ph.D. degrees in Computer Science,all from Trinity College Dublin. His research interests coveredthe area of mobile distributed systems, distributed systems opti-misation techniques and adaptive middleware.

René Meier is a lecturer in the School of Computer Scienceand Statistics at Trinty College Dublin. He holds Ph.D. andM.Sc. degrees from Trinity College Dublin. His research interestsinclude programming models and middleware for very large-scale, context-aware mobile and pervasive computing systems aswell as for self-organising (peer-to-peer) systems.


Jim Dowling received the B.A. and Ph.D. degrees in computerscience from Trinity College, Dublin, Ireland. He is a researcherat the Swedish Institute of Computer Science in Stockholm, anda former Marie Curie Intra-European scholar. He has managedboth national and EU research projects in Ireland and Sweden.His research interests are primarily in the areas of distributedsystems, autonomic computing, and middleware.

Mads Haahr is a Lecturer in Computer Science at Trinity CollegeDublin. He holds BSc and MSc degrees from the Universityof Copenhagen and a PhD from Trinity College Dublin. Heis Editor-in-Chief of Crossings: Electronic Journal of Art andTechnology and also built and operates RG. His current researchinterests are in large-scale self-organising distributed and mobilesystems, in sensor-augmented artefacts and in true random num-ber generation.

decentralising a service-oriented architecture › download › pdf › 81887123.pdf ·...

Documents