


A Survey of Distributed Data Aggregation Algorithms

Paulo Jesus, Carlos Baquero and Paulo Sérgio Almeida

Abstract—Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, which can then be used to direct the execution of other applications. The resulting values are derived by the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples deal with the determination of the network size, total storage capacity, average load, majorities and many others. In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task. This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

Index Terms—distributed algorithms, data aggregation, performance trade-offs, fault-tolerance


1 INTRODUCTION

Data aggregation is an essential building block of modern distributed systems, enabling the determination of important system-wide properties in a decentralized manner. The knowledge of these global properties can then be used as input by other distributed applications and algorithms. The network size is a common example of such global properties, which is required by many algorithms in the context of Peer-to-Peer (P2P) networks, for instance: in the construction and maintenance of Distributed Hash Tables (DHT) [1], [2]; when setting the fanout of a gossip protocol [3]. This size estimation is also used in many other contexts, for example: to set up a quorum in dynamic settings [4], or to compute the mixing time of a random walk [5].

The network size is computed through the COUNT aggregation function. Nevertheless, other meaningful global properties can be computed using different functions, for example: AVERAGE can be applied to determine the average system load, which can be used to direct local load balancing decisions; SUM allows the determination of total values such as the total free disk space available in a file-sharing system. In the particular case of Wireless Sensor Networks (WSN) and of the Internet of Things (IoT) [?], data gathering is often only practical if data aggregation is performed, due to the strict energy constraints found in such environments. Even when energy is available, there is still a need for energy efficient approaches [?].

The above examples intend to illustrate some of the main reasons that have motivated the research and development of distributed data aggregation approaches over the past years, but more can be found in the literature. Besides all the existing relevant application examples, aggregation has also been stated as one of the bases for scalability in large-scale services [6], reinforcing its importance. Currently, a huge amount of distinct approaches constitutes the body of related work on distributed data aggregation algorithms, all exhibiting different trade-offs in terms of accuracy, time, communication and fault-tolerance. Together, existing techniques have confirmed that obtaining global statistics in a distributed fashion is a difficult problem, especially when considering faults and network dynamism. Moreover, in the face of such diversity, it becomes difficult to choose which distributed data aggregation algorithm should be preferred in a given scenario, and which one will best suit the requirements of a specific application. One of the main motivations of this work is precisely to help readers make this choice.

• P. Jesus, C. Baquero and P. S. Almeida are with HASLab / INESC TEC & Universidade do Minho, Braga, Portugal.

Some surveys have been previously published [7], [8], [9], [10], [11], focusing specifically on aggregation techniques for WSN. Several in-network aggregation techniques for WSN are described in [7]; these typically operate at the network level, needing to deal with the resource constraints of sensor nodes (limited computational power, storage and energy resources). A review of the existing literature more focused on energy efficiency is presented in [8], and on security in [9], [10]. Another work reviewing the state of the art of information fusion techniques for WSN is also available [11]. In this latter work, a broader view of the sensor fusion process is reviewed, from raw data collection, through a possible summarization (data aggregation) or compression, to the final resulting decision or action. Data aggregation is considered a subset of information fusion, which aims at reducing the handled data volume by establishing appropriate summaries.

In this survey, we intend to address data aggregation algorithms at a higher abstraction level, providing a comprehensive and more generic view of the problem, independently of the type of network used. Unlike most of the previous surveys, we do not focus on algorithms for specific networks, like WSN. Instead, we provide a wider view of the existing techniques that tackle specifically the distributed data aggregation problem, as the performance of the algorithms varies according to the underlying network settings. Therefore, more alternatives are presented to the reader compared to previous works, allowing the determination of which data aggregation algorithms might be more suited for any target setting. We define the problem of computing aggregation functions in a distributed fashion, and detail a wide range of distinct solutions. A taxonomy of the existing distributed data aggregation algorithms is proposed, classifying them according to two main perspectives: communication and computation. The first viewpoint refers to the routing protocols and intrinsic network topologies associated with each protocol, which are used to support the aggregation process. The second perspective points out the aggregation functions computed by the algorithms and the main principles on which they are based. Other perspectives (e.g., algorithm requirements, covered types of aggregation functions) could have been considered, since the mapping between the algorithm attributes is multidimensional. However, we believe that the two chosen perspectives provide a clear presentation. We also give some important guidelines to help decide which distributed aggregation algorithm should be used, according to the requirements of a target application and environment.

The remainder of this survey is organized as follows. In Section 2, we clarify the concept of aggregation, and define the problem of its distributed computation. A taxonomy of the existing distributed aggregation algorithms from the communication perspective is proposed in Section 3, describing the most relevant approaches and discussing their pros and cons. Section 4 proposes a taxonomy from the computation perspective. Section 5 summarizes the properties of the most relevant approaches, and gives some guidelines for their practical application. Finally, some concluding remarks and future research directions for the area are drawn in Section 6.

2 PROBLEM DEFINITION

In a nutshell, aggregation can be simply defined as “the ability to summarize information”, quoting Robbert van Renesse [6]. Data aggregation is considered a subset of information fusion, aiming at reducing the handled data volume [11]. Here, we provide a more precise definition, and consider that the process consists of the computation of an aggregation function, defined by:

Definition 2.1 (Aggregation function): An aggregation function f takes a multiset¹ of elements from a set I and produces an output from a set O:

f : N^I → O

The input being a multiset emphasizes that: 1) the order in which the elements are aggregated is irrelevant; 2) a given value may occur several times. Frequently, for common aggregation functions such as MIN, MAX, and SUM, both I and O are the same set. For others, such as COUNT (which gives the cardinality of the multiset), the result is an integer, regardless of the input set domain.

An aggregation function aims to summarize information. Therefore, the result of an aggregation (in the output set O) typically takes much less space than the multiset to be aggregated (an element from N^I). We will leave unspecified what is acceptable for a function to be considered as summarizing information, and therefore an aggregation function. It can be said that an element of the output set O is not normally a multiset (we do not normally have O = N^I), and that the identity function is clearly not an aggregation function, as it definitely does not summarize information. In most practical cases, the size of the output is at most logarithmic in the input size, and often even of constant size.

2.1 Decomposable functions

For some aggregation functions we may need to perform a single computation involving all elements in the multiset. In many cases, however, one needs to avoid such centralized computation. In order to perform distributed in-network aggregation, it is relevant whether the aggregation function can be decomposed into several computations involving sub-multisets of the multiset to be aggregated. For distributed aggregation it is useful, therefore, to define the notion of decomposable aggregation function, and a subset of these, which we call self-decomposable aggregation functions.

Definition 2.2 (Self-decomposable aggregation function): An aggregation function f : N^I → O is said to be self-decomposable if, for some (merge) operator ⊕ and all non-empty multisets X and Y:

f(X ⊎ Y) = f(X) ⊕ f(Y)

In the above, ⊎ denotes the standard multiset sum (see, e.g., [12]). According to the above definition, and given that the aggregation result is the same for all possible partitions of a multiset into sub-multisets, the operator ⊕ is commutative and associative. Many traditional functions such as MIN, MAX, SUM and COUNT are self-decomposable:

SUM({x}) = x,
SUM(X ⊎ Y) = SUM(X) + SUM(Y).

1. A generalization of set, in which elements are allowed to appear more than once.


COUNT({x}) = 1,
COUNT(X ⊎ Y) = COUNT(X) + COUNT(Y).

MIN({x}) = x,
MIN(X ⊎ Y) = min(MIN(X), MIN(Y)).
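The self-decomposable pattern above can be captured by a singleton function plus a commutative, associative merge operator. The sketch below (in Python, with illustrative names) checks that partitioning the multiset does not change the result:

```python
from functools import reduce

# Self-decomposable aggregation: f(X ⊎ Y) = f(X) ⊕ f(Y).
# Each function is given by its value on a singleton and a
# commutative, associative merge operator (illustrative sketch).
AGGREGATES = {
    "SUM":   (lambda x: x, lambda a, b: a + b),
    "COUNT": (lambda x: 1, lambda a, b: a + b),
    "MIN":   (lambda x: x, lambda a, b: min(a, b)),
}

def aggregate(name, multiset):
    singleton, merge = AGGREGATES[name]
    return reduce(merge, (singleton(x) for x in multiset))

# Partitioning the multiset does not change the result:
X, Y = [1, 3, 1], [2, 4, 5, 4, 5]
merge = AGGREGATES["SUM"][1]
assert aggregate("SUM", X + Y) == merge(aggregate("SUM", X),
                                        aggregate("SUM", Y))
```

Any partition of the input into sub-multisets (e.g., the subtrees of an in-network aggregation tree) yields the same final value, which is exactly what makes these functions amenable to distributed computation.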

Definition 2.3 (Decomposable aggregation function): An aggregation function f : N^I → O is said to be decomposable if, for some function g and a self-decomposable aggregation function h, it can be expressed as:

f = g ◦ h

From this definition, the self-decomposable functions are a subset of the decomposable functions, where g = ID, the identity function. While for self-decomposable functions the intermediate results (e.g., for in-network aggregation) are computed in the output set O, for a general decomposable function we may need a different auxiliary set to hold the intermediate results.

The classic example of a decomposable (but not self-decomposable) function is AVERAGE, which gives the average of the elements in the multiset:

AVERAGE(X) = g(h(X)), with
h({x}) = (x, 1),
h(X ⊎ Y) = h(X) + h(Y),
g((s, c)) = s/c.

in which h is a self-decomposable aggregation function that outputs values of an auxiliary set, in this case pairs, and + is the standard pointwise sum of pairs (i.e., (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)). Another example is the RANGE function in statistics, which gives the difference between the maximum and the minimum value.
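A minimal sketch of this decomposition, with hypothetical helper names h_single, h_merge and g mirroring the definitions above:

```python
# Decomposable but not self-decomposable: AVERAGE = g ∘ h,
# where h aggregates (sum, count) pairs and g finalizes.
def h_single(x):
    return (x, 1)

def h_merge(a, b):
    # pointwise sum of pairs: (x1, y1) + (x2, y2) = (x1+x2, y1+y2)
    return (a[0] + b[0], a[1] + b[1])

def g(pair):
    s, c = pair
    return s / c

def average(multiset):
    acc = h_single(multiset[0])
    for x in multiset[1:]:
        acc = h_merge(acc, h_single(x))
    return g(acc)
```

The intermediate (sum, count) pairs live in an auxiliary set, not in the output set O; only the final application of g produces the average.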

2.2 Duplicate sensitiveness and idempotence

Depending on the aggregation function, it may be relevant whether a given value occurs several times in the multiset. For some aggregation functions, such as MIN and MAX, the presence of duplicate values in the multiset does not influence the result, which depends only on its support set (the set obtained by removing all duplicates from the original multiset). E.g.,

MIN({1, 3, 1, 2, 4, 5, 4, 5}) = MIN({1, 3, 2, 4, 5}) = 1

For others, such as SUM and COUNT, the number of times each element occurs (its multiplicity) is relevant:

COUNT({1, 3, 1, 2, 4, 5, 4, 5}) = 8 ≠ COUNT({1, 3, 2, 4, 5}) = 5

Therefore, duplicate sensitiveness is important and must be taken into account in distributed aggregation, in order to compute the correct result.

TABLE 1
Taxonomy of aggregation functions

                       | Decomposable      |          | Non-decomposable
                       | Self-decomposable | Other    |
Duplicate insensitive  | MIN, MAX          | RANGE    | DISTINCT COUNT
Duplicate sensitive    | SUM, COUNT        | AVERAGE  | MEDIAN, MODE

Definition 2.4 (Duplicate insensitive aggregation function): An aggregation function f is said to be duplicate insensitive if, for all multisets M, f(M) = f(S), where S is the support set of M.

Moreover, some duplicate insensitive functions (like MIN and MAX) can be implemented using an idempotent binary operator that can be successively applied (by intermediate nodes) to the elements of the multiset, any number of times, without affecting the result. This helps in obtaining fault tolerance and decentralized processing, allowing retransmissions or sending values across multiple paths (without affecting the aggregation result). Unfortunately, the distributed application of an idempotent operator is not always possible, even for some duplicate insensitive aggregation functions, such as DISTINCT COUNT (i.e., the cardinality of the support set). In practice, the application of an idempotent operator in a distributed way to compute an aggregation function appears to be possible only if the function is duplicate insensitive and self-decomposable.
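The benefit of idempotence can be illustrated with a small sketch (Python, illustrative names): re-applying an idempotent merge such as max to an already-counted value is harmless, while a non-idempotent one such as + would double-count:

```python
# Idempotent merge operators tolerate retransmission: applying
# the operator again to a value already accounted for does not
# change the result (sketch; MAX as the idempotent operator).
def merge_max(acc, value):
    return max(acc, value)

readings = [4, 7, 2]
once = 0
for r in readings:
    once = merge_max(once, r)

# Re-delivering a reading (e.g., over multiple paths) is harmless:
twice = merge_max(once, 7)
assert twice == once == 7

# A non-idempotent operator like + double-counts instead:
assert sum(readings) + 7 != sum(readings)
```

This is why duplicate insensitive, self-decomposable functions can be aggregated over redundant paths with no bookkeeping, while SUM or COUNT require duplicate suppression.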

2.3 Taxonomy of common aggregation functions

Building on the concepts of decomposability and duplicate sensitiveness, we can obtain a taxonomy of aggregation functions (Table 1). It contains the standard aggregation functions, under the usual statistical definition. In addition to those already mentioned, it contains MEDIAN (the value separating the higher half from the lower half, after ordering the values), and MODE (the value that appears most often).

This table helps to clarify how suited an aggregation function is to a distributed computation. Non-decomposable aggregation functions are harder to compute than decomposable ones, being commonly labeled in the literature as “complex”. Commonly, duplicate insensitive aggregation functions are easier to compute than duplicate sensitive ones. As we will see in the next section, one way to obtain fault tolerance (at the cost of some accuracy) is to use duplicate insensitive approaches to estimate some aggregation function (e.g., sketches, see Section 4.3), replacing the use of non-idempotent operations (like SUM) by idempotent ones (like MAX).

This simple classification of common aggregation functions is orthogonal to the taxonomies provided in the next two sections, which concern the distributed algorithms that can be used to compute them.


3 COMMUNICATION TAXONOMY

Three major classes of aggregation algorithms can be identified from the communication perspective, according to the characteristics of their communication/routing strategy and network topology (see Table 2): structured (usually hierarchy-based), unstructured (usually gossip-based), and hybrid (mixing the previous categories).

The structured communication class refers to aggregation algorithms that depend on a specific network topology and routing scheme to operate correctly. If the required routing topology is not available, then an additional preprocessing phase is needed to create it before starting the execution of the algorithm. This dependency limits the use of these techniques in dynamic environments. For instance, in mobile networks these algorithms need to continuously adapt their routing structure to follow network changes. Typically, these algorithms are directly affected by problems tied to the routing structure used. For example, in tree-based communication structures a single point of failure (node/link) can compromise the delivery of data from all its subtrees, and consequently impair the applications supported by that structure. In practice, due to their energy efficiency, hierarchical communication structures (e.g., tree routing topologies) are the ones most often used to perform data aggregation, especially in WSN. Alternative routing topologies are also considered, like the ring topology, although very few approaches rely on it.

The unstructured communication category covers aggregation algorithms that can operate independently of the network organization and structure, without establishing any predefined topology. In terms of communication, these algorithms are essentially characterized by the communication pattern used: flooding/broadcast, random walk, and gossip. The flooding/broadcast communication pattern is associated with the dissemination of data from one node to all the network nodes or to a group of nodes – “one to all”. A random walk consists of sequential message transmissions, from one node to another – “one to one”. The gossip communication pattern refers to a well-known communication protocol, based on the spreading of a rumor [13] (or an epidemic disease), in which messages are sent successively from one node to a selected number of peers – “one to many”. In recent years, several aggregation algorithms based on gossip communication have been proposed, leveraging its properties of simplicity, scalability and robustness. More details about these different communication patterns (see Table 2), applied to data aggregation, will be given further on.

The hybrid class groups algorithms that mix routing strategies from the previous categories, with the objective of combining their good properties and overcoming their weaknesses, while deriving equivalent or improved aggregations (more details in Section 3.6).

TABLE 2
Taxonomy from a communication perspective

Routing                                   Algorithms

Structured
  Hierarchy (tree, cluster, multipath)    TAG [14], DAG [15], I-LEAG [16], Sketches [17],
                                          RIA-LC/DC [18], [19], Tributary-Delta [20], Q-Digest [21]
  Ring                                    (Horowitz and Malkhi, 2003) [22]

Unstructured
  Flooding/Broadcast                      Randomized Reports [23]
  Random walk                             Random Tour [24], (Dolev et al., 2002) [25], [26],
                                          Sample & Collide [27], [24], Capture-Recapture [28]
  Gossip                                  Push-Sum Protocol [29], Push-Pull Gossiping [30], DRG [31],
                                          Flow Updating [32], [33], Extrema Propagation [34],
                                          Equi-Depth [35], Adam2 [36], Hop-Sampling [37], [38],
                                          Interval Density [37], [38]

Hybrid
  Hierarchy + Gossip                      (Chitnis et al., 2008) [39]

3.1 Hierarchy-based approaches

Traditionally, existing aggregation algorithms operate on a hierarchy-based communication scheme. Hierarchy-based approaches are often used to perform data aggregation, especially in WSN. This routing strategy consists of the definition of a hierarchical communication structure (e.g., a spanning tree), rooted at a single point, commonly designated the sink. In general, in a hierarchy-based approach the data is simply disseminated from level to level, upwards in the hierarchy, in response to a query request made by the sink, which computes the final result. Besides the sink, other special nodes can be assigned to compute intermediate aggregates, working as aggregation points that forward their results to upper-level nodes until the sink is reached.

Aggregation algorithms based on hierarchical communication usually work in two phases, each involving the participation of all nodes: a request phase and a response phase. The request phase corresponds to the spreading of an aggregation request throughout the whole network. Several considerations must be taken into account before starting this phase, depending on which node wants to perform the request and on the existing routing topology. For instance: if the routing structure has not been established yet, it must be created and ought to be rooted at the requesting node; if the required topology is already established, first the node


must forward its request to the root, in order for it to be spread (from the sink) across the whole network. During the response phase, all the nodes answer the aggregation query by sending the requested data toward the sink. In this phase, nodes can be asked to simply forward the received data or to compute and send partial intermediate aggregates.

The aggregation structure of hierarchy-based approaches provides a simple strategy that enables the exact computation of aggregates (if no node or communication failures occur), in an efficient manner in terms of energy consumption (i.e., message load). However, in adverse environments this type of approach exhibits some fragility in terms of robustness, since a single point of failure can jeopardize the obtained result. Furthermore, to operate correctly in dynamic environments, where the network continuously changes with nodes joining and leaving, extra procedures are required to maintain an updated routing structure.

3.1.1 TAG

The Tiny AGgregation (TAG) service for ad-hoc sensor networks described by Madden et al. [14] represents a classical tree-based in-network aggregation approach. As stated by the authors, TAG is agnostic to the implementation of the tree-based routing protocol, as long as it satisfies two important requirements. First, it must be able to deliver query requests to all the network nodes. Second, it must provide at least one route from every node (that participates in the aggregation process) to the sink, guaranteeing that no duplicates are generated (at most one copy of every message must arrive). This algorithm requires the previous creation of a tree-based routing topology, as well as the continuous maintenance of that routing structure in order to operate over mobile networks.

TAG allows the computation of basic database aggregation functions, such as COUNT, MAXIMUM, MINIMUM, SUM and AVERAGE (providing a declarative SQL-like query language with grouping capabilities). The algorithm performs a classic hierarchy-based aggregation process which consists of two phases: a distribution phase (in which the aggregation query is propagated along the tree routing topology, from the root to the leaves) and a collection phase (where the values are aggregated from the children to the parents, until the root is reached). The derivation of the aggregation result at the root incurs a minimal time overhead that is proportional to the tree depth. This waiting time is needed to ensure the conclusion of the two execution phases and the participation of all nodes in the aggregation process.

Figure 1 shows an example of the execution of TAG to compute the AVERAGE. In the particular case of AVERAGE, the sum and the count need to be aggregated at all intermediate nodes in order to compute the global average at the sink². In this example, it is assumed that a tree routing topology rooted at node i is already available. At time t = 1, node i initiates the distribution phase by sending an average request to its children (nodes j and k), which in turn propagate the query to all their children. At time t = 3, upon reception of the average query, the leaves initiate the collection phase, reporting to their parents the value of their local reading for the sum and the value 1 for the count, as a pair (sum, count). E.g., node l sends (1, 1) and node m sends (2, 1) to node j. Then, the intermediate nodes j and k aggregate the values received from their children with their own and send the result to their own parent. E.g., node j aggregates the received values (1, 1) and (2, 1) with its own (3, 1) and sends the result (6, 3) = (1 + 2 + 3, 1 + 1 + 1) to i. Finally, the sink aggregates the values received from all its children with its own, (6 + 5 + 1, 3 + 2 + 1) = (12, 6), and computes the global average result 12/6 = 2.

[Figure omitted] Fig. 1. Execution example of TAG to compute the AVERAGE function.

2. Recall that AVERAGE is a decomposable aggregation function, but not self-decomposable (see Section 2.1).
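The collection phase of this example can be sketched as a recursive bottom-up pass over the tree. The tree shape and the readings assumed for nodes k and n are hypothetical, chosen only to be consistent with the partial pairs (5, 2) and (12, 6) reported in the text:

```python
# Bottom-up (sum, count) aggregation over a tree, mirroring the
# collection phase of Fig. 1. The readings k = 2 and n = 3 are
# assumptions consistent with the partial sums in the text.
TREE = {"i": ["j", "k"], "j": ["l", "m"], "k": ["n"],
        "l": [], "m": [], "n": []}
READING = {"i": 1, "j": 3, "k": 2, "l": 1, "m": 2, "n": 3}

def collect(node):
    # Start from the node's own (reading, 1) pair, then fold in
    # the pairs reported by its children.
    s, c = READING[node], 1
    for child in TREE[node]:
        cs, cc = collect(child)
        s, c = s + cs, c + cc
    return (s, c)

s, c = collect("i")        # (12, 6) at the sink
print(s / c)               # global AVERAGE = 2.0
```

Only the sink applies the final division; every other node just merges (sum, count) pairs, which is the decomposition of AVERAGE described in Section 2.1.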

A pipelined aggregate technique (detailed in [40]) has been proposed to minimize the effect of the waiting time overhead. According to this technique, smaller time intervals (relative to the overall time needed) are used to repeatedly produce periodic (partial) aggregation results. In each time interval, all nodes that have received the aggregation request will transmit a partial result.


The sent value is calculated by applying the aggregation function to the local reading and the results received from children in the previous interval. Over time, after each successive interval, the aggregated value results from the participation of a growing number of nodes, increasing the reliability and accuracy of the result, which becomes closer to the correct value at each step. The correct aggregation result should be reached after a minimum number of iterations (in an ideal fail-safe environment).
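A toy simulation of this pipelined scheme, under an assumed three-level tree and readings, shows the root's partial pair becoming exact after a number of intervals equal to the tree depth:

```python
# Sketch of pipelined aggregation: in each interval every node
# emits a partial (sum, count) built from its own reading and the
# pairs its children emitted in the PREVIOUS interval, so the
# root's estimate improves each round and becomes exact after a
# number of rounds equal to the tree depth (hypothetical tree).
TREE = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
READING = {"root": 1, "a": 2, "b": 3, "c": 4}

emitted = {n: (0, 0) for n in TREE}      # last interval's outputs
for interval in range(3):                # depth of this tree
    new = {}
    for node, children in TREE.items():
        s, c = READING[node], 1
        for child in children:
            cs, cc = emitted[child]
            s, c = s + cs, c + cc
        new[node] = (s, c)
    emitted = new

print(emitted["root"])                   # (10, 4): exact totals
```

After the first interval the root only reflects its own reading; each subsequent interval incorporates values from one more level of the tree.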

Following the authors concerns about power con-sumption, additional optimization techniques were pro-posed to the TAG basic approach, in order to reducethe number of messages sent, taking advantage of theshared communication medium in wireless networks(which enables message snooping and broadcast) andgiving decision power to nodes. They proposed a tech-nique called hypothesis testing, to use in certain classes ofaggregates, where each node can decide to transmit thevalue resulting from its subtree, only if it will contributeto the final result.

3.1.2 DAG

An aggregation scheme for WSN based on the creation of a Directed Acyclic Graph (DAG) is proposed in [15], with the objective of reducing the effect of message loss in common tree-based approaches by allowing nodes to use alternative parents. The DAG is created by setting multiple parents (within radio range) for each node, as its next hop toward the sink.

In more detail, request messages are extended with a list of parent nodes (IDs), enabling children to learn the grandparents that are two hops away. In order to avoid duplicated aggregates, only one grandparent is chosen to aggregate intermediate values, preferably a common grandparent of the node's parents. The most common grandparent among the lists received from parents is chosen as the destination aggregator; otherwise, one of the grandparents is chosen (e.g., when a node has only one parent). Response messages are handled according to specific rules to avoid duplicate processing: they can be aggregated, forwarded or discarded. Messages are aggregated if the receiving node corresponds to the destination, forwarded if the destination is one of the node's parents, and discarded otherwise (the destination is neither the node nor one of its parents). Note that, although the same message can be duplicated and multiple “copies” can reach the same node (a grandparent), they will have the same destination node and only one from the same source will be considered for aggregation after receiving all messages from children.
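
The aggregate/forward/discard rules can be condensed into a small decision function. This is a sketch; the names for the receiving node, its parent set and the message's chosen grandparent aggregator are illustrative, not the paper's notation.

```python
def handle_response(node_id, parent_ids, destination):
    """Decide what a node does with a received response message,
    following the DAG scheme's duplicate-avoidance rules.
    node_id: the receiving node; parent_ids: its set of parents;
    destination: the grandparent chosen as aggregator (hypothetical names)."""
    if destination == node_id:
        return "aggregate"  # this node is the destination aggregator
    if destination in parent_ids:
        return "forward"    # pass the message on toward the destination
    return "discard"        # not for this node nor any of its parents
```

For example, `handle_response(7, {3, 4}, 7)` yields "aggregate", while a message destined for an unrelated node is discarded, so duplicated copies converge on a single aggregator.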

This method takes advantage of the path redundancy introduced by the use of multiple parents to improve the robustness of the aggregation scheme (tolerance to message loss), when compared to traditional tree-based techniques. Though better accuracy can be achieved, it comes at the cost of higher energy consumption, as more messages, of increased size, are transmitted.

Note that this approach does not fully overcome the message loss problem of tree routing topologies, as some nodes may have a single parent, making it dependent on the quality of the created DAG.

3.1.3 Sketches

An alternative multi-path based approach is proposed in [17] to perform in-network aggregation for sensor databases, using small sketches. The defined scheme is able to deal with data duplicated by multi-path routing and compute duplicate-sensitive aggregates, like COUNT, SUM and AVERAGE. This algorithm is based on the probabilistic counting sketches technique introduced by Flajolet and Martin [41] (FM), used to estimate the number of distinct elements in a data collection. A generalization of this technique is proposed for application to duplicate-sensitive (non-idempotent) aggregation functions, namely SUM. Sketches apply a stochastic transformation to these functions, allowing duplicate-insensitive aggregation over a transformed data representation, at the cost of some accuracy.
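
A minimal Flajolet–Martin sketch for duplicate-insensitive COUNT might look as follows. This is a simplified illustration of the underlying technique, not the authors' exact construction: the hash function is an arbitrary choice, and a single sketch has high variance (FM averages many sketches to tighten the estimate).

```python
import hashlib

PHI = 0.77351  # standard Flajolet-Martin correction constant

def _rho(item):
    """Index of the least-significant set bit of a hash of the item."""
    h = int.from_bytes(hashlib.sha256(str(item).encode()).digest(), "big")
    return (h & -h).bit_length() - 1

class FMSketch:
    def __init__(self):
        self.bitmap = 0

    def add(self, item):
        self.bitmap |= 1 << _rho(item)

    def merge(self, other):
        # Bitwise OR: inserting the same item along several paths has
        # the same effect as inserting it once (duplicate insensitivity),
        # which is what makes multi-path routing safe for COUNT.
        self.bitmap |= other.bitmap

    def estimate(self):
        r = 0  # position of the first unset bit
        while (self.bitmap >> r) & 1:
            r += 1
        return 2 ** r / PHI
```

Two nodes can build sketches over overlapping sets of readings and merge them; the merged sketch estimates the size of the union, regardless of how many times each element was seen.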

The authors consider the use of multi-path routing to tolerate communication failures (of links and nodes), providing several possible paths to reach a destination. Like common tree-based approaches, the algorithm consists of two phases: first, the sink propagates the aggregation request across the whole network; second, the local values are collected and aggregated along a multi-path tree from the children to the root. In this particular case, during the request propagation phase, all nodes compute their distance (level) to the root and store the level of their neighbors, establishing a hierarchical multi-path routing topology (similar to the creation of multiple routing trees). In the second phase, partial aggregates are computed across the routing structure, using the adapted counting sketch scheme, and sent to the upper levels in successive rounds. Each round corresponds to a hierarchy level, in which the sketches received from child nodes are combined with the local one, until the sink is reached. In the last round, the sink merges the sketches of its neighbors and produces the final result, applying an estimation function over the sketch.

Notice that the use of an auxiliary structure to summarize all data values (FM sketches), and the corresponding estimator, will introduce an approximation error that is reflected in the final result. However, under this aggregation scheme, data loss (mitigated by the introduction of multiple alternative paths) is expected to have a higher impact on the result accuracy than the approximation error introduced by the use of sketches.

3.1.4 I-LEAG

This cluster-based aggregation approach, designated Instance-Local Efficient Aggregation on Graphs (I-LEAG) [16], requires the pre-construction of a different routing structure – the Local Partition Hierarchy – which can be viewed as a logical tree of local routing partitions. The


routing structure is composed of a hierarchy of clusters (partitions), with upper level partitions comprising lower level ones. A single pivot is assigned to each partition, and the root of the tree corresponds to the pivot of the highest level cluster (which includes the whole network graph). This algorithm emphasizes local computation to perform aggregation, executed in sequential phases. Each phase is associated with a level of the hierarchy, and the algorithm is executed in parallel by all partitions of the corresponding level (from lower to upper levels).

Basically, the algorithm proceeds as follows: each partition checks for local conflicts (different aggregation outputs between neighbors); detected conflicts are reported to pivots, which compute the new aggregated value and multicast the result to the partition; additionally, every node forwards the received result to all neighbors that do not belong to the partition; received values are used to update the local aggregation value (if received from a node in the current partition) or to update a neighbor's aggregation output (if received from a neighbor in the upper level partition), enabling the local detection of further conflicts. Conflicts are only detected between neighbors that belonged to different partitions in the previous phase, with different aggregation outputs from those partitions. A timer is needed to ensure that all messages sent during a phase reach their destination by the end of that phase.

Furthermore, two extensions of the algorithm were proposed to continuously compute aggregates over a fixed network where node inputs may change over time: MultI-LEAG and DynI-LEAG [42]. MultI-LEAG essentially corresponds to consecutive executions of I-LEAG, improved to avoid sending messages when no input changes are detected. Inputs are sampled at regular time intervals and the result of the current sampling interval is produced before the next one starts. DynI-LEAG concurrently executes several instances of MultI-LEAG, pipelining its phases (ensuring that every partition level executes only a single MultI-LEAG phase at a time) and sampling inputs more frequently to track changes faster, at the cost of a higher message complexity. Despite the authors' effort to perform aggregation efficiently, these algorithms are restricted to static networks of fixed size, without considering the occurrence of faults.

3.1.5 Tributary-Delta

This approach mixes the use of tree and multi-path routing schemes to perform data aggregation, combining the advantages of both to provide better accuracy in the presence of communication failures [20]. Two different routing regions are defined: tributary (tree routing, in analogy to the shape formed by rivers flowing into a main stream) and delta (multi-path routing, in analogy to the landform of a river flowing into the sea). The idea is to use tributaries in regions with low message loss rates, to take advantage of the energy-efficiency and accuracy of a traditional tree-based aggregation scheme,

and to use deltas in zones where message loss has a higher rate and impact (e.g., close to the sink, where messages carry values corresponding to several nodes' readings), to benefit from the multi-path redundancy of sketch-based schemes.

Two adaptation strategies (TD-Coarse and TD) are proposed to shrink or expand the delta region, according to the network conditions and a minimum percentage of contributing nodes predefined by the user. Prior knowledge of the network size is required, and the number of contributing nodes needs to be counted in order to estimate the current participation percentage. Conversion functions are also required to convert partial results from the tributary into valid inputs for the multi-path algorithm used in the delta region. Experimental results applying TAG [14] in tributaries and Synopses Diffusion [43] (see Section 4.3) in deltas showed that this hybrid approach performs better than either aggregation algorithm used separately.

3.1.6 Other approaches

Several other hierarchy-based aggregation approaches can be found in the literature, most of them differing in the supporting routing structure, or in the way it is built. Besides alternative variations of the hierarchic routing topology, some optimization techniques for the aggregation process can also be found, especially to reduce energy consumption in WSN.

In [44] an aggregation scheme over Distributed Hash Tables (DHT) is proposed. This approach is characterized by its tree construction protocol, which uses a parental function to map a unique parent to each node, building an aggregation tree in a bottom-up fashion, unlike traditional approaches. The authors consider the coexistence of multiple trees to increase the robustness of the algorithm against faults, as well as the continuous execution of a tree maintenance protocol to handle the dynamic arrival and departure of nodes. Two operation modes are proposed to perform data aggregation (and data broadcast): default and on-demand. In the default mode, the algorithm is executed in the background, taking advantage of messages exchanged by the tree maintenance protocol and appending some additional information to them. The on-demand mode corresponds to the traditional aggregation scheme found in tree-based algorithms.

Zhao et al. [45] proposed an approach to continuously compute aggregates in WSN, for monitoring purposes. They assume that the network continuously computes several aggregates, of which at least one corresponds to the min/max – computed using a simple diffusion scheme. A tree is implicitly constructed during the diffusion process, with the node holding the min/max value set as root, and is used for the computation of other aggregates (e.g., average and count). In practice, two different schemes are used: a digest diffusion algorithm to compute idempotent aggregates, which is used to construct an aggregation tree, and a tree digest scheme


similar to common hierarchy-based approaches that operates over the tree routing structure created by the previous technique.

Alternative hierarchic routing structures are found in the literature to support aggregation, namely: a Breadth First Search (BFS) tree is used in the Generic Aggregation Protocol (GAP) [46] to continuously compute aggregates for network management purposes; the creation of a Group-Independent Spanning Tree (GIST) based on the geographic distribution of sensors is described in [47], taking into consideration the variation of the group of sensors that may answer an aggregation query. A previous group-aware optimization technique has been proposed: Group-Aware Network Configuration (GaNC) [48]. GaNC influences the routing tree construction by enabling nodes to preferably set parents from the same group (analyzing the GROUP BY clause of the received aggregation queries) and according to a maximum communication range, in order to decrease message size and consequently reduce energy consumption.

Some algorithms [49], [50], [51] based on swarm intelligence techniques, more precisely ant colony optimization, can also be found in the literature to construct optimal aggregation trees, once more to improve the energy efficiency of WSN. Ant colony optimization algorithms are inspired by the foraging behavior of ants, which leave pheromone trails that enable others to find the shortest path to food. In this kind of approach, the aggregation structure is iteratively constructed by artificial ant agents, consisting of the paths (from different sources to the sink) with the highest pheromone values, where network nodes that belong to more than one path act as aggregation points.

Some studies [52], [53] have shown that deciding which node should act as a data aggregator or forwarder has an important impact on the energy consumption and lifetime of WSN. A routing algorithm, designated Adaptive Fusion Steiner Tree (AFST), that adaptively decides which nodes should fuse (aggregate) data or simply forward it is described in [52]. AFST evaluates the cost of data fusion and transmission during the construction of the routing structure, in order to minimize the energy consumption of data gathering. A further extension of this scheme, Online AFST [54], was proposed to handle node arrival/departure, with the objective of minimizing the cost and impact of dynamism on the routing structure. In Low-Energy Adaptive Clustering Hierarchy (LEACH) [53], [55], a cluster-based routing protocol for data gathering in WSN, the random rotation of cluster-heads over time is proposed in order to distribute the energy consumption burden of collecting and fusing the clusters' data.

Filtering strategies can also be applied to reduce energy consumption in aggregation. For instance, A-GAP [56] is an extension of GAP (previously referred to) that uses filters to provide controllable accuracy in the protocol. Local filters are added at each node in order to control whether or not an update is sent. Updates are discarded according to a predefined accuracy objective, resulting in a reduction of communication overhead (number of messages). Filters can adjust dynamically during the execution of the protocol, allowing control of the trade-off between accuracy and overhead. Another similar approach to reduce message transmissions, under an allowable error value, is proposed in [57]; it adaptively adjusts filters according to a Potential Gains Adjustment (PGA) strategy. A framework called Temporal coherency-aware in-Network Aggregation (TiNA), which filters reported sensor readings according to their temporal coherency, was proposed in [48]. This framework operates on top of existing in-network aggregation schemes (e.g., TAG). In particular, TiNA defines an additional TOLERANCE clause to allow users to specify the desired temporal coherency tolerance of each aggregation query, and filters the reported sensor data accordingly (i.e., readings within the range of the specified value are suppressed).

3.2 Ring based approaches

Very few aggregation approaches are supported by a ring communication structure. This particular type of routing topology is typically out-competed by hierarchical approaches. For instance, the effect of failures in a ring can be worse than on hierarchical topologies, as a single point of failure can break the whole communication chain. Furthermore, the time complexity of rings when propagating data across the network is typically much higher, leading to slower data dissemination. However, this kind of topology can be explored in alternative ways that can, in some sense, circumvent the aforementioned limitations.

It is worth referring to an alternative approach described by Horowitz and Malkhi [22], based on the creation of a virtual ring to obtain an estimation of the network size (i.e., COUNT) at each node. This technique relies solely on the departure and arrival of nodes to estimate the network size, without requiring any additional communication. Each node of the network holds a single successor link, forming a virtual ring. It is assumed that each node possesses an accurate estimator. Upon the arrival of a new node, a random successor among the existing nodes, named the contact point, is assigned to it. During the joining process, the new node gets the contact point's estimate and increments it (by one). At the end of the joining process, the two nodes (joining node and contact point) will hold the new count estimate. Upon the detection of a departure, the inverse process is executed. This method provides a dispersed estimate over the whole network, with an expected accuracy that ranges from n/2 to n², where n represents the correct network size. In the rest of this paper, we will always use n to denote the network size, unless explicitly indicated otherwise. Despite the low accuracy achieved and the considerable result dispersion, this algorithm has a substantially low


communication cost (i.e., it communicates only upon arrival/departure, without any further information dissemination; each joining node communicates only with two nodes).

3.3 Flooding/Broadcast based approaches

Flooding/Broadcast based approaches promote the participation of all network nodes in the data aggregation process. The information is propagated from a single node (usually a special one) to the whole network, sending messages to all neighbors – “one to all”. This communication pattern normally induces a high network load during the aggregation process, implying in some cases a certain degree of centralization of data exchanges. Tree-based approaches are traditional usage examples of this communication pattern, but in that case it is supported by a hierarchical routing topology. Additional examples, which are not supported by any specific structured routing topology, are described below.

3.3.1 Randomized Reports

A naive algorithm to perform aggregation consists of broadcasting a request to the whole network (independently of the existing routing topology), collecting the values from all nodes and computing the result at the starting node. This likely leads to network congestion and an expected overload of the source node, due to feedback implosion. However, a predefined response probability can be used to mitigate this drawback, such that network nodes only decide to respond according to the defined probability. Such a probabilistic polling method was proposed in [23] to estimate the network size. The source node broadcasts a query request with a sampling probability p, which is used by all remaining nodes to decide whether or not to reply. All the received responses are counted by the querying node (during a predefined time interval), which will receive a total number of replies r according to the given probability. At the end, the network size n can be estimated at the source by n = r/p.
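
The estimator n = r/p can be checked with a small simulation. This is an illustrative sketch (the network size, probability and seed are arbitrary choices, not values from [23]):

```python
import random

def probabilistic_poll(n, p, rng):
    """Simulate one poll: each of n nodes replies independently with
    probability p; the source estimates the network size as r / p."""
    r = sum(1 for _ in range(n) if rng.random() < p)
    return r / p

rng = random.Random(42)
runs = [probabilistic_poll(10_000, 0.05, rng) for _ in range(100)]
print(sum(runs) / len(runs))  # averages close to the true size 10_000
```

A single poll is unbiased but noisy; its relative error shrinks as p (and therefore the number of replies) grows, which is exactly the overhead/accuracy trade-off the response probability controls.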

3.3.2 Other Approaches

A similar approach based on the same principle (sampling probability) is proposed in [58], to approximate the size of a single-hop radio network, considering the occurrence of collisions. In each step, a transmission succeeds if exactly one station chooses to send a message. Setting a probability p with which each node decides to send a message, the expression p_s = np(1 − p)^(n−1) gives the probability of a step being successful. This expression is maximized for p = 1/n, where p_s ≈ 1/e. If the experiment is repeated independently t times, approximately t/e steps are expected to be successful. Based on this probabilistic observation, the algorithm counts the number of successful steps along successive phases to estimate the network size. Different probability values (decreasing) and numbers of trials (increasing) are used in each consecutive phase, until the number of successful steps is close to the expected value. Further improvements to this algorithm have been proposed in [59], aiming at making it immune to adversary attacks.
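
The probability p_s = np(1 − p)^(n−1) and its maximum at p = 1/n are easy to verify numerically (n = 1000 is an arbitrary illustrative choice):

```python
import math

def success_prob(n, p):
    """Probability that exactly one of n stations transmits in a step,
    each sending independently with probability p."""
    return n * p * (1 - p) ** (n - 1)

n = 1000
print(success_prob(n, 1 / n))  # close to 1/e ~ 0.368 at p = 1/n
```

Probabilities above or below 1/n both lower p_s, which is why the algorithm sweeps decreasing values of p until the observed fraction of successful steps approaches 1/e.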

3.4 Random Walk based approaches

Random walk based approaches are usually associated with a data sampling process for estimating an aggregation value, involving only a partial amount of network nodes. Basically, this communication process consists of the random circulation of a token: a message is sequentially sent from one node to another randomly selected neighbor – “one to one” – until a predefined stopping criterion is met (e.g., a maximum number of hops, reaching a selected node, or returning to the initial one), performing a random walk across the network. Usually, a small number of messages is exchanged in this kind of approach, since only a portion of the network is involved in the aggregation process. Due to this partial participation, algorithms using this communication pattern normally rely on probabilistic methods to produce an approximation of the target aggregation function. Probabilistic methods provide result estimates with a known, bounded, error. If the execution conditions and the considered parameters of the algorithm are stable, the estimation error is expected to be maintained (with constant bounds) over time. This kind of aggregation algorithm will always exhibit an estimation error and does not tightly converge to the correct aggregate value.

3.4.1 Random Tour

The random tour approach [24] is based on the execution of a random walk to estimate a sum of functions over the network nodes, Φ = Σ_{i∈N} φ(i), for a generic function φ(i), where i denotes a node and N the set of nodes (e.g., to estimate the network size, COUNT: φ(i) = 1 for all i ∈ N). The estimate is computed from the accumulation of local statistics into an initial message, all of which are gathered during a random walk, starting at an initiator node and proceeding until the message returns to it. The initiator node i initializes a counter X with the value φ(i)/d_i (where d_i denotes the degree of node i, i.e., its number of adjacent nodes). Upon message reception, each node j increments the message counter X by φ(j)/d_j (i.e., X ← X + φ(j)/d_j). In each iteration, the message tagged with the counter is updated and forwarded to a neighbor, chosen uniformly at random, until it returns to the initial node. When the originator receives back the message it originally sent on the random walk, it computes the estimate Φ̂ (of the sum Φ) as Φ̂ = d_i X.

3.4.2 Other approaches

Other approaches based on random walks can be found in the literature, but they are commonly tailored for


specific settings and to the computation of specific aggregation functions, like COUNT (to estimate the network or group size).

For instance, to accelerate self-stabilization in a group communication system for ad-hoc networks, a scheme to estimate the group size based on random walks is proposed in [26] (first published in [25]). In this specific case, a mobile agent (called a scouter) performs a random walk and collects information about live nodes to estimate the system size. The agent carries the set of all visited nodes and a counter associated with each of them. Whenever the agent moves to a node, all the counters are incremented by one except that of the current node, which is set to zero. Large counter values are associated with nodes that have been less recently visited by the scouter, which become more likely to be suspected of nonexistence. Counters are bounded by the scouter's maximum number of moves, which is set according to the expected cover time and a safety function, before a corresponding node is considered not connected. In particular, nodes are sorted in increasing order of their counter value, and they are removed from the scouter if the gap between successive nodes (the k-th and (k−1)-th) is greater than the number of moves required to explore k connected elements in a random walk fashion. After the scouter has performed a large enough number of moves, the number of nodes in the system can be estimated by simply counting the number of elements kept in the set of visited nodes.

Other relevant approaches are based on the execution of random walks to collect samples, like Sample & Collide [27], [24] and Capture-Recapture [28]; they are described in Section 4.5.

3.5 Gossip-based approaches

Commonly, gossip and epidemic communication are referred to interchangeably. However, a relatively recent review of gossiping in distributed systems [60] makes a slight distinction between the two. In a nutshell, the difference lies in the interaction directionality of both protocols. The authors state that gossiping refers to “the probabilistic exchange of information between two members”, while epidemic refers to “information dissemination where a node randomly chooses another member”. Even so, the effect of both protocols in terms of information spread is much alike, and strongly related to epidemics. For this reason, in this work no distinction will be made between gossip and epidemic protocols.

Gossip communication protocols are strongly related to epidemics, where an initial node (“infected”) sends a message to a (random) subset of its neighbors (“to be contaminated”), which repeat this propagation process – “one to many”. With the right parameters, almost the whole network will end up participating in this propagation scheme. This communication pattern exhibits interesting characteristics despite its simplicity, allowing

a robust (fault-tolerant) and scalable information dissemination all over the network, in a completely decentralized fashion. Nevertheless, it is important to point out that the robustness of gossip protocols may not be directly attained by any algorithm based on a simple application of this communication pattern. For instance, the algorithm's correctness may rely on principles and invariants that are not guaranteed by a straightforward and incautious use of a gossip communication protocol, as revealed in [61]. In general, gossip communication tends to be as efficient as flooding, in terms of speed and coverage, while imposing a lower network traffic load. In the context of aggregation, another difference between gossip and flooding is that gossip is usually applied on a continuous basis, while flooding is issued once to request an aggregation result. We consider that algorithms using nearest neighbor/local communication (where each node communicates with all of its local neighbors) are included in the gossip-based category.

3.5.1 Push-Sum Protocol

The push-sum protocol [29] is a simple gossip-based protocol to compute aggregation functions, such as SUM or AVERAGE, consisting of an iterative pairwise distribution of values throughout the whole network. In more detail, at discrete times t, each node i maintains and propagates a pair of values (s_{t,i}, w_{t,i}), where s_{t,i} represents the sum of the exchanged values, and w_{t,i} denotes the weight associated with this sum at the given time t and node i.

In order to compute different aggregation functions, it is enough to assign appropriate initial values to these variables. E.g., considering v_i as the initial input value at node i: AVERAGE: s_{0,i} = v_i and w_{0,i} = 1 for all nodes; SUM: s_{0,i} = v_i for all nodes, only one node starts with w_{0,i} = 1 and the remaining nodes assume w_{0,i} = 0; COUNT: s_{0,i} = 1 for all nodes, only one with w_{0,i} = 1 and the others with w_{0,i} = 0.

In each iteration, a neighbor is chosen uniformly at random, and half of the current values are sent to the target node and the other half to the node itself. Upon reception, the local values are updated by adding each received value of a pair to its current counterpart (i.e., pointwise sum of pairs). The estimate of the aggregation function can be computed by all nodes, and is given at each time t by s_{t,i}/w_{t,i}.
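
A minimal synchronous simulation of push-sum for AVERAGE might look as follows. This is a sketch under simplifying assumptions (complete graph, lossless synchronous rounds), not the protocol's general asynchronous form:

```python
import random

def push_sum_average(values, rounds, rng):
    """Synchronous push-sum on a complete graph: each round, every
    node keeps half of its (s, w) pair and pushes the other half to
    a uniformly chosen node (possibly itself)."""
    n = len(values)
    s = [float(v) for v in values]  # s_{0,i} = v_i
    w = [1.0] * n                   # w_{0,i} = 1 selects AVERAGE
    for _ in range(rounds):
        new_s, new_w = [0.0] * n, [0.0] * n
        for i in range(n):
            hs, hw = s[i] / 2, w[i] / 2
            new_s[i] += hs          # one half stays at the node
            new_w[i] += hw
            j = rng.randrange(n)    # the other half goes to a random target
            new_s[j] += hs
            new_w[j] += hw
        s, w = new_s, new_w         # mass conservation: sum(s) and sum(w) are unchanged
    return [si / wi for si, wi in zip(s, w)]

rng = random.Random(0)
print(push_sum_average([1, 3, 5, 7], 30, rng))  # every estimate near 4.0
```

Note that the global sums of s and of w never change (mass conservation); only their distribution over nodes does, which is why every local ratio s_i/w_i converges to the same average.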

Figure 2 illustrates the execution of this algorithm at nodes i and j to compute the AVERAGE function. Initially (t = 0), the input values are v_i = 1 and v_j = 3, the sums of exchanged values are s_{0,i} = v_i = 1 and s_{0,j} = v_j = 3, and the weights are set to one at all nodes: w_{0,i} = w_{0,j} = 1. In this particular execution, it is considered that node i randomly chooses node j, and j picks another arbitrary node. Then, half of each node's current values, i.e., (s_{0,i}/2, w_{0,i}/2) and (s_{0,j}/2, w_{0,j}/2), is sent to the chosen target and to the node itself. Upon reception of the messages, the state at each node is updated with the sum of the received values, respectively s_{1,i} = 0.5 and w_{1,i} = 0.5 for


[Figure: nodes i and j exchange halved (s, w) pairs; the global sum remains 4 at t = 0 and t = 1.]

Fig. 2. Execution example of the Push-Sum Protocol.

node i, and s_{1,j} = 0.5 + 1.5 = 2 and w_{1,j} = 0.5 + 0.5 = 1 for node j. At the end of the depicted example, the estimate of the AVERAGE will be 1 (0.5/0.5) for node i and 2 (2/1) for node j.

The accuracy of the produced result tends to increase progressively along each iteration, converging to the correct value. As noted by the authors, the correctness of this algorithm relies on a fundamental property designated mass conservation, stating that the global sum of all network values (the local value of each node plus the values in messages in transit) must remain constant over time. Considering the crucial importance of this property, the authors assume the existence of a fault detection mechanism that allows nodes to detect when a message did not reach its destination. In this situation, the “mass” is restored by sending the undelivered message to the node itself. This algorithm is further generalized by the authors in the same work – the push-synopses protocol – in order to combine it with random sampling to compute more “complex” aggregation functions (e.g., quantiles) in a distributed way.

3.5.2 Other approaches

In recent years, several gossip-based approaches have been proposed, due to the appealing characteristics of gossip communication: simplicity, scalability and robustness. Several alternative algorithms inspired by the push-sum protocol have been proposed, like Push-Pull Gossiping [62], [30], which provides an anti-entropy aggregation technique (see Section 4.2), or the Gossip-based Generic Aggregation Protocol (G-GAP) [63], which extends the push-synopses protocol to tolerate non-contiguous faults (i.e., faults are only supported if neighbors do not fail within the same short time period).

Another aggregation algorithm, supported by an information dissemination and group membership management protocol called the newscast protocol, is proposed in [64]. This approach consists of the dissemination of a cache of items (of a predefined size) maintained by each network node. Periodically, each node randomly selects a peer, considering the network addresses of nodes available in the local cache entries. The cache entries are exchanged between the two nodes and the received information is merged into their local caches. The merge operation discards the oldest items, keeping a predefined number of the freshest ones, while also ensuring that there

is at most one item from each node in the cache. An estimate of the desired aggregate can be produced by each network node, by applying the correct aggregation function to the local cache of items.

3.6 Hybrid approaches

Hybrid approaches combine the use of different communication techniques to obtain improved results from their synergy. Commonly, the use of a hierarchic topology is mixed with gossip communication. Hierarchy-based schemes are efficient and accurate, but highly affected by the occurrence of faults. On the other hand, gossip-based algorithms are more resilient to faults, but less efficient in terms of overhead (requiring more message exchanges). In general, this combination enables hybrid approaches to achieve a fair trade-off between performance (in terms of overhead and accuracy) and robustness, when performing aggregation in more realistic environments.

3.6.1 (Chitnis et al., 2008)

Chitnis et al. [39] studied the problem of computing aggregates in large-scale sensor networks in the presence of faults, analyzing the behavior of hierarchy-based (i.e., TAG) and gossip-based (i.e., Push-Sum Protocol) aggregation protocols. In particular, they observed that tree-based aggregation is very efficient for very small failure probabilities, but that its performance drops rapidly with increasing failure rates. A gossip protocol, on the other hand, is only slightly slowed down (almost unaffected) and is the better choice in the presence of failures. Considering these results, the authors proposed a hybrid protocol intended to leverage the strengths of both analyzed mechanisms and minimize their weaknesses, in order to achieve better performance in faulty large-scale sensor networks.

This hybrid approach divides the network nodes into groups, and a gossip-based aggregation is performed within each group. A leader is elected for each group, and an aggregation tree is constructed among the leader nodes (multi-hop routing may be required between leaders) to perform a tree-based aggregation of the results from each gossip group. The authors also defined and solved an optimization problem to find the best combination of the two aggregation mechanisms, yielding the optimal group size according to the network size and failure probability. In practice, however, this requires pre-computing the gossip group size (by solving the referred optimization problem) before starting to use the protocol with optimal settings. Simulation results showed that the hybrid aggregation approach usually outperforms the other two (tree-based and gossip-based)³.

An extension of the previous approach to heterogeneous sensor networks is later discussed in [65]. In this case, it is considered that a few distinguished nodes,

3. Notice that only static network settings (no nodes arriving or leaving) were considered by the authors.


designated as microservers, which are more reliable and less prone to failure than the remaining ones, are available in the network. The aggregation technique works mostly like the one previously described for the homogeneous case, but with two differences that take advantage of the reliability of microservers. First, microservers are preferably chosen as group leaders. Second, microservers are placed at the top of the created aggregation tree, which may also be composed of other, less reliable motes⁴. The use of microservers in the aggregation tree increases its robustness, and placing them at the top reduces the need to reconstruct the whole tree when a fault occurs. The obtained evaluation results show that the aggregation process can be enhanced in heterogeneous networks by taking advantage of more reliable (although more expensive) nodes.

3.6.2 Other Approaches

A more elaborate hybrid structure was previously defined by Astrolabe [66]. Astrolabe is a DNS-like distributed management system that supports attribute aggregation. It defines a hierarchy of zones (similar to the DNS domain hierarchy), each one holding a list of attributes called a Management Information Base (MIB). This structure can be viewed as a tree, with each level composed of non-overlapping zones, where leaf zones are single hosts, each one running an Astrolabe agent, and the root zone includes the whole network. Each zone is uniquely identified by a name hierarchy (similar to DNS), assigning to each zone a unique string name within the parent zone; the globally unique name of each zone is obtained by concatenating the names of all its parent zones from the root with a predefined separator. The zone hierarchy is implicitly defined by the name administratively set for each agent. A gossip protocol is executed between a set of elected agents to maintain the existing zones. The MIB held by each zone is computed by a set of aggregation functions that produce a summary of the attributes of the child zones. An aggregation function is defined by an SQL-like program whose code is embedded in the MIB, being set as a special attribute. Agents keep a local copy of a subset of all MIBs, in particular of the zones in the path to the root and their siblings, providing replication of the aggregated information with weak (eventual) consistency. A gossip protocol is used by agents to exchange data about MIBs from other (sibling) zones and within their own zone, and to update their state with the most recent data.

Another hierarchical gossiping algorithm was introduced by Gupta et al. [67], one of the first to use gossip for the distributed computation of aggregation functions. According to the authors, the philosophy of this approach is similar to Astrolabe, but it uses a more generic technique to construct the hierarchy, called the Grid Box Hierarchy. The hierarchy is created by assigning

4. Alternative designation of a wireless sensor node [14].

(random or topology-aware) unique addresses to all members, generated from a known hash function. The most significant digits of the address are used to divide nodes into different groups (grid boxes) and to define the hierarchy. Each level of the hierarchy corresponds to a set of grid boxes matching a different number of significant digits. The aggregation process is carried out from the bottom to the top of the hierarchy in consecutive gossip phases (one for each level of the hierarchy). In each phase, members of the same grid box gossip their data, compute the resulting aggregate after a predefined number of rounds, and then move to the next phase. The protocol terminates when nodes find themselves in the grid box at the top of the hierarchy (last phase). Note that this approach does not rely on any leader election scheme to set group aggregators; in fact, the authors argue for the inadequacy of such a mechanism in unreliable networks prone to message loss and node crashes.
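The grid-box grouping by address prefix can be sketched as below. The address length, the choice of SHA-1, and the function names are assumptions for illustration; the point is only how prefixes of a hashed address induce one grouping per hierarchy level.

```python
import hashlib

ADDR_DIGITS = 4  # address length in hex digits (assumption)

def address(member):
    """Derive a unique address for a member from a known hash function."""
    return hashlib.sha1(str(member).encode()).hexdigest()[:ADDR_DIGITS]

def grid_box(member, level):
    """Grid box of a member at a given hierarchy level: members sharing
    the ADDR_DIGITS - level most significant digits gossip together.
    Level 0 is the bottom; at the top level every member shares the
    empty prefix, i.e., the whole network is one box."""
    prefix_len = ADDR_DIGITS - level
    return address(member)[:max(prefix_len, 0)]

members = ["n%d" % i for i in range(8)]
# one gossip phase (here: level 1): group members by grid box;
# each box gossips internally, then all move to the next phase
boxes = {}
for m in members:
    boxes.setdefault(grid_box(m, 1), []).append(m)
```

Each phase refines nothing and coarsens the grouping by one digit, so the last phase necessarily places all nodes in a single box.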

Recently, an approach that combines a hierarchy-based technique with random sampling was proposed in [68] to approximate aggregation functions in large WSN. This approach regulates the amount of collected data through a sampling probability, aiming at reducing the energy consumed to compute the aggregate. The sampling probability is calculated from the input accuracy, expressed by two parameters ε and δ (i.e., relative error less than ε with probability greater than 1 − δ), and the aggregation function (i.e., COUNT, SUM or AVERAGE). The algorithm considers that the sensing nodes are organized in clusters according to their geographic location, and that cluster heads form a spanning tree rooted at the sink. Basically, the aggregation proceeds as follows: first, the sink computes the sampling probability p (according to ε and δ) and transmits it, along with the aggregation function, to all cluster heads across the spanning tree; then, cluster heads broadcast p to their clusters, and each node within a cluster independently decides whether to respond according to the received probability; samples are collected at each cluster head, which computes a partial result; finally, the partial results are aggregated up the tree (convergecast) until the sink is reached, where the final approximate result is computed. This algorithm, referred to by the authors as Bernoulli Sampling on Clusters (BSC), mixes the application of a common hierarchy-based aggregation technique such as TAG (see Section 3.1) among cluster heads with a flooding/broadcast method like Randomized Reports (see Section 3.3) to sample the values at each cluster.
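The cluster-level Bernoulli sampling step can be sketched as follows. The derivation of p from (ε, δ) in [68] is not reproduced here; the inverse-probability (Horvitz-Thompson style) scaling used for the partial SUM is an assumption of this sketch, and the exact estimator in [68] may differ.

```python
import random

def bernoulli_sample(cluster_values, p, rng=random.random):
    """Each node in the cluster independently decides to respond
    with probability p (Bernoulli sampling)."""
    return [v for v in cluster_values if rng() < p]

def cluster_partial_sum(cluster_values, p):
    """Partial SUM at a cluster head: scale the sampled sum by 1/p
    (illustrative Horvitz-Thompson style estimate)."""
    sample = bernoulli_sample(cluster_values, p)
    return sum(sample) / p if p > 0 else 0.0

random.seed(1)
clusters = [[1.0] * 100, [2.0] * 100]  # true SUM = 300
# partial results would be aggregated upward the tree toward the sink
estimate = sum(cluster_partial_sum(c, p=0.5) for c in clusters)
```

With p = 0.5, roughly half the nodes transmit, trading accuracy for energy as the (ε, δ) parameters dictate.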

4 COMPUTATION TAXONOMY

In terms of the computational principles on which the existing aggregation algorithms are based, the following main categories (see Table 4) were identified: Hierarchical, Averaging, Sketches (hash or min-k based), Digests, Deterministic, and Randomized. These categories intrinsically support the computation of different kinds of aggregation functions, as depicted in Table 3. For instance, Hierarchical approaches allow the computation of


TABLE 3
Functions computed by aggregation technique

                 Decomposable      Non Decomposable
                 DS      DI        DS      DI
Hierarchical     x       x
Averaging        x
Sketches         x                         x
Digests          x       x         x       x
Deterministic    COUNT
Randomized       COUNT

DS: Duplicate Sensitive; DI: Duplicate Insensitive

any decomposable function. Averaging techniques allow the computation of all duplicate-sensitive decomposable functions that can be derived from AVERAGE, by using specific initial input values and combining the results from different instances of the algorithm. Sketching techniques also allow the computation of duplicate-sensitive decomposable functions, namely those that can be derived from the SUM function⁵.

Moreover, schemes based on hash sketches are natively able to compute distinct counts (non-decomposable, duplicate insensitive), and those based on min-k can be easily adapted to compute them (e.g., in extrema propagation, see Section 4.3, by using the input value as the seed of a pseudo-random generation function, so that duplicate values generate the same number). Digests support the computation of any kind of aggregation function, as this type of approach usually allows the estimation of the whole data distribution (i.e., values and frequencies), from which any function can be obtained. On the other hand, some techniques are restricted to the computation of a single type of aggregation function, such as COUNT, which is the case of the Deterministic and Randomized approaches.
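The value-seeded trick for making a min-k sketch duplicate insensitive can be shown in a few lines. This is an illustrative sketch (the sketch size K and the use of Python's `random.Random` as the seeded generator are assumptions), not extrema propagation itself.

```python
import random

K = 16  # sketch size (assumption)

def ranks(value, k=K):
    """Generate k ranks from a PRNG seeded with the input value:
    duplicate values produce identical ranks, so merging by pointwise
    minimum counts each distinct value exactly once."""
    rng = random.Random(value)
    return [rng.random() for _ in range(k)]

def merge(sketch_a, sketch_b):
    """Pointwise minimum: order and duplicate insensitive, idempotent."""
    return [min(a, b) for a, b in zip(sketch_a, sketch_b)]

# two nodes observing overlapping multisets of values
sketch_i = ranks(10)
for v in (10, 10, 42):            # duplicates of 10 add nothing
    sketch_i = merge(sketch_i, ranks(v))
sketch_j = ranks(42)              # 42 was already absorbed by sketch_i
assert merge(sketch_i, sketch_j) == sketch_i
```

Because merging is idempotent, the sketch can flow over multiple paths without double counting, which is exactly the property that makes these schemes routing-independent.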

Besides determining the supported aggregation functions, the computational technique on which an aggregation algorithm is based constitutes a key element in defining its behavior and performance, especially in terms of accuracy and reliability. Hierarchical approaches are accurate and efficient in terms of message and computational complexity, but not fault tolerant. Averaging schemes are more reliable and also relatively accurate (converging over time), although less efficient, requiring more message exchanges. Approaches based on the use of sketches are more reliable than hierarchical schemes, adding some redundancy and providing fast multi-path data propagation; however, they introduce an approximation error that depends on the number of inputs and the size of the used sketch. Digests essentially consist in the reduction (compression) of all inputs into a

5. Note that COUNT is the sum of all elements, considering their input value as equal to 1.

TABLE 4
Taxonomy from the computation perspective

Aggregation              Basis/Principles         Algorithms

Decomposable Functions   Hierarchic               TAG [14], DAG [15], I-LEAG [16],
                                                  Tributary-Delta [20],
                                                  (Chitnis et al., 2008) [39]

                         Averaging                Push-Sum Protocol [29],
                                                  Push-Pull Gossiping [30],
                                                  DRG [31],
                                                  Flow Updating [32], [33],
                                                  (Chitnis et al., 2008) [39]

                         Sketches                 Sketches [17],
                                                  RIA-LC/DC [18], [19],
                                                  Extrema Propagation [34],
                                                  Tributary-Delta [20]

Complex Functions        Digests                  Q-Digest [21], Equi-Depth [35],
                                                  Adam2 [36]

Counting                 Deterministic            (Dolev et al., 2002) [25], [26]

                         Randomized (Sampling)    Random Tour [24],
                                                  Randomized Reports [23],
                                                  Sample & Collide [27], [24],
                                                  Capture-Recapture [28],
                                                  Hop-Sampling [37], [38],
                                                  Interval Density [37], [38],
                                                  (Kutylowski et al., 2002) [58]

                         Randomized (Estimator)   (Horowitz and Malkhi, 2003) [22]

fixed-size data structure, using probabilistic methods and losing some information. Consequently, digests provide an approximation of the computed aggregation function, not the exact result. Randomized schemes are also based on probabilistic methods to compute COUNT, mostly by sampling; they are thus inaccurate but lightweight in terms of message complexity, as only a portion of the network is asked to participate. On the other hand, deterministic counting methods require gathering data from the entire network, which can incur scalability issues.

In the following sections, the main principles and characteristics of these distinct classes are explained in a comprehensive way, and some important examples are described. A taxonomy of the identified computational principles is displayed in Table 4, associating them with the most relevant distributed aggregation algorithms.

4.1 Hierarchical

Hierarchical approaches take direct advantage of the decomposable property of some aggregation functions. Inputs are divided into separate groups, and the computation is distributed and hierarchical. Algorithms from


this class depend on the initial creation of a hierarchic communication structure (e.g., tree, cluster hierarchy), where nodes can act as forwarders or aggregators. Forwarders simply transmit the received inputs to an upper-level node. Aggregators apply the target aggregation function to all received inputs (and their own), and forward the result to an upper-level node. The aggregation process is carried out from the bottom to the top of the hierarchy, and the correct result is yielded at the top.

Algorithms from this class allow the computation of any decomposable function, providing the exact result (at a single node) if no faults occur. The global processing and memory resources required are equivalent to those used in a direct and centralized application of the aggregation function, but distributed across the network. However, these algorithms are not fault tolerant; e.g., a single point of failure may lead to the loss of all data beneath it.
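The bottom-up computation over aggregators can be sketched generically; the tree layout and function names below are assumptions, and any decomposable function can be plugged in for `f`.

```python
def aggregate(node, children, inputs, f):
    """Bottom-up aggregation over a tree: an aggregator applies the
    decomposable function f to its own input and every child's partial
    result; the exact result is yielded at the root. A forwarder is
    simply a node with one child and an identity-neutral input.
    children maps a node to its child nodes, inputs to its value."""
    partials = [aggregate(c, children, inputs, f) for c in children.get(node, [])]
    return f([inputs[node]] + partials)

# SUM over a small tree rooted at the sink
children = {"sink": ["a", "b"], "a": ["c", "d"]}
inputs = {"sink": 1, "a": 2, "b": 3, "c": 4, "d": 5}
total = aggregate("sink", children, inputs, sum)  # 1 + (2+4+5) + 3 = 15
```

The fragility noted above is visible here: removing node "a" (and its subtree) before the sink aggregates would silently drop the inputs of "a", "c" and "d" from the result.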

Most of the algorithms from this category correspond to those belonging to the hierarchical communication class, like TAG [14], DAG [15], and I-LEAG [16]. Other algorithms combine a hierarchical computation with another computation principle, namely: Tributary-Delta [20] mixes a common hierarchical computation with the use of sketches in regions close to the sink; (Chitnis et al., 2008) [39] performs hierarchic aggregation on top of groups, with averaging applied inside each one. See Sections 3.1 and 3.6 for more details about the aforementioned algorithms.

4.2 Averaging

The Averaging class essentially consists in the iterative computation of partial aggregates (averages), continuously averaging and exchanging data among all active nodes that will contribute to the calculation of the final result. This kind of approach tends to reach high accuracy, with all nodes converging to the correct result along the execution of the algorithm. A typical application of this method can be found in most gossip-based approaches (Section 3.5), where all nodes continuously distribute a share of their value (averaged from received values) to some random neighbor, converging over time to the global network average, the correct aggregation result. Algorithms from this category are more reliable than hierarchic approaches, working independently from the supporting network topology and producing the result at all nodes. However, they must respect an important principle, commonly designated as “mass conservation”, in order to converge to the correct result. This invariant states that the sum of the aggregated values of all network nodes must remain constant over time [29].

Algorithms based on this technique are able to compute decomposable, duplicate-sensitive functions that can be derived from the average operation, using different input initializations (e.g., COUNT) or combining functions executed concurrently (e.g., SUM, obtained by multiplying the results of an average and a count). In terms of computational complexity, this method usually involves the computation of simple arithmetic operations (i.e., addition and division), using few computational resources (processor and memory) and thus exhibiting a fast execution at each node. This kind of algorithm is able to produce (almost) exact results, depending on its execution time. The minimum execution time required by these algorithms (in number of iterations) to achieve high accuracy is influenced by the network characteristics (i.e., size, degree of the network graph, and topology) and by the communication pattern used to spread the partial averages. The robustness of these aggregation algorithms is strongly related to their ability to conserve the global “mass” of the system. The loss of a partial aggregate (“mass”) due to a node failure or a message loss introduces an error, effectively subtracting the lost value from the initial global “mass” (the lost amount no longer contributes to the calculation of the final result, and the algorithm converges to an incorrect value). In this kind of methodology, it is important to enforce the “mass” conservation principle, which constitutes the main invariant ensuring the algorithms' correctness.
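The mass conservation invariant and the effect of a lost message can be illustrated with a deliberately simplified push-style averaging round. This is not any specific published protocol; the half-share rule and the `drop` parameter are assumptions of the toy model.

```python
import random

def averaging_round(values, neighbors, drop=frozenset(), rng=random):
    """One simplified push round: each node sends half of its value to a
    random neighbor. Without losses the total 'mass' (sum of all values)
    is invariant; a dropped message in `drop` removes mass and shifts
    the convergence target away from the true average."""
    new = dict(values)
    for node in values:
        share = values[node] / 2.0
        new[node] -= share
        target = rng.choice(neighbors[node])
        if (node, target) not in drop:
            new[target] += share
    return new

random.seed(0)
neighbors = {0: [1], 1: [0]}
v = {0: 4.0, 1: 0.0}
for _ in range(20):
    v = averaging_round(v, neighbors)
# without losses, the total mass stays at 4.0 and both nodes reach 2.0

# a single lost message removes mass: the total drops below 4.0
random.seed(0)
w = averaging_round({0: 4.0, 1: 0.0}, neighbors, drop={(0, 1)})
```

The second run shows exactly the failure mode described above: the lost share is subtracted from the global mass, so subsequent rounds would converge to the wrong average.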

This class of algorithms is also known as Distributed Average Consensus in the area of Control Theory [69], [70], [71], [72], where the studied algorithms mainly operate like the Push-Sum Protocol (see Section 3.5) and Push-Pull Gossiping. This field provides interesting theoretical results on the performance of averaging techniques (e.g., optimized weights and convergence rates), but commonly considers impractical assumptions (e.g., assuming instantaneous update of the topology information held at each node).

4.2.1 Push-Pull Gossiping

The Push-Pull Gossiping [62] algorithm performs an averaging process and, like the push-sum protocol [29] (previously described in Section 3.5), is gossip-based. The main difference of this scheme lies in the execution of an anti-entropy aggregation process. The concept of anti-entropy in epidemic algorithms refers to the regular random selection of another node with which to perform a mutual synchronization of information, exchanging complete databases [73]. In particular, this algorithm executes an epidemic protocol to perform a pairwise exchange of aggregated values between neighbor nodes. Periodically, each node randomly chooses a neighbor to send its current value to, and waits for the response with the value of the neighbor. Then, it averages the sent and received values to calculate its new estimate. Each time a node receives a value from a neighbor, it sends back its current one and computes its new estimate (average), using the received and sent values as parameters.

Figure 3 illustrates the push-pull process performed by the algorithm. In this particular example, two nodes


[Figure: nodes i and j hold values 1 and 3 at t = 0; i sends PUSH(1) to j, which updates to 2 at t = 1 and replies PULL(3); at t = 2 both hold 2. The total Σ = 4 is preserved at every step.]

Fig. 3. Execution example of Push-Pull Gossiping.

are considered, i and j, with initial values vi = 1 and vj = 3, respectively. It is assumed that node i randomly chooses j to execute the push-pull process, sending it a push message with its current value 1. Upon reception of the push message, j responds to i with a pull message carrying its current value 3, and immediately updates its estimate with the average of the received and sent values, i.e., vj = (1 + 3)/2 = 2. Upon reception of the pull message, node i also updates its estimate by averaging the received and current values, i.e., vi = (3 + 1)/2 = 2.
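The pairwise exchange of Figure 3 reduces to two small update steps, sketched below with hypothetical function names.

```python
def on_push(own_value, received):
    """Receiver of a push: reply with the current value and update the
    estimate to the average of the sent and received values."""
    reply = own_value
    return (own_value + received) / 2.0, reply

def on_pull(own_value, received):
    """Initiator, on receiving the pull reply: the same averaging step."""
    return (own_value + received) / 2.0

# the example from the text: v_i = 1, v_j = 3
v_i, v_j = 1.0, 3.0
v_j, reply = on_push(v_j, v_i)   # j averages: (3 + 1) / 2 = 2
v_i = on_pull(v_i, reply)        # i averages: (1 + 3) / 2 = 2
```

Note that both nodes average the same pair of values, so the pair's total mass (4) is conserved by the exchange.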

In order to be adaptive and handle network changes (nodes joining/leaving), the authors consider extending the algorithm with a restart mechanism (executing the protocol for a predefined number of cycles, depending on the desired accuracy, and then restarting with fresh initial values). However, they do not address the “mass” conservation problem, i.e., the impact of message losses or node failures.

A further study of this aggregation algorithm is presented in [30], proposing a more mature solution that covers some practical issues: splitting the algorithm execution into two distinct threads; using timeouts to detect possible faults, ignoring data exchanges in those situations; suggesting different versions of the algorithm according to the aggregation function to compute; and suggesting the execution of several parallel instances of the algorithm to increase its robustness.

4.2.2 DRG (Distributed Random Grouping)

This approach [31] essentially consists in the continuous random creation of groups across the network, in which aggregates are successively computed (averaged). DRG was designed to take advantage of the broadcast nature of wireless transmission, where all nodes within radio range are prone to hear a transmission; the approach is thus specific to WSN. The algorithm defines three working modes for each node: leader, member, and idle. According to the defined modes and the performed state transitions, the execution of the algorithm can be separated into three main steps. First, each node in idle mode independently decides to become a group leader (according to a predefined probability), and consequently broadcasts a Group Call Message (GCM) to all its neighbors, subsequently waiting for members to reply. Second, all nodes in idle mode

[Figure: five nodes i, j, k, l, m with values 2, 1, 2, 3, 4 (Σ = 12). Leader i broadcasts GCM; nodes j, k, l reply with JACK(1), JACK(2), JACK(3); i computes the group average 2 and broadcasts GAM(2); at the end i, j, k, l hold 2, m keeps 4, and Σ = 12 is preserved at every step.]

Fig. 4. Execution example of Distributed Random Grouping.

which received a GCM from a leader respond to the first one with a Joining Acknowledgment (JACK) tagged with their aggregate value, becoming members of that group (updating their state mode accordingly). In the third step, after gathering the group members' values from the received JACKs, the leader computes (averages) the group aggregate and broadcasts a Group Assignment Message (GAM) with the result, returning to idle mode. Each group member waits until it receives the resulting group aggregate from the leader, updates its local value (with the one assigned in the GAM), and returns to idle mode, not responding to any other request until then.

Figure 4 illustrates the execution of the algorithm to compute the AVERAGE. In this particular example, node i decides to become leader and sends a GCM to all its neighbors. In the considered scenario, only nodes j, k and l receive the message and reply to i with a JACK message carrying their respective current values vj = 1, vk = 2 and vl = 3. After receiving the JACK messages from its neighbors, the leader i computes the group average from the received values and its current value vi = 2, i.e., a = (1 + 2 + 3 + 2)/4 = 2. Then, the leader updates its value with the resulting average and sends a GAM with the result to its neighbors, so that the group members can also update their values with the new average.
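The leader's step in the third phase is a plain average over the JACK values plus its own; a minimal sketch (hypothetical function name):

```python
def drg_group_average(leader_value, member_values):
    """DRG leader step: average the leader's value with the values
    received in JACK messages; the result is then assigned to the whole
    group via a GAM broadcast (returned here as the new common value)."""
    group = [leader_value] + list(member_values)
    return sum(group) / len(group)

# the example from the text: leader i with v_i = 2, members j, k, l
a = drg_group_average(2, [1, 2, 3])  # (2 + 1 + 2 + 3) / 4 = 2.0
```

Since every group member replaces its value by the same average, the group's total mass is unchanged, which is what lets repeated overlapping groups converge to the global average.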

The repeated execution of this scheme (the creation of distributed random groups to perform in-group aggregation) allows the estimates produced at all nodes to eventually converge to the correct aggregation result, as long as the groups overlap over time. The performance of DRG is influenced by the predefined probability of a node becoming a leader, which determines its capacity to create groups (quantity and size of groups). Note


that, in order to account for the occurrence of faults and avoid consequent deadlock situations that could arise in this algorithm, it is necessary to define some timeouts (for the leaders to wait for JACKs, and for the members to wait for a GAM). Intuitively, the values set for those timeouts will highly influence the performance of the algorithm, although this detail is not addressed by the authors. An analysis of DRG on WSN with randomly changing graphs (modeling network dynamism) is provided in [74], assuming that the graph only changes at the beginning of each iteration of the algorithm; this assumption is unrealistic in practice, and a potential source of mass loss.

4.2.3 Flow Updating

Flow Updating [32] is a recent aggregation technique inspired by the concept of network flow (from graph theory). Unlike common averaging approaches, which start with the initial input value and iteratively change it by exchanging “mass” along the execution of the algorithm, this approach keeps the initial input unchanged, exchanging and updating flows associated with neighbor nodes. The key idea is to explore the concept of flow and, instead of storing the current average at each node in a variable, to compute it from the input value and the contribution of the flows along the edges to the neighbors. In a sense, flows represent the value that must be transferred between two adjacent nodes for them to produce the same estimate, and they are skew symmetric (i.e., the flow value from i to j is the opposite of that from j to i: fij = −fji). For example, consider a network with two directly connected nodes i and j with initial input values vi = 1 and vj = 3. For them to produce the same average a = (1 + 3)/2 = 2, the flows at nodes i and j must be respectively set to fij = vi − a = 1 − 2 = −1 and fji = vj − a = 3 − 2 = 1 = −fij.

In more detail, each node i stores a flow value fij for each neighbor j, besides its input value vi, which is left unchanged by the algorithm. Periodically, each node computes a new estimate e′i by averaging the estimates ej received from neighbors with its own ei (obtained by subtracting all flows from its input value). The flows fij to each neighbor j are then locally updated in order to produce the new estimate, adding to the respective flow value the difference between the new estimate and the one previously received. Afterwards, the node sends a message to each neighbor j with the flow fij and the new estimate. Upon reception of a message, node j updates its flow fji with the symmetric of the value addressed to it, −fij, and keeps the received estimate to compute the next average. The iterative execution of this process across the whole network allows the estimates at all nodes to converge to the global average of the input values.
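The local step just described can be sketched as a single function; the data layout (dicts keyed by neighbor) and names are assumptions for illustration, not the authors' implementation.

```python
def flow_update(value, flows, neighbor_estimates):
    """One Flow Updating step at node i. flows maps each neighbor j to
    f_ij; neighbor_estimates maps j to the estimate last received from
    j. The input value is never changed: the local estimate is the
    input minus the outgoing flows."""
    estimate = value - sum(flows.values())
    new_estimate = ((estimate + sum(neighbor_estimates.values()))
                    / (1 + len(neighbor_estimates)))
    new_flows = {j: flows[j] + (new_estimate - e_j)
                 for j, e_j in neighbor_estimates.items()}
    # node i then sends (f_ij, new_estimate) to each neighbor j,
    # which sets f_ji = -f_ij (skew symmetry)
    return new_flows, new_estimate

# the two-node example from the text: v_i = 1, v_j = 3, no flows yet
flows, est = flow_update(1.0, {"j": 0.0}, {"j": 3.0})
# est = (1 + 3)/2 = 2 and f_ij = 0 + (2 - 3) = -1, matching the text
```

Because the message carries the flow itself rather than a mass transfer, resending it after a loss is idempotent: the receiver simply overwrites f_ji with −f_ij.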

This approach distinguishes itself from existing averaging algorithms by its fault-tolerance capabilities. It solves the mass conservation problem observed in other averaging approaches when subject to message loss, which affects their correctness and leads them to converge to a wrong value. Other approaches require additional mechanisms to detect and restore the lost mass, which is often not feasible in practice. In contrast, Flow Updating is by design able to support message loss without requiring any additional mechanism; losses only delay convergence, without affecting convergence to the correct value. This is achieved by keeping the input values unchanged and performing idempotent flow updates, which preserve the skew symmetry of the flows. Moreover, it has been shown that this approach is resilient to node crashes and able to support churn (without requiring protocol restarts), self-adapting to network changes [33]. Recently, a variation of this technique called Mass Distribution with Flow Updating (MDFU) was proposed [75], providing a convergence proof and a characterization of the convergence time under stochastic message loss.

4.2.4 Other Approaches

A well-known averaging approach, the Push-Sum (push-synopses) Protocol [29], has already been described in Section 3.5. In recent years, other approaches inspired by the Push-Sum Protocol have been proposed, intending to be more efficient in terms of performance and robustness. Kashyap et al. [76] reduced the number of messages needed (communication overhead) to compute an aggregation function, at the cost of an increase in the number of rounds. The Gossip-based Generic Aggregation Protocol (G-GAP) [63] extended the push-synopses protocol [29] to support discontinuous failures (no adjacent nodes can fail within a period of 2 rounds) by restoring the mass loss resulting from failures (temporarily storing previous data contributions at each node). Recently, LimoSense [77] introduced effective strategies to avoid mass loss; however, the mechanism that balances mass growth can lead to divisions by zero under some message sequences.

Dimakis et al. [78], [79] proposed an algorithm to improve the convergence time in random geometric networks. This scheme is similar to push-pull gossiping [62], differing in the peer selection method. Instead of selecting a one-hop node as the target of the averaging step, peers are selected according to their geographical location. In particular, a location is randomly chosen and the node closest to that location is selected. A greedy geographic routing process is used to reach the node at the target location, assuming that nodes know their own geographic location.

Two averaging algorithms for asynchronous and dynamic networks were proposed in [80]. The core of the proposed schemes is based on a pairwise update, similar to push-pull gossiping (although not referred to by the authors), addressing practical concerns that arise in asynchronous settings. In the first proposed algorithm, nodes implement a blocking scheme to avoid the interference of other nodes in the update step and guarantee mass conservation. Additionally, a deadlock avoidance mechanism is considered, imposing a sender-receiver


relation on each link based on the nodes' Unique Identifiers (UID). An extension of the first algorithm is proposed to cope with churn. The blocking mechanism (maintaining the directed relationship between nodes) is removed, and an additional variable is used to account for the changes at each neighbor. When a node leaves the network, all its neighbors subtract the value associated with it from their state.

4.3 Sketches

The main principle of this kind of aggregation algorithm is the use of an auxiliary data structure of fixed size, holding a sketch of all network values. The input values of each node are used to create sketches that are aggregated across the network, using specific operations to update and merge them. Sketches are order and duplicate insensitive, enabling them to be aggregated through multiple paths, independently of the routing topology. This kind of technique is based on the application of a probabilistic method, generally allowing the estimation of the sum of the values held in the sketch.

Sketching techniques can be based on different methods, with different accuracy bounds and computational complexities. Algorithms from this class are mostly based on the application (with some improvements) of two main ideas: hash sketches [41], [81], [82], [83] and k-mins sketches [84].

Hash sketches allow the probabilistic counting of the number of distinct elements in a multiset (the cardinality of the support set). This type of sketch essentially consists in a map of bits, initially set to zero, where each item is mapped into a position of the binary-valued map (generally involving a uniform hashing function), setting that bit to one. The distinct count is estimated by checking the position of the most significant (leftmost) bit set to one, or by counting the number of bits that remain set to zero in the sketch. The first hash sketching technique was proposed by Flajolet and Martin [41] and is commonly designated as FM sketches (it uniformly hashes items into an integer, and maps only the least significant 1-bit of its bitmap representation to the sketch). In this first study, the authors also proposed the Probabilistic Counting with Stochastic Averaging (PCSA) algorithm to reduce the variance of the produced estimate, using multiple sketches and averaging their estimates (distributing the hash of each element to only one of the sketches). Another approach, Linear Counting [81], uses a hash function to directly map each element into a position of the sketch (setting that bit to 1), and uses the count of the number of zeros to produce an estimate. A further improvement of PCSA, designated LogLog, was described in [82], reducing the required memory resources (an optimized version, super-LogLog, is also proposed, improving accuracy and optimizing memory usage by applying truncation and restriction rules). HyperLogLog [83] recently improved on LogLog, consuming less memory to achieve a matching accuracy.
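An FM-style hash sketch can be sketched in a few lines. This is an illustrative single-sketch version (SHA-1 as the hash and a 32-bit map are assumptions); a single sketch gives rough estimates, which is precisely why PCSA averages many of them.

```python
import hashlib

BITS = 32

def _hash(item):
    return int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:4], "big")

def lsb(x):
    """Index of the least significant set bit (0-based)."""
    return (x & -x).bit_length() - 1 if x else BITS - 1

def fm_insert(sketch, item):
    """FM sketch: map each item to the position of the least significant
    1-bit of its hash; duplicates set the same bit, so they add nothing."""
    return sketch | (1 << lsb(_hash(item)))

def fm_merge(a, b):
    """Merging is a bitwise OR: order and duplicate insensitive."""
    return a | b

def fm_estimate(sketch):
    """Estimate from the position R of the lowest unset bit:
    n ~ 2^R / 0.77351 (the Flajolet-Martin correction factor)."""
    r = 0
    while sketch & (1 << r):
        r += 1
    return (2 ** r) / 0.77351

s = 0
for item in range(1000):
    s = fm_insert(s, item)
s_dup = s
for item in range(500):          # re-inserting duplicates changes nothing
    s_dup = fm_insert(s_dup, item)
n_hat = fm_estimate(s)
```

The OR-based merge is what makes the sketch safe to propagate over multiple paths: receiving the same partial sketch twice is harmless.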

The k-mins sketches method was first introduced to determine the size of the transitive closure in directed graphs [84]. It consists of assigning k independent random ranks to each item, according to a distribution that depends on its weight, and keeping a vector of the k minimum ranks observed in the set. The obtained k-vector of minimum ranks is used by an estimator to produce an approximate result. In other words, k-mins sketches reduce the estimation of a sum to the determination of the minimums of a collection of random numbers (generated using the sum operands as input parameters of the random distribution from which they are drawn). An improved alternative to k-mins sketches, designated bottom-k sketches, was recently proposed in [85].

The computational cost of sketching depends on the complexity of the operations involved in the creation and update of the sketches (e.g., hash functions, random number generation, minimum/maximum determination), and on the resources used by the estimator to produce a result. Algorithms based on sketches are not exact, being based on probabilistic methods and introducing an error factor into the computed aggregation function. There is a trade-off between the accuracy and the size of the sketches: the greater the sketch size, the tighter the accuracy bounds of the produced estimate, although at the cost of additional memory and longer processing time. These aggregation algorithms tend to be fast, although conditioned by the dissemination protocol used to propagate the sketches, being able to produce an approximate result after a number of iterations close to the minimum theoretical bound (the network diameter).

4.3.1 RIA-LC/DC

Fan and Chen [18] proposed a multi-path routing aggregation approach for WSN based on the use of Linear Counting (LC) sketches [81], which they later named Robust In-network Aggregation using LC-sketches (RIA-LC) [19]. The algorithm proceeds in two phases, similar to those in multipath hierarchy-based approaches (see Section 3.1). In the first phase, the aggregation request (query) is spread from the sink throughout the whole network, creating a multipath routing hierarchy. In the second phase, starting at the lowest level of the hierarchy, nodes respond to the aggregation request by creating an LC-sketch corresponding to their current local readings and sending it to the nodes at the level above. All received sketches are combined with the local one (using the OR operation), and the result is sent to the next level until the top of the hierarchy is reached, where the sink computes the aggregate estimate from the resulting LC-sketch.

Equation 1 is used to estimate the number of distinct items represented in an LC-sketch, where m is the size of the allocated bit vector, and z is the number of bits with value equal to zero. In order to allow the computation of the SUM, each node creates a sketch by


mapping a number of distinct items corresponding to its input value. For example, assuming that each node has a unique ID, if node i has an input equal to 3, it maps the items (IDi, 1), (IDi, 2), and (IDi, 3) into the LC-sketch. In more detail, in this case the hash function of the original LC-sketch design (used to map duplicate items to the same bit) is replaced by a uniform random generator (since there are no duplicate items), randomly setting to 1 a number of bits equal to the input value.

n = −m ln (z/m) (1)
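The scheme above can be sketched as follows (a toy illustration, not the authors' code; the bit-vector size M, the fixed seed, and the helper names are our own, and a real deployment would size m from the desired accuracy constraint):

```python
import math
import random

M = 1024  # bit-vector size m (assumed parameter)

def local_sketch(value: int, rng: random.Random) -> list[int]:
    """LC-sketch for one node: set `value` bits chosen uniformly at random
    (no hash needed, since every (node, counter) item is unique)."""
    sketch = [0] * M
    for bit in rng.sample(range(M), value):
        sketch[bit] = 1
    return sketch

def merge(a: list[int], b: list[int]) -> list[int]:
    """Bitwise OR: order- and duplicate-insensitive combination."""
    return [x | y for x, y in zip(a, b)]

def estimate(sketch: list[int]) -> float:
    """Equation 1: n = -m * ln(z / m), with z the number of zero bits."""
    z = sketch.count(0)
    return -M * math.log(z / M)

rng = random.Random(42)
nodes = [3, 7, 5, 10]          # local sensor readings
acc = [0] * M                  # sketch accumulated toward the sink
for v in nodes:
    acc = merge(acc, local_sketch(v, rng))
print(round(estimate(acc)))    # close to sum(nodes) = 25
```

The logarithmic correction in Equation 1 compensates for the random bit collisions that occur as the sketch fills up.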

The authors show by theoretical comparison and experimental evaluation that their approach outperforms, in terms of space and time requirements, approaches based on FM sketches [41], namely Sketches [17] (see Section 3.1). They also claim higher accuracy and lower variance when compared with existing sketch schemes. Moreover, they tackle some practical issues, like message size constraints, avoiding the use of hash functions, and enabling the specification of an approximation error.

Afterwards, the authors improved RIA-LC by considering the use of sketches with variable sizes instead of fixed-size sketches, referring to the new technique as Robust In-network Aggregation using Dynamic Counting sketches (RIA-DC) [19]. The authors observed that the large preallocated sketches used in RIA-LC wasted space, since at the beginning of the computation most of the bits are set to zero. In RIA-DC the initial size of a sketch is variable and depends on the local sensor reading. Along the aggregation process, the size of the sketches is adjusted (gradually increasing toward the sink), in order to satisfy a given accuracy constraint. RIA-DC decreases message overhead and energy consumption compared to RIA-LC, while keeping similar accuracy properties.

Recently, another variant of the described schemes was proposed by the same authors, referred to as Scalable Counting (SC) [86], which reduces the space requirements (i.e., the size of the transmitted sketches) needed to obtain the same accuracy guarantees.

4.3.2 Extrema Propagation

This approach reduces the computation of an aggregation function, more precisely the sum of positive real numbers, to the determination of the minimum (or maximum) of a collection of random numbers [34], [87]. Initially, a vector xi of k random numbers is created at each network node i. Random numbers are generated according to a known random distribution (e.g., exponential or Gaussian), using the node's initial value vi as the input parameter for the random generation function (e.g., as the rate of an exponential distribution). Then, the execution of the aggregation algorithm simply consists in the computation of the pointwise minimum (or alternatively maximum) of all exchanged vectors. This technique supports the use of any information

spreading algorithm as a subroutine to propagate the vectors, since the calculation of minimums is order and duplicate insensitive. In particular, the authors consider that at each round all nodes send their resulting vector to all their neighbors.

At each node, the obtained vector is used as a sample to produce an approximation of the aggregation result, applying a maximum likelihood estimator derived from extreme value theory (the branch of statistics dealing with extreme deviations from the median of a probability distribution). For example, consider the generation at each node of k random numbers with an exponential distribution of rate vi, and the use of the minimum function to aggregate the vectors. Equation 2 gives the estimator for the SUM of all vi from the sample of minimums xi[1], ..., xi[k] in the vector xi, with variance SUM²/(k − 2):

SUM = (k − 1) / ∑_{j=1}^{k} xi[j] (2)

This algorithm is focused on obtaining a fast estimate, rather than an accurate one. However, the accuracy of this aggregation algorithm can be improved by using vectors of larger size, adjusting k to the desired relative accuracy (e.g., k = 387 for a maximum relative error of 10%). A further extension to the protocol to allow the determination of the network diameter has been proposed in [88]. In addition to the use of minimal or maximal values, other estimators can also be designed from the average or range of suitable distributions [89].
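The reduction above rests on the fact that the pointwise minimum of exponentials with rates v1, ..., vn is itself exponential with rate v1 + ... + vn. A minimal sketch of the idea (our own helper names; gossip exchange is replaced by directly folding the vectors of three nodes):

```python
import random

K = 387  # vector size for roughly 10% maximum relative error

def init_vector(v: float, rng: random.Random) -> list[float]:
    """Draw K exponential random numbers with rate v (the node's input)."""
    return [rng.expovariate(v) for _ in range(K)]

def merge(a: list[float], b: list[float]) -> list[float]:
    """Pointwise minimum: order- and duplicate-insensitive,
    so any dissemination protocol can carry the vectors."""
    return [min(x, y) for x, y in zip(a, b)]

def estimate_sum(x: list[float]) -> float:
    """Equation 2: SUM = (k - 1) / sum of the k minimums."""
    return (K - 1) / sum(x)

rng = random.Random(1)
values = [2.0, 5.0, 3.0]              # node inputs (true sum = 10)
vec = init_vector(values[0], rng)
for v in values[1:]:
    vec = merge(vec, init_vector(v, rng))
print(estimate_sum(vec))              # close to 10
```

Re-merging the same vector changes nothing, which is why duplicates introduced by gossip do no harm here.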

4.3.3 Other Approaches

A representative approach based on FM sketches has already been described in Section 3.1 – Sketches [17]. In this multi-path approach, a generalization of PCSA is used to distinguish the same aggregates received from multiple paths, and thereby compute duplicate-sensitive aggregation functions. Other similar approaches, based on hash sketches, can be found in the literature, like Synopsis Diffusion [43] and Wildfire [90]. These approaches apply essentially the same aggregation process, operating in two phases (request/response) and differing only in small details.

Synopsis Diffusion [43] is an aggregation approach for WSN close to the one proposed by Sketches [17]. In a sense, this work presents a more generic framework relying on the use of duplicate-insensitive summaries (i.e., hash sketches), which the authors called Order- and Duplicate-Insensitive (ODI) synopses. They generically define the synopsis functions (i.e., generation, fusion and evaluation) required to compute aggregation functions, and provide examples of ODI synopses to compute more "complex" aggregates (i.e., non-decomposable aggregation functions). For instance, besides the scheme based on FM sketches, they propose other data structures (and respective functions) to uniformly sample sensor readings and compute other sampling-based aggregation


functions. The authors also tackled additional practical concerns. Namely, they explored the possibility of implicitly acknowledging ODI synopses to infer message losses, and suggested simple heuristics to modify the established routing topology (assigning nodes to another hierarchic level), in order to reduce the loss rate.

Wildfire [90] is based on the use of FM sketches to estimate SUM, but it is targeted at dynamic networks. Although it operates in two phases like the previous hash sketch approaches, unlike them it does not establish any specific routing structure to aggregate sketches. After receiving the query, nodes start combining the received sketches with their current one, and send the result whenever it differs from the previous one.

A distributed implementation of some basic hash sketch schemes has been proposed in [91], [92]. Distributed Hash Sketches (DHS) are supported by a DHT, taking advantage of the load balancing properties and scalability of such a structure. More specifically, the authors describe how to build DHS based on PCSA [41] and super-LogLog [82].

Mosk-Aoyama and Shah [93], [94] proposed an algorithm, called COMP, to compute sums of functions of individual values (referred to as separable functions). This algorithm is very similar to Extrema Propagation but less generic, as it is restricted to the properties of the exponential random distribution. Furthermore, COMP uses a biased estimator, being less accurate for small sketch sizes than Extrema Propagation, which uses an unbiased one.

4.4 Digests

This category includes algorithms that allow the computation of more complex aggregation functions, like quantiles (e.g., median) and frequency distributions (e.g., mode), in addition to common aggregation functions (e.g., count, average and sum). Basically, algorithms from this class produce a digest that summarizes the system data distribution (e.g., a histogram). The resulting digest is then used to approximate the desired aggregation functions. We refer to a digest as a data structure of bounded size that holds an approximation of the statistical distribution of the input values in the whole network. This data structure commonly corresponds to a set of values or ranges with an associated counter.

Digests provide a fair approximation of the data distribution, not holding an exact representation of all the system values, for efficiency and scalability reasons. The accuracy of the result yielded by a digest depends on its quality (i.e., the data representation used) and its size. Digests allow the computation of a wider range of aggregation functions, but usually require more resources and are less accurate than other, more specialized approaches.

4.4.1 Q-Digest

An aggregation scheme that allows the approximation of complex aggregation functions in WSN is proposed

in [21]. This approach is based on the construction and dissemination of q-digests (quantile digests) along a hierarchical routing topology (without routing loops and duplicated messages). A q-digest consists of a set of buckets, hierarchically organized, and their corresponding counts (the frequency of the values contained by each bucket). Buckets are defined by a range of values [a, b] and can have different sizes, depending on the distribution of the values they represent. Each node maintains a q-digest of the data available to it (from its children). Q-digests are built in a bottom-up fashion, by merging the digests received from child nodes, and further compressing the resulting q-digest according to a specific compression factor (less frequent values are grouped into larger buckets). Aggregation functions are computed by manipulating (e.g., sorting q-digest nodes) and traversing the q-digest structure according to specific criteria that depend on the function to be computed.

The authors provide an experimental evaluation, where they show that the q-digest allows the approximation of quantile queries using fixed message sizes, saving bandwidth and power when compared to a naive scheme that collects all the data. The naive scheme obtains an exact result, but with increasing message sizes along the routing hierarchy. Obviously, there is a trade-off between the obtained accuracy and the message size used. The authors suggested a way to compute the confidence factor associated with a q-digest (i.e., the error associated with a query), but the effect of faults was not considered in their study.

4.4.2 Equi-Depth

A gossip-based approach to estimate the network distribution of values is described in [35]. This scheme is based on the execution of a gossip protocol and the application of specific merge functions to the exchanged data, to restrict storage and communication costs. In more detail, each node keeps a list of k values (a digest), initially set with its input value. At each round, nodes get the list of values from a randomly chosen neighbor and merge it with their own, applying a specific procedure. The execution of several rounds produces an approximation of the network distribution of values (i.e., a histogram). Four merging techniques were considered and analyzed by the authors: swap, concise counting, equi-width histograms, and equi-depth histograms.

Swap simply consists in randomly picking k values from the two lists (half from each one) and discarding the rest. Although simple, by discarding half of the available data in each merge, important information is likely to be lost.

Concise counting associates a tuple, value and count, with each list entry. The merge process consists in sorting the tuples (by value), and individually merging the tuples with the closest values, in order to keep a fixed list size. Tuples are merged by randomly choosing one of the two values and adding their counts.


The equi-width technique breaks the range of possible values into bins of equal size, associating a counter with each one. Initially, nodes consider the range from 0 to their current input value, as the extremes are not known. Bins are dynamically resized when new extremes are found: all bins are mapped into larger ones, based on their middle value and the range of the new bins, adding each counter to the bin it maps to. This technique requires only the storage of the extreme values and counts, since all bins have equal width, reducing the volume of data that needs to be stored and exchanged when compared to other techniques (e.g., concise counting). However, equi-width can provide very inaccurate results for severely skewed distributions.

In equi-depth, bins are divided not to be of the same width but to contain approximately the same count. Initially, fixed-size bins are set, each represented by a pair 〈value, counter〉, dividing the range from 0 to the input value. Whenever data is exchanged, all pairs (received and owned) are ordered, and the consecutive bins that yield the smallest combined bin (in terms of count) are merged, repeating the process until the desired number of bins is obtained. Bin merging consists in adding the counters and using the weighted arithmetic mean as the value. This method intends to minimize the counting disparity across bins.
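The equi-depth merge step can be sketched as follows (our own helper, not the authors' code; ties in the smallest-pair choice are broken by position):

```python
def merge_equidepth(own, received, k):
    """Merge two equi-depth digests (lists of (value, count) pairs),
    repeatedly fusing the adjacent pair with the smallest combined
    count until only k bins remain."""
    bins = sorted(own + received)                 # order by value
    while len(bins) > k:
        # find the adjacent pair with the smallest combined count
        i = min(range(len(bins) - 1),
                key=lambda j: bins[j][1] + bins[j + 1][1])
        (v1, c1), (v2, c2) = bins[i], bins[i + 1]
        # weighted arithmetic mean as the new value, counts added
        merged = ((v1 * c1 + v2 * c2) / (c1 + c2), c1 + c2)
        bins[i:i + 2] = [merged]
    return bins

a = [(1.0, 4), (5.0, 4), (9.0, 4)]
b = [(2.0, 4), (6.0, 4), (10.0, 4)]
print(merge_equidepth(a, b, 3))   # [(1.5, 8), (5.5, 8), (9.5, 8)]
```

Because merging always fuses the lightest adjacent pair, the surviving bins end up with roughly equal counts, which is the defining property of the technique.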

In order to deal with changes of the nodes' input values over time, the authors consider the execution of the protocol in phases, restarting it. The authors experimentally evaluated their protocol, comparing the previous merging techniques. The obtained results showed that equi-depth outperformed the other approaches, providing a consistent trade-off between accuracy and storage requirements for all tested distributions. The authors also evaluated the effect of duplicates resulting from the execution of the gossip protocol. They argue from the obtained results that, although duplicates bias the estimated result, it is more advantageous (simpler and more efficient) to assume their presence than to try to remove them. The occurrence of faults and changes of the input values were not evaluated.

4.4.3 Adam2

Adam2 is a gossip-based algorithm to estimate the statistical distribution of values across a decentralized system [36]. More precisely, this scheme approximates the Cumulative Distribution Function (CDF) of an attribute, which can then be used to derive other aggregates. In this case, a "digest" is composed of a set Hi of k pairs of values (xk, fk), where xk represents an interpolation point and fk is the fraction of nodes with value less than or equal to xk. At a high abstraction level, it can be said that the algorithm simply executes several instances of an averaging protocol (i.e., Push-Pull Gossiping [30]) to estimate the fraction of nodes in each pair of the CDF.

In more detail, each node can decide to start an instance of Adam2 according to a predefined probability 1/(ni R), where ni is the current network size estimate at node i and R is an input parameter that regulates the frequency of aggregation instances (i.e., on average one every R rounds). Each instance is uniquely identified by its starting node. Initially, the starting node i initializes the interpolation set Hi in the following way: each fraction fk is set to 1 if the node's attribute reading vi is less than or equal to the corresponding interpolation value xk, and to 0 otherwise. Nodes store a set of interpolation points Hi for each running algorithm instance (initiated by a node i). Upon learning about a new instance, a node j initializes Hi, setting fk = 1 if vj ≤ xk and fk = 0 otherwise, and starts participating in the protocol. A push-pull like aggregation is then performed, where nodes randomly choose a neighbor to exchange their set Hi, which is subsequently merged by averaging the fractions at each interpolation point. Over time, the fractions at each node will eventually converge to the correct result associated with each pair. After a predefined number of rounds (time-to-live), the CDF is approximated by interpolating the points of the resulting set Hi.

Note that Adam2 concurrently estimates (by averaging) other aggregation functions besides the CDF, namely COUNT to determine the network size, and MIN/MAX to find the extreme attribute values. The results of these aggregation functions are later used as input values of the next instances of the algorithm, to tune and optimize its execution (i.e., to calculate the instance starting probability and to set new interpolation points).

As in Push-Pull Gossiping [62], [30], Adam2 handles dynamism (i.e., attribute changes and churn) by continuously starting new instances of the algorithm – a restart mechanism. The authors evaluated the algorithm by simulation, comparing it with previous techniques to compute complex aggregates (e.g., Equi-Depth). The obtained results showed that Adam2 outperforms the compared approaches, exhibiting better accuracy.

4.4.4 Other Approaches

One of the first algorithms to compute complex aggregation functions in WSN was introduced by Greenwald and Khanna [95]. Their approach is similar to the one previously described for Q-Digest: nodes compute quantile summaries (digests) that are merged in a bottom-up fashion along a tree topology, until the root is reached.

Another gossip-based scheme to estimate the distribution of input readings, and able to detect outliers, was introduced in [96], [97]. In a nutshell, this approach operates like the push-sum protocol [29] (described in Section 3.5), but manipulates a set of clusters (digests) instead of a single value, applying a specific clustering procedure.

Recently, a novel algorithm named Spectra [98] was introduced to estimate the distribution function of a value (more precisely, its CDF). Basically, this approach works like Adam2 but replaces Push-Pull Gossiping [30], the technique at its core, with the Flow Updating [33] scheme. Spectra provides improved performance and robustness when compared to Adam2,


inheriting the characteristics of the flow updating mechanism, supporting high levels of message loss and self-adapting to network changes (without requiring restarts).

In general, existing aggregation approaches can be extended to compute more complex aggregation functions, for instance by combining them with an additional sampling technique. However, this additional functionality is not part of the essence of their core algorithm, bearing different characteristics (e.g., accuracy) and concerns. Some examples can be found in [29], where push-sum is extended with a push-random protocol to obtain random samples, and in [99], which introduces algorithms to estimate several spatially-decaying aggregation functions.

4.5 Counting (Deterministic/Randomized)

This category refers to a restricted set of distributed algorithms, designed to compute a specific aggregation function: COUNT6. Recall that COUNT allows the determination of important properties in the design of some distributed applications. For instance, in this context it finds common practical application in the determination of the size of the system (or group), or in counting the number of votes in an election process.

According to the computational principles on which the approaches of this class are founded, two sub-classes of counting algorithms can be identified: deterministic and randomized. Deterministic algorithms correspond to those that precisely count the number of target elements. The randomized sub-class refers to algorithms that rely on some randomized principle and probabilistic method to produce an estimate. This type of algorithm is usually based on the execution of some sampling technique to provide a probabilistic approximation of the size of the sampled population. Nonetheless, a few algorithms do not collect samples for size estimation, applying instead a probabilistic estimator over some observed events or known system properties.

Algorithms based on sampling are strongly influenced by the probabilistic method used to obtain the result, inheriting its properties. The accuracy of the algorithm corresponds to that provided by the probabilistic method used, being bounded by the error factor associated with it. Several probabilistic methods have been applied to samples to yield a counting estimation, namely: the birthday problem [100] – a paradox that refers to the probability of two elements sampled out of a population not being repeated, inspired by the probability of two people in a group not having a matching birthday (for example: in a random group of 23 people, the probability that two of them were born on the same day of the year is about 50%, and it is greater than 99% for a group of 57 people); capture-recapture [101] – a probabilistic method based on the repeated capture of

6. Although other classes of algorithms (e.g., averaging) can also be used for COUNT, this category refers to approaches confined to the computation of this aggregation function.

samples from a closed population (a population that maintains a fixed size during the sampling process), where the number of repeated elements between samples is counted to provide an estimate of the population size; fundamental probabilistic methods – the application of Bernoulli-based sampling methods [68], and of other basic probabilistic concepts, to some sampled statistical information, like the distances between nodes (number of hops) or the number of messages successfully sent/received, in order to estimate the size of the network.

In all cases, sampling is typically performed at a single node, and it can take several rounds to collect a single sample. Moreover, an estimation error is always present, even if no faults occur. For example, in Sample & Collide [27], [24] the estimation error can reach 20%, and a sampling step takes time dT (where d is the average connection degree and T is a predefined timer that must be sufficiently large to provide a good sample quality), needing to be repeated until l sample collisions are observed.

As previously mentioned, in some cases a size estimate can be obtained by directly applying an estimator to some available system knowledge (observed events or other known properties), without any previous sampling. For instance, in the approach proposed by Horowitz and Malkhi [22] (see Section 3.2), an estimator function is used at each node to estimate the network size, based on the observation of two events (nodes joining or leaving the network), incrementing/decrementing the estimator accordingly. Other approaches, like the one proposed in [102], [103], provide a size estimate based on knowledge of the routing structure, in this particular case counting the number of high-degree nodes. Note that this kind of technique does not provide accurate results, in most cases yielding a rough approximation of the correct value.

4.5.1 Deterministic

Dolev et al. [25], [26] proposed a deterministic counting approach to estimate the size of a group. This scheme is based on the circulation of a token across the network, where information from the visited nodes is stored at each passage (updating its counters). The token performs a random walk (see Section 3.4) in order to count the number of nodes. Notice that a random walk can be seen as a probabilistic sampling process; however, in this case the idea is to continuously reach all available nodes (not to sample a portion of them). Moreover, here the size is estimated by a simple deterministic count of the number of valid elements stored in the token (not by the application of a probabilistic estimator).

In the case of a large network, the data stored in the token can correspond to a proportionally large amount of information, which needs to be transmitted from node to node, and may consequently originate a performance bottleneck. A single visit to all network nodes is required to produce an exact count; however, it is hard to determine whether all nodes


have been visited (since the network size is unknown). Moreover, the loss of the token due to a single failure will compromise the whole process, which depends on its existence.

4.5.2 Sample & Collide

This approach [27], [24] addresses the problem of counting the number of peers in a P2P overlay network, inspired by a birthday problem technique (first proposed by Bawa et al. in a technical report [23]). The application of this probabilistic method requires the collection of uniform random samples. To this end, the authors proposed a peer sampling algorithm based on the execution of a continuous-time random walk, in order to obtain unbiased samples (asymptotically uniform). The sampling routine proceeds in the following way: an initiator node i sets a timer with a predefined value T, which is sent in a sampling message to a randomly selected neighbor; upon receiving a sampling message, the target node (or the initiator, after setting the timer) picks a random number U uniformly distributed in the interval [0, 1], and decrements the timer by log(1/U)/di (i.e., T ← T − log(1/U)/di, where di is the degree of node i); if the resulting value is less than or equal to zero (T ≤ 0), then the node is sampled, its identification is returned to the initiator, and the process stops; otherwise the sampling message is sent to one of its neighbors, chosen uniformly at random. The quality of the obtained samples (their approximation to uniform random sampling) depends on the value T initially set for the timer.

The described sampling step (to sample one peer) must be repeated until one of the nodes has been repeatedly sampled a predefined number of times l (i.e., l sample collisions are observed). After concluding this sampling process, the network size n is estimated using a Maximum Likelihood (ML) method. The ML estimate can be computed by solving Equation 3, where Cl corresponds to the total number of samples taken until one is repeated l times, using a standard bisection search. Alternatively, the result can be approximated within √n of the ML estimator by Equation 4 (an asymptotically unbiased estimator), which is computationally more efficient.

∑_{i=0}^{Cl−l−1} i/(n − i) − l = 0 (3)

n = Cl²/(2l) (4)
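The estimator side can be sketched as follows (a toy illustration assuming C and l come from the sampling phase; the helper names and the bracketing interval are our own). The left side of Equation 3 decreases as n grows, which is what makes bisection applicable:

```python
def ml_equation(n, C, l):
    """Left side of Equation 3: sum_{i=0}^{C-l-1} i/(n-i) - l."""
    return sum(i / (n - i) for i in range(C - l)) - l

def ml_estimate(C, l, tol=1e-9):
    """Solve Equation 3 for n by standard bisection search."""
    lo, hi = float(C), float(C * C)   # bracket: n exceeds the distinct count
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if ml_equation(mid, C, l) > 0:
            lo = mid                  # too many expected collisions: n larger
        else:
            hi = mid
    return (lo + hi) / 2

C, l = 200, 10                        # toy values for the sampling outcome
print(ml_estimate(C, l))              # ML estimate of the network size
print(C * C / (2 * l))                # Equation 4 approximation: 2000.0
```

For these toy values the two estimates agree to within a few percent, illustrating why the cheaper Equation 4 is usually sufficient.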

The accuracy of the produced result is determined by the parameter l, and its fidelity depends on the ability of the sampling method to provide uniformly distributed random samples (T must be sufficiently large).

4.5.3 Capture-Recapture

Mane et al. [28] proposed an approach based on the capture-recapture statistical method to estimate the size of closed P2P networks (i.e., networks of fixed size, with no peers joining or leaving during the process).

This method requires two or more independent random samples from the analyzed population, and a further count of the number of repeated individuals that appear in each sample. The authors use random walks to obtain independent random samples. Considering a two-sample strategy, two random walks are performed from a source node, one in each sampling phase (capture and recapture). In more detail, each random walk proceeds in the following way: the source node sends a message to a randomly selected neighbor, which in turn forwards the message to another randomly chosen neighbor; the process is repeated until a predefined maximum number of hops is reached (parameter: time-to-live) or the message gets back to a node that has already participated in the current random walk. During this process, the information about the traversed path (i.e., the UIDs of all participating nodes) is kept in the forwarded message. When one of the random walk stopping criteria is met, the message is sent back to the source node with the list of "captured" nodes, following the reverse of the traversed path (stored in the message). The information received at the source node from the sampling steps is used to compute the estimate n of the network size, applying Equation 5 (where n1 is the number of nodes caught in the first sample, n2 is the number of nodes caught in the second sample, and n12 represents the number of recaptured nodes, i.e., those caught in both samples).

n = ((n1 + 1) × (n2 + 1)) / (n12 + 1) (5)

4.5.4 Hop-Sampling

One of the approaches proposed by Kostoulas et al. [37], [38] to estimate the size of dynamic groups is based on sampling the receipt times (hop counts) of some nodes from an initiator. Receipt times are obtained across the group from a gossip propagation started by a single node, the initiator, which then samples the resulting hop counts of some nodes to produce an estimate of the group size. In more detail, the protocol proceeds as follows: the initiator starts the process by sending an initiating message (to itself); upon receiving the initiating message, nodes start participating in the protocol, periodically forwarding it to a number (gossipTo) of other targets, until a predefined number of rounds (gossipFor) is exceeded, or a maximum quantity of messages (gossipUntil) has been received; gossip targets are chosen uniformly at random from the available membership, excluding nodes in a locally maintained list (fromList) from which a message has already been received; exchanged messages carry the distance to the initiator node, measured in number of hops; each node keeps the minimum number of hops received (MyHopCount), and sends the current value incremented by one.

After concluding the described gossip process, and waiting for a predefined number of rounds (gossipResult), the initiator samples the number of hops (MyHopCount) from some nodes selected uniformly at random.


The average of the sampled hop counts is then used to estimate the logarithm of the size of the group (log(n)). Alternatively to the previous sampling process, where nodes wait for the initiator's sample request, nodes can decide themselves to send their hop count value back to the initiator node, according to a predefined probability, allowing only a reduced fraction of nodes to respond.
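The hop-sampling idea can be illustrated with a toy round-based simulation. The function and parameter names (gossip_to, rounds, sample_size) are our illustrative stand-ins for the protocol's gossipTo/gossipFor parameters, and the calibration mapping the average hop count to log(n) is deliberately left out.

```python
import random

def hop_sampling(n, gossip_to, rounds, sample_size):
    """Toy simulation of hop sampling: a gossip starts at the initiator
    (node 0), every informed node forwards to `gossip_to` uniformly random
    targets each round, and each node records its minimum hop count
    (MyHopCount). The initiator then averages a uniform sample of hop
    counts, a quantity that grows with log(n)."""
    hops = {0: 0}            # MyHopCount per node; the initiator is 0 hops away
    frontier = [0]
    for r in range(1, rounds + 1):
        new = []
        for _node in frontier:
            for _ in range(gossip_to):
                target = random.randrange(n)
                if target not in hops:   # keep only the first (minimum) hop count
                    hops[target] = r
                    new.append(target)
        frontier += new
    sample = random.sample(list(hops.values()), min(sample_size, len(hops)))
    return sum(sample) / len(sample)
```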

4.5.5 Interval Density

A second approach to estimate the size of a dynamic group has been proposed in [37], [38]. This algorithm measures the density of the process identifier space, determining the number of unique identifiers within a subinterval of this space. The initiator node passively collects information about existing identifiers, snooping on the information of complementary protocols running on the network. The node identifiers are randomized by applying a hash function to each one, and mapped to a point in the real interval [0, 1]. The initiator estimates the group size by determining the number X of sampled identifiers lying in a subinterval I of [0, 1], returning X/I. Notice that this kind of approach assumes a uniformly random distribution of the identifiers, or uses strategies to reduce the existing correlation between them, in order to avoid biased estimations.
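A minimal sketch of the interval-density estimator, assuming SHA-256 as the randomizing hash and a subinterval [0, interval) of length `interval`; the function and parameter names are ours.

```python
import hashlib

def interval_density_estimate(node_ids, interval=0.1):
    """Toy interval-density estimator: hash each UID to a pseudo-random
    point in [0, 1), count the points X landing in the subinterval
    [0, interval), and scale by the interval length: estimate = X / interval."""
    def to_unit(uid):
        # SHA-256 gives a well-spread 256-bit value; use the top 8 bytes.
        digest = hashlib.sha256(str(uid).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64
    x = sum(1 for uid in node_ids if to_unit(uid) < interval)
    return x / interval
```

With 10,000 identifiers and interval = 0.1, roughly 1,000 hashed points land in the subinterval and the estimate is close to 10,000.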

4.5.6 Other Approaches

Some counting approaches that are based on a centralized probabilistic polling to collect samples were previously described in this work (in Section 3.3), namely: randomized reports, which illustrate the basic idea of probabilistic polling, and another approach [58] that samples the number of messages successfully sent in a single-hop wireless network (further improved in [59]).

Other probabilistic polling algorithms are also available in the specific context of multicast groups, to estimate their membership size. For example, in [104] some older mechanisms were analyzed and extended, and in [105] an algorithm using an estimator based on Kalman filter theory was proposed to estimate the size of dynamic multicast groups.

5 SUMMARY AND PRACTICAL GUIDELINES

Here, we summarize the properties of the main classes of algorithms, stating their advantages and disadvantages (see Table 5), and give some guidelines about their use in specific settings.

Hierarchy-based approaches (see 3.1 and 4.1) require a specific routing structure (e.g., a spanning tree) to operate, and thus are limited by the ability of such a structure to cope with churn and link failures. However, this kind of approach is very cheap in terms of exchanged messages, requiring only 2N − 1 messages to compute the correct average at the sink, i.e., two messages for each of the N nodes (except for the sink): one to broadcast the aggregation request to its child nodes and another to send the result to its parent.7 In terms of time, the aggregation process takes 2h rounds (at most 2D, with D representing the network diameter), where h is the height of the routing hierarchy, i.e., h rounds to spread the aggregation request and another h rounds to aggregate the results from child nodes to parents. This kind of technique is commonly used in energy-constrained environments (i.e., WSN), taking advantage of the reduced message exchanges. However, this kind of approach can be significantly affected by the occurrence of a single failure, losing all the subtree data and greatly impacting the result produced at the sink. Therefore, we would only recommend the application of such aggregation schemes to fault-free scenarios, which is often not the case. In scenarios where faults might occur and energy efficiency is not a concern, sketch techniques should be preferred, at least providing some path redundancy to reach the sink (at the cost of a factor-k increase in terms of messages, with k representing the number of alternative routing paths).

7. This is for scenarios where radio broadcast is used to transmit data, such as in WSN. Otherwise, approximately d(2N − 1) messages are required, where d is the average degree.
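The message and time costs above can be checked with a small simulation of single-shot AVERAGE over a spanning tree. This is our own toy model of the radio-broadcast accounting described in the text (one request broadcast per node, one result message per non-sink node), not an implementation of any specific surveyed protocol.

```python
def tree_average(tree, values, root):
    """Single-shot AVERAGE over a spanning tree: the request floods down,
    and (sum, count) pairs are combined on the way back up. Returns the
    average at the sink plus the message count, which comes out to 2N - 1
    in the radio-broadcast model: N request broadcasts + (N - 1) results."""
    msgs = [0]

    def down(node):
        msgs[0] += 1                      # node broadcasts the request
        s, c = values[node], 1            # own reading counts once
        for child in tree.get(node, []):
            cs, cc = down(child)          # child's (sum, count) reply
            s, c = s + cs, c + cc
        if node != root:
            msgs[0] += 1                  # one result message to the parent
        return s, c

    s, c = down(root)
    return s / c, msgs[0]
```

For a four-node tree the average is computed at the sink with 2 × 4 − 1 = 7 messages; losing any internal node would silently drop its whole subtree from the (sum, count) pair, which is exactly the fragility discussed above.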

Sketch-based approaches (see 4.3) can be applied independently of the underlying routing topology, and thus their use is adequate in fault-prone scenarios where only a fair approximation of the aggregate is required. This kind of technique is fast, the closest to the theoretical minimum, requiring only D rounds for all nodes to obtain the estimation result, and achieving this at a total cost of dND messages (i.e., each of the N nodes sends at most one message to each of its d neighbors at each round).8 This kind of approach is adequate for faulty scenarios, especially if one privileges obtaining a fast estimate rather than a precise one. Nonetheless, if a precise estimation is required in a faulty environment, another type of aggregation approach should be chosen, namely an averaging technique.
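To illustrate why sketches tolerate duplicates and arbitrary topologies, here is a toy Flajolet–Martin style COUNT sketch (a generic textbook construction, not any specific algorithm from the survey). The merge operation is a bitwise OR, which is idempotent and order-insensitive, so partial sketches can be gossiped along any routes and re-delivered without harm.

```python
import random

def node_sketch(uid):
    """One node's Flajolet-Martin style bitmap: bit i is set with
    probability 2^-(i+1) (geometric), derived here from the node's UID
    via a seeded PRNG as a toy stand-in for a real hash function."""
    rnd = random.Random(uid)
    i = 0
    while rnd.random() < 0.5:
        i += 1
    return 1 << i

def merge(a, b):
    """Bitwise OR: idempotent (merge(s, s) == s) and commutative, so
    duplicate or reordered deliveries do not bias the sketch."""
    return a | b

def count_estimate(sketch):
    """Estimate COUNT as 2^R / 0.77351, where R is the index of the
    lowest unset bit of the merged sketch."""
    r = 0
    while (sketch >> r) & 1:
        r += 1
    return 2 ** r / 0.77351
```

A single sketch gives only a rough (roughly factor-of-two) estimate; real deployments average many independent sketches, which is where the "limited accuracy (by the sketches' size)" trade-off in Table 5 comes from.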

Averaging algorithms (see 3.5 and 4.2) work independently of the routing topology, and have the particularity of producing results that converge over time to the correct value, being able to output results at all nodes with high accuracy even in faulty environments. The execution time of this kind of algorithm depends on the target accuracy: the error decreases exponentially with the number of rounds (at an approximately constant convergence factor between consecutive rounds), with all nodes sending from one to d messages at each round. This kind of approach is slower and consequently requires more messages (although smaller ones) than sketches, but can exhibit better properties in terms of fault tolerance, especially to cope with churn. Nevertheless, one should be very careful when choosing the appropriate averaging approach to use, as in practice many exhibit dependability issues (not converging to the correct value) and very few are effectively able to operate on dynamic networks.
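The convergence behavior described above can be observed with a toy pairwise-averaging simulation, a simplified stand-in for the surveyed averaging protocols that assumes a fully connected network and reliable exchanges.

```python
import random

def gossip_average(values, rounds, seed=0):
    """Toy pairwise averaging: each round, n random pairs of nodes exchange
    their current values and both keep the mean. The global sum ("mass") is
    conserved by every exchange, so all nodes' values converge to the
    network-wide average, with the error shrinking each round."""
    rnd = random.Random(seed)
    v = list(values)
    n = len(v)
    for _ in range(rounds):
        for _ in range(n):
            i, j = rnd.randrange(n), rnd.randrange(n)
            mean = (v[i] + v[j]) / 2      # mass-conserving exchange
            v[i] = v[j] = mean
    return v
```

Mass conservation is the key invariant: protocols that lose or duplicate "mass" under message loss are exactly the ones with the dependability issues mentioned above.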

8. In the case of WSN, with radio broadcast the total message cost is reduced to ND.


TABLE 5
Summary of the characteristics of main data aggregation classes

Hierarchical
  Advantages:    accurate (if no faults occur); optimal message load
  Disadvantages: result at a single node; not fault-tolerant
  Requirements:  specific routing structure (e.g., spanning tree)

Sketches
  Advantages:    very fast; result at all nodes; fault-tolerant
  Disadvantages: limited accuracy (by the sketches' size)
  Requirements:  local knowledge of neighbor IDs, or global UIDs; random number generator

Averaging
  Advantages:    accurate (converges over time); fair message load; result at all nodes; fault-tolerant; supports network changes
  Requirements:  local knowledge of neighbor IDs

Randomized
  Advantages:    reduced message load (partial network participation)
  Disadvantages: not accurate; result at a single node; not fault-tolerant
  Requirements:  global UIDs

Digests
  Advantages:    computation of complex aggregates; result at all nodes
  Disadvantages: limited accuracy (by the digests' size); resources needed (e.g., larger messages)
  Requirements:  local knowledge of neighbor IDs

Randomized (aka sampling) aggregation techniques (see 3.4 and 4.5) do not seem to bring any advantages when compared to the other kinds of approaches. This kind of approach provides an irregular approximation at a single node, not being accurate and usually being restricted to the computation of a single aggregation function: COUNT. Furthermore, the random-walk-based approaches are usually slow, taking several rounds to obtain a sample, and are unreliable, as the random walk token might get lost in a faulty environment.

In terms of dynamism, few of the described approaches are tailored to operate in dynamic settings, commonly relying on a restart mechanism (i.e., periodically resetting the computation) to handle network and value changes. In fact, in such settings no algorithm is able to provide an exact value of the aggregate at a specific time instant, as changes can happen during the computation (snapshot validity [90]). Hierarchy-based approaches are completely dependent on a topology maintenance and recovery scheme to work in dynamic settings. The efficiency of such a protocol will directly influence the performance of the aggregation algorithm, consuming additional resources to monitor the network in order to detect changes. Moreover, the topology adaptation process (parent switching) can cause temporary disconnections that can still significantly affect the aggregation process. Approaches that operate independently of the network topology, like sketches and averaging schemes, remove the burden of such a maintenance protocol to adapt to churn. Sketch-based approaches are "fast" but cannot be applied continuously without resetting the computation, as in the case of node departures (even when announced) it is not trivial (if not impossible) to remove items from such structures [?]. From the averaging class, only the few that perform idempotent operations between nodes are able to efficiently and continuously (without restarts) work in dynamic settings, converging to the new average resulting from the global "mass" change due to the arrival/departure of nodes or changes of the measured input values.

One should notice that most of the existing approaches only allow the computation of simple aggregation functions, such as AVERAGE, SUM and COUNT, or others that can be derived from their combination (by executing multiple instances of the used algorithm). In many cases, this kind of aggregation function is enough, but in many other situations the computation of more complex aggregation functions is more useful. A simple example can be found considering some load balancing application that aims to distribute the global load of a system equitably. In this case, the knowledge of the total or average load does not provide enough information to assess the distribution of the system load, i.e., determine if some processing nodes are overloaded or idle. Even the computation of the maximum and minimum is insufficient, although it allows the detection of a gap across the global load distribution, as it does not provide information about the number of processes at each load level. In this situation, an estimation of the (statistical) load distribution is required to provide the desired information and reveal outlier values. Other examples can be found in the context of monitoring applications. For instance, in WSN, estimating the distribution of the monitored attribute can be very useful to distinguish isolated sensor anomalies from the occurrence of a relevant event characterized by a certain amount of abnormal values. Few approaches are available to compute more complex aggregates and able to approximate the statistical distribution of some attribute (see 4.4). Existing algorithms from this class (i.e., digests) are more resource-consuming and less accurate than other approaches, so their application should be carefully evaluated despite their added value.

6 FINAL REMARKS AND FUTURE DIRECTIONS

This survey was organized around three main contributions. First, it provides a formal definition of the target aggregation problem, defining different types of aggregation functions and their main properties. Second, a taxonomy of the existing algorithms is proposed, from two perspectives: communication and computation, and the most relevant algorithms are succinctly described. Finally, a summary of the characteristics of the main approaches is provided, giving some guidelines about their suitability to different scenarios.

Distributed data aggregation has been an active field of research in the last decade, and a huge and diverse amount of techniques can be found in the literature. For this reason, this survey intends to be an important time-saving instrument for those who desire to get a quick and comprehensive overview of the state of the art on distributed data aggregation. Moreover, by carefully highlighting the strengths and limitations of the most pertinent approaches, this study can provide useful assistance to help readers choose which technique to apply in specific settings.

Currently, there is no ideal general solution to the distributed computation of an aggregation function; all existing techniques have their drawbacks (some more than others, as depicted by Table 5). Therefore, more research in this field is to be expected in the next few years. In particular, due to the added value of computing complex aggregates, new algorithms might arise to estimate the statistical distribution of values, as the few existing approaches exhibit some limitations in terms of accuracy and resource consumption. Additional research efforts should be made in this field to improve the support for churn, message loss, and continuous estimation of mutable input values.

REFERENCES

[1] G. Manku, "Routing networks for distributed hash tables," in 22nd Annual Symposium on Principles of Distributed Computing (PODC), 2003.

[2] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," in Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM), 2001, pp. 149–160.

[3] A. Ganesh, A. Kermarrec, and L. Massoulie, "Peer-to-peer membership management for gossip-based protocols," IEEE Transactions on Computers, vol. 52, no. 2, pp. 139–149, 2003.

[4] I. Abraham and D. Malkhi, "Probabilistic quorums for dynamic systems," Distributed Computing, vol. 18, no. 2, pp. 113–124, 2005.

[5] Z. Bar-Yossef, R. Friedman, and G. Kliot, "RaWMS - Random Walk Based Lightweight Membership Service for Wireless Ad Hoc Networks," ACM Transactions on Computer Systems (TOCS), vol. 26, no. 2, 2008.

[6] R. van Renesse, "The Importance of Aggregation," in Future Directions in Distributed Computing, A. Schiper, A. A. Shvartsman, H. Weatherspoon, and B. Y. Zhao, Eds. Springer-Verlag, 2003, pp. 87–92.

[7] E. Fasolo, M. Rossi, J. Widmer, and M. Zorzi, "In-network aggregation techniques for wireless sensor networks: a survey," IEEE Wireless Communications, vol. 14, no. 2, pp. 70–87, 2007.

[8] R. Rajagopalan and P. Varshney, "Data-aggregation techniques in sensor networks: a survey," IEEE Communications Surveys & Tutorials, vol. 8, no. 4, pp. 48–63, 2005.

[9] Y. Sang, H. Shen, Y. Inoguchi, Y. Tan, and N. Xiong, "Secure Data Aggregation in Wireless Sensor Networks: A Survey," in 7th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2006, pp. 315–320.

[10] H. Alzaid, E. Foo, and J. Nieto, "Secure data aggregation in wireless sensor network: a survey," in 6th Australasian Conference on Information Security (AISC), 2008, pp. 93–105.

[11] E. Nakamura, A. Loureiro, and A. Frery, "Information fusion for wireless sensor networks: Methods, models, and classifications," ACM Computing Surveys (CSUR), vol. 39, no. 3, pp. 1–55, 2007.

[12] A. Syropoulos, "Mathematics of multisets," in Workshop on Multiset Processing: Multiset Processing, Mathematical, Computer Science, and Molecular Computing Points of View, 2001, pp. 347–358.

[13] B. Pittel, "On Spreading a Rumor," SIAM Journal on Applied Mathematics, vol. 47, no. 1, pp. 213–223, 1987.

[14] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, "TAG: a Tiny AGgregation service for ad-hoc sensor networks," ACM SIGOPS Operating Systems Review, vol. 36, no. SI, pp. 131–146, 2002.

[15] S. Motegi, K. Yoshihara, and H. Horiuchi, "DAG based In-Network Aggregation for Sensor Network Monitoring," in International Symposium on Applications and the Internet (SAINT), 2005, p. 8.

[16] Y. Birk, I. Keidar, L. Liss, A. Schuster, and R. Wolff, "Veracity Radius: Capturing the Locality of Distributed Computations," in 25th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2006.

[17] J. Considine, F. Li, G. Kollios, and J. Byers, "Approximate aggregation techniques for sensor databases," in 20th International Conference on Data Engineering (ICDE), 2004, pp. 449–460.

[18] Y.-C. Fan and A. Chen, "Efficient and robust sensor data aggregation using linear counting sketches," in IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 2008, pp. 1–12.

[19] Y.-C. Fan and A. L. Chen, "Efficient and Robust Schemes for Sensor Data Aggregation Based on Linear Counting," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 11, pp. 1675–1691, 2010.

[20] A. Manjhi, S. Nath, and P. Gibbons, "Tributaries and Deltas: Efficient and Robust Aggregation in Sensor Network Streams," in ACM SIGMOD International Conference on Management Of Data, 2005, pp. 287–298.

[21] N. Shrivastava, C. Buragohain, D. Agrawal, and S. Suri, "Medians and beyond: new aggregation techniques for sensor networks," in 2nd International Conference on Embedded Networked Sensor Systems (SenSys), 2004, pp. 239–249.

[22] K. Horowitz and D. Malkhi, "Estimating network size from local information," Information Processing Letters, vol. 88, no. 5, pp. 237–243, 2003.

[23] M. Bawa, H. Garcia-Molina, A. Gionis, and R. Motwani, "Estimating aggregates on a peer-to-peer network," Stanford University, Computer Science Department, Tech. Rep., 2003.

[24] L. Massoulie, E. Merrer, A.-M. Kermarrec, and A. Ganesh, "Peer Counting and Sampling in Overlay Networks: Random Walk Methods," in 25th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2006.

[25] S. Dolev, E. Schiller, and J. Welch, "Random Walk for Self-Stabilizing Group Communication in Ad-Hoc Networks," in 21st IEEE Symposium on Reliable Distributed Systems, 2002, pp. 70–79.

[26] ——, "Random Walk for Self-Stabilizing Group Communication in Ad Hoc Networks," IEEE Transactions on Mobile Computing, vol. 5, no. 7, pp. 893–905, 2006.

[27] A. Ganesh, A. Kermarrec, E. Le Merrer, and L. Massoulie, "Peer counting and sampling in overlay networks based on random walks," Distributed Computing, vol. 20, no. 4, pp. 267–278, 2007.

[28] S. Mane, S. Mopuru, K. Mehra, and J. Srivastava, "Network Size Estimation In A Peer-to-Peer Network," Department of Computer Science - University of Minnesota, Tech. Rep. TR 05-030, 2005.

[29] D. Kempe, A. Dobra, and J. Gehrke, "Gossip-Based Computation of Aggregate Information," in 44th Annual IEEE Symposium on Foundations of Computer Science, 2003, pp. 482–491.

[30] M. Jelasity, A. Montresor, and O. Babaoglu, "Gossip-based aggregation in large dynamic networks," ACM Transactions on Computer Systems (TOCS), vol. 23, no. 3, pp. 219–252, 2005.

[31] J.-Y. Chen, G. Pandurangan, and D. Xu, "Robust Computation of Aggregates in Wireless Sensor Networks: Distributed Randomized Algorithms and Analysis," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 9, pp. 987–1000, 2006.

[32] P. Jesus, C. Baquero, and P. S. Almeida, "Fault-Tolerant Aggregation by Flow Updating," in 9th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), ser. Springer LNCS, vol. 5523, 2009, pp. 73–86.

[33] ——, "Fault-Tolerant Aggregation for Dynamic Networks," in 29th IEEE Symposium on Reliable Distributed Systems, 2010, pp. 37–43.

[34] C. Baquero, P. Almeida, and R. Menezes, "Fast Estimation of Aggregates in Unstructured Networks," in 5th International Conference on Autonomic and Autonomous Systems (ICAS), 2009, pp. 88–93.

[35] M. Haridasan and R. van Renesse, "Gossip-based distribution estimation in peer-to-peer networks," in International Workshop on Peer-To-Peer Systems (IPTPS), 2008.

[36] J. Sacha, J. Napper, C. Stratan, and G. Pierre, "Adam2: Reliable Distribution Estimation in Decentralised Environments," in 30th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2010, pp. 697–707.

[37] D. Kostoulas, D. Psaltoulis, I. Gupta, K. Birman, and A. Demers, "Decentralized Schemes for Size Estimation in Large and Dynamic Groups," in 4th IEEE International Symposium on Network Computing and Applications, 2005, pp. 41–48.

[38] D. Kostoulas, D. Psaltoulis, I. Gupta, K. P. Birman, and A. J. Demers, "Active and passive techniques for group size estimation in large-scale and dynamic distributed systems," Elsevier Journal of Systems and Software, vol. 80, no. 10, pp. 1639–1658, 2007.

[39] L. Chitnis, A. Dobra, and S. Ranka, "Aggregation Methods for Large-Scale Sensor Networks," ACM Transactions on Sensor Networks (TOSN), vol. 4, no. 2, pp. 1–36, 2008.

[40] S. Madden, R. Szewczyk, M. Franklin, and D. Culler, "Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks," in 4th IEEE Workshop on Mobile Computing Systems and Applications, 2002, pp. 49–58.

[41] P. Flajolet and G. Martin, "Probabilistic counting algorithms for data base applications," Journal of Computer and System Sciences, vol. 31, no. 2, pp. 182–209, 1985.

[42] Y. Birk, I. Keidar, L. Liss, and A. Schuster, "Efficient Dynamic Aggregation," in 20th International Symposium on DIStributed Computing (DISC), 2006, pp. 90–104.

[43] S. Nath, P. Gibbons, S. Seshan, and Z. Anderson, "Synopsis diffusion for robust aggregation in sensor networks," in 2nd International Conference on Embedded Networked Sensor Systems (SenSys), 2004.

[44] J. Li, K. Sollins, and D. Lim, "Implementing Aggregation and Broadcast over Distributed Hash Tables," ACM SIGCOMM Computer Communication Review, vol. 35, no. 1, pp. 81–92, 2005.

[45] J. Zhao, R. Govindan, and D. Estrin, "Computing Aggregates for Monitoring Wireless Sensor Networks," in 1st IEEE International Workshop on Sensor Network Protocols and Applications, 2003, pp. 139–148.

[46] M. Dam and R. Stadler, "A Generic Protocol for Network State Aggregation," Radiovetenskap och Kommunikation (RVK), 2005.

[47] L. Jia, G. Noubir, R. Rajaraman, and R. Sundaram, "GIST: Group-Independent Spanning Tree for Data Aggregation in Dense Sensor Networks," in Distributed Computing in Sensor Systems, ser. Lecture Notes in Computer Science, P. Gibbons, T. Abdelzaher, J. Aspnes, and R. Rao, Eds. Springer Berlin / Heidelberg, 2006, vol. 4026, pp. 282–304.

[48] A. Sharaf, J. Beaver, A. Labrinidis, and K. Chrysanthis, "Balancing energy efficiency and quality of aggregate data in sensor networks," International Journal on Very Large Data Bases (VLDB), vol. 13, no. 4, pp. 384–403, 2004.

[49] R. Misra and C. Mandal, "Ant-aggregation: ant colony algorithm for optimal data aggregation in wireless sensor networks," in IFIP International Conference on Wireless and Optical Communications Networks, 2006, p. 5.

[50] W. Liao, Y. Kao, and C. Fan, "Data aggregation in wireless sensor networks using ant colony algorithm," Journal of Network and Computer Applications, vol. 31, no. 4, pp. 387–401, 2008.

[51] H. Wang and N. Luo, "An Improved Ant-Based Algorithm for Data Aggregation in Wireless Sensor Networks," in International Conference on Communications and Mobile Computing (CMC), 2010, pp. 239–243.

[52] H. Luo, J. Luo, Y. Liu, and S. Das, "Adaptive Data Fusion for Energy Efficient Routing in Wireless Sensor Networks," IEEE Transactions on Computers, vol. 55, no. 10, pp. 1286–1299, 2006.

[53] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "Energy-Efficient Communication Protocol for Wireless Microsensor Networks," in 33rd Annual Hawaii International Conference on System Sciences, 2000, p. 10.

[54] H. Luo, Y. Liu, and S. K. Das, "Distributed Algorithm for En Route Aggregation Decision in Wireless Sensor Networks," IEEE Transactions on Mobile Computing, vol. 8, no. 1, pp. 1–13, 2009.

[55] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "An application-specific protocol architecture for wireless microsensor networks," IEEE Transactions on Wireless Communications, vol. 1, no. 4, pp. 660–670, 2002.

[56] A. Prieto and R. Stadler, "Adaptive Distributed Monitoring with Accuracy Objectives," in SIGCOMM Workshop on Internet Network Management (INM), 2006, pp. 65–70.

[57] A. Deligiannakis, Y. Kotidis, and N. Roussopoulos, "Hierarchical In-Network Data Aggregation with Quality Guarantees," in International Conference on Extending Database Technology (EDBT), 2004, pp. 658–675.

[58] T. Jurdzinski, M. Kutylowski, and J. Zatopianski, "Energy-Efficient Size Approximation of Radio Networks with No Collision Detection," in 8th Annual International Conference on Computing and Combinatorics (COCOON), 2002, pp. 279–289.

[59] J. Kabarowski, M. Kutyłowski, and W. Rutkowski, "Adversary Immune Size Approximation of Single-Hop Radio Networks," in 3rd International Conference on Theory and Applications of Models of Computation (TAMC), 2006, pp. 148–158.

[60] A.-M. Kermarrec and M. Steen, "Gossiping in distributed systems," ACM SIGOPS Operating Systems Review, vol. 41, no. 5, pp. 2–7, 2007.

[61] P. Jesus, C. Baquero, and P. S. Almeida, "Dependability in Aggregation by Averaging," in Simposio de Informatica (INForum), 2009.

[62] M. Jelasity and A. Montresor, "Epidemic-style proactive aggregation in large overlay networks," in 24th International Conference on Distributed Computing Systems, 2004, pp. 102–109.

[63] F. Wuhib, M. Dam, R. Stadler, and A. Clemm, "Robust Monitoring of Network-wide Aggregates through Gossiping," in 10th IFIP/IEEE International Symposium on Integrated Network Management, 2007, pp. 226–235.

[64] M. Jelasity, W. Kowalczyk, and M. van Steen, "An Approach to Massively Distributed Aggregate Computing on Peer-to-Peer Networks," in 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2004, pp. 200–207.

[65] L. Chitnis, A. Dobra, and S. Ranka, "Fault tolerant aggregation in heterogeneous sensor networks," Journal of Parallel and Distributed Computing, vol. 69, no. 2, pp. 210–219, 2009.

[66] R. van Renesse, K. Birman, and W. Vogels, "Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining," ACM Transactions on Computer Systems (TOCS), vol. 21, no. 2, pp. 164–206, 2003.

[67] I. Gupta, R. V. Renesse, and K. P. Birman, "Scalable fault-tolerant aggregation in large process groups," in International Conference on Dependable Systems and Networks (DSN), 2001, pp. 433–442.

[68] S. Cheng, J. Li, Q. Ren, and L. Yu, "Bernoulli Sampling Based (ε, δ)-Approximate Aggregation in Large-Scale Sensor Networks," in 29th Conference on Information Communications (INFOCOM). IEEE Press, 2010, pp. 1181–1189.

[69] R. Olfati-Saber and R. M. Murray, "Consensus Problems in Networks of Agents With Switching Topology and Time-Delays," IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1520–1533, 2004.

[70] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508–2530, 2006.


[71] L. Xiao, S. Boyd, and S.-J. Kim, “Distributed average consensuswith least-mean-square deviation,” Journal of Parallel and Dis-tributed Computing, vol. 27, no. 1, pp. 22–46, 2007.

[72] S. Patterson, B. Bamieh, and A. El Abbadi, “Convergence Rates ofDistributed Average Consensus With Stochastic Link Failures,”IEEE Transactions on Automatic Control, vol. 55, no. 4, pp. 880–892,2010.

[73] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker,H. Sturgis, D. Swinehart, and D. Terry, “Epidemic algorithms forreplicated database maintenance,” in 6th annual ACM Symposiumon Principles of distributed computing (PODC), 1987.

[74] J.-Y. Chen and J. Hu, “Analysis of Distributed Random Groupingfor Aggregate Computation on Wireless Sensor Networks withRandomly Changing Graphs,” IEEE Transactions on Parallel andDistributed Systems, vol. 19, no. 8, pp. 1136–1149, 2008.

[75] P. S. Almeida, C. Baquero, M. Farach-Colton, P. Jesus, and M. A.Mosteiro, “Fault-Tolerant Aggregation: Flow Update Meets MassDistribution,” in 15th International Conference On Principles OfDistributed Systems (OPODIS), ser. Springer LNCS, vol. 7109,2011, pp. 513–527.

[76] S. Kashyap, S. Deb, K. Naidu, R. Rastogi, and A. Srinivasan,“Gossip-Based Aggregate Computation with Low Communica-tion Overhead,” in 12th International Telecommunications NetworkStrategy and Planning Symposium, 2006, pp. 1–6.

[77] I. Eyal, I. Keidar, and R. Rom, “LiMoSense – Live Monitoringin Dynamic Sensor Networks,” in 7th International Symposiumon Algorithms for Sensor Systems, Wireless Ad Hoc Networks andAutonomous Mobile Entities (ALGOSENSOR), 2011.

[78] A. Dimakis, A. Sarwate, and M. Wainwright, “Geographic Gos-sip: Efficient Averaging for Sensor Networks,” IEEE Transactionson Signal Processing, vol. 56, no. 3, pp. 1205–1216, 2008.

[79] ——, “Geographic gossip: efficient aggregation for sensor net-works,” in 5th International Conference on Information Processingin Sensor Networks (IPSN), 2006, pp. 69–76.

[80] M. Mehyar, D. Spanos, J. Pongsajapan, S. Low, and R. Murray,“Asynchronous Distributed Averaging on Communication Net-works,” IEEE/ACM Transactions on Networking, vol. 15, no. 3, pp.512–520, 2007.

[81] K.-Y. Whang, B. Vander-Zanden, and H. Taylor, “A linear-time probabilistic counting algorithm for database applications,”ACM Transactions on Database Systems (TODS), vol. 15, no. 2, pp.208–229, 1990.

[82] M. Durand and P. Flajolet, “Loglog Counting of Large Cardinal-ities (Extended Abstract),” in 11th Annual European Symposiumon Algorithms (ESA), 2003, pp. 605–617.

[83] P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier, “HyperLogLog:the analysis of a near-optimal cardinality estimation algorithm,”in International Conference on Analysis of Algorithms (AofA), 2007,pp. 127–146.

[84] E. Cohen, “Size-estimation framework with applications to tran-sitive closure and reachability,” Journal of Computer and SystemSciences, vol. 55, no. 3, pp. 441–453, 1997.

[85] E. Cohen and H. Kaplan, “Summarizing data using bottom-k sketches,” in 26th annual ACM symposium on principles ofdistributed computing (PODC), 2007, pp. 225–234.

[86] Y.-C. Fan and A. L. Chen, “Energy Efficient Schemes forAccuracy-Guaranteed Sensor Data Aggregation Using ScalableCounting,” IEEE Transactions on Knowledge and Data Engineering,vol. 24, no. 8, pp. 1463–1477, 2012.

[87] C. Baquero, P. Almeida, R. Menezes, and P. Jesus, “Extrema prop-agation: Fast distributed estimation of sums and network sizes,”IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 4,pp. 668–675, 2012.

[88] J. C. Cardoso, C. Baquero, and P. S. Almeida, “ProbabilisticEstimation of Network Size and Diameter,” in 4th Latin-AmericanSymposium on Dependable Computing (LADC), 2009, pp. 33–40.

[89] D. Varagnolo, G. Pillonetto, and L. Schenato, “Distributed sta-tistical estimation of the number of nodes in sensor networks,”in Decision and Control (CDC), 2010 49th IEEE Conference on, ser.IEEE, dec 2010, pp. 149–1503.

[90] M. Bawa, A. Gionis, H. Garcia-Molina, and R. Motwani, “The price of validity in dynamic networks,” in ACM SIGMOD International Conference on Management of Data, 2004, pp. 515–526.

[91] N. Ntarmos, P. Triantafillou, and G. Weikum, “Counting at Large: Efficient Cardinality Estimation in Internet-Scale Data Networks,” in 22nd International Conference on Data Engineering (ICDE), 2006.

[92] ——, “Distributed hash sketches: Scalable, efficient, and accurate cardinality estimation for distributed multisets,” ACM Transactions on Computer Systems (TOCS), vol. 27, no. 1, 2009.

[93] D. Mosk-Aoyama and D. Shah, “Fast Distributed Algorithms for Computing Separable Functions,” IEEE Transactions on Information Theory, vol. 54, no. 7, pp. 2997–3007, 2008.

[94] ——, “Computing Separable Functions via Gossip,” in 25th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2006, pp. 113–122.

[95] M. Greenwald and S. Khanna, “Power-conserving computation of order-statistics over sensor networks,” in 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2004, pp. 275–285.

[96] I. Eyal, I. Keidar, and R. Rom, “Distributed Clustering for Robust Aggregation in Large Networks,” in 5th Workshop on Hot Topics in System Dependability (HotDep), 2009.

[97] ——, “Distributed data classification in sensor networks,” in 29th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC), 2010, pp. 151–160.

[98] M. Borges, P. Jesus, C. Baquero, and P. S. Almeida, “Spectra: Robust Estimation of Distribution Functions in Networks,” in 12th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS), ser. Springer LNCS, vol. 7272, 2012, pp. 96–103.

[99] E. Cohen and H. Kaplan, “Spatially-decaying aggregation over a network: model and algorithms,” in ACM SIGMOD International Conference on Management of Data, 2004.

[100] A. DasGupta, “The matching, birthday and the strong birthday problem: a contemporary review,” Journal of Statistical Planning and Inference, vol. 130, no. 1–2, pp. 377–389, 2005.

[101] C. J. Schwarz and G. A. F. Seber, “Estimating Animal Abundance: Review III,” Statistical Science, vol. 14, no. 4, pp. 427–456, 1999.

[102] D. Dolev, O. Mokryn, and Y. Shavitt, “On multicast trees: structure and size estimation,” in 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2003, pp. 1011–1021.

[103] ——, “On multicast trees: structure and size estimation,” IEEE/ACM Transactions on Networking, vol. 14, no. 3, pp. 557–567, 2006.

[104] T. Friedman and D. Towsley, “Multicast session membership size estimation,” in 18th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 1999, pp. 965–972.

[105] S. Alouf, E. Altman, and P. Nain, “Optimal on-line estimation of the size of a dynamic multicast group,” in 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2002, pp. 1109–1118.

Paulo Jesus is currently a Software Developer at Oracle in the MySQL Utilities team. He received the BEng degree in systems and informatics in 2001, and the MSc degree in mobile systems in 2007, both from the University of Minho (Portugal). He obtained his PhD degree in 2012 from the MAP-i doctoral program in computer science by the Universities of Minho, Aveiro and Porto (Portugal). His research interests include distributed algorithms, fault tolerance, and mobile systems.

Carlos Baquero is currently Assistant Professor at the Computer Science Department in Universidade do Minho (Portugal). He obtained his MSc and PhD degrees from Universidade do Minho in 1994 and 2000, respectively. His research interests are focused on distributed systems, in particular on causality tracking, peer-to-peer systems and distributed data aggregation. His recent research focuses on highly dynamic distributed systems, both in Internet P2P settings and in mobile and sensor networks.

Paulo Sérgio Almeida is currently Assistant Professor at the Computer Science Department in Universidade do Minho (Portugal). He obtained his MSc in Electrical Engineering and Computing from Universidade do Porto in 1994 and his PhD degree in Computer Science from Imperial College London in 1998. His research interests are focused on distributed systems, in particular distributed algorithms (namely aggregation) and logical clocks for causality tracking (with applications to optimistic replication).