NSDI '06: 3rd Symposium on Networked Systems Design & Implementation, USENIX Association


Corona: A High Performance Publish-Subscribe System for the World Wide Web

Venugopalan Ramasubramanian, Ryan Peterson, Emin Gün Sirer
Cornell University, Ithaca, NY 14853
{ramasv,ryanp,egs}@cs.cornell.edu

Abstract

Despite the abundance of frequently changing information, the Web lacks a publish-subscribe interface for delivering updates to clients. The use of naïve polling for detecting updates leads to poor performance and limited scalability, as clients do not detect updates quickly and servers face high loads imposed by active polling. This paper describes a novel publish-subscribe system for the Web called Corona, which provides high performance and scalability through optimal resource allocation. Users register interest in Web pages through existing instant messaging services. Corona monitors the subscribed Web pages, detects updates efficiently by allocating polling load among cooperating peers, and disseminates updates quickly to users. Allocation of resources for polling is driven by a distributed optimization engine that achieves the best update performance without exceeding load limits on content servers. Large-scale simulations and measurements from a PlanetLab deployment demonstrate that Corona achieves orders of magnitude improvement in update performance at a modest cost.

1 Introduction

Even though Web content changes rapidly, existing Web protocols do not provide a mechanism for automatically notifying users of updates. The growing popularity of frequently updated content, such as Weblogs, collaboratively authored Web pages (wikis), and news sites, motivates a publish-subscribe mechanism that can deliver updates to users quickly and efficiently. This need for asynchronous update notification has led to the emergence of micronews syndication tools based on naïve, repeated polling. The wide acceptance of such micronews syndication tools indicates that backwards compatibility with existing Web tools and protocols is critical for rapid adoption.

However, publish-subscribe through uncoordinated polling, similar to the current micronews syndication, suffers from poor performance and scalability. Subscribers do not receive updates quickly due to the fundamental limit posed by the polling period, and are tempted to poll at faster rates in order to detect updates quickly. Consequently, content providers have to handle the high bandwidth load imposed by subscribers, each polling independently and repeatedly for the same content. Moreover, such a workload tends to be “sticky;” that is, users subscribed to popular content tend not to unsubscribe after their interest diminishes, causing a large amount of wasted bandwidth. Existing micronews syndication systems provide ad hoc, stop-gap measures to alleviate these problems. Content providers currently impose hard rate-limits based on IP addresses, which render the system inoperable for users sharing an IP address, or they provide hints for when not to poll, which are discretionary and imprecise. The fundamental problem is that an architecture based on naïve, uncoordinated polling leads to ineffective use of server bandwidth.

This paper describes a novel, decentralized system for detecting and disseminating Web page updates. Our system, called Corona, provides a high-performance update notification service for the Web without requiring any changes to the existing infrastructure, such as Web servers. Corona enables users to subscribe for updates to any existing Web page or micronews feed, and delivers updates asynchronously. The key contribution that enables such a general and backwards-compatible system is a distributed, peer-to-peer, cooperative resource management framework that can determine the optimal resources to devote to polling data sources in order to meet system-wide goals.

The central resource-performance tradeoff in a publish-subscribe system in which publishers serve content only when polled involves bandwidth versus update latency. Clearly, polling data sources more frequently will enable the system to detect and disseminate updates earlier. Yet polling every data source constantly would place a large burden on publishers, congest the network, and potentially run afoul of server-imposed polling limits that would ban the system from monitoring the micronews feed or Web page.
The goal of Corona, then, is to maximize the effective benefit of the aggregate bandwidth available to the system while remaining within server-imposed bandwidth limits.

Corona resolves the fundamental tradeoff between bandwidth and update latency by expressing it formally as a mathematical optimization problem. The system then computes the optimal way to allocate bandwidth to monitored Web objects using a decentralized algorithm that works on top of a distributed peer-to-peer overlay. This allocation takes into account object popularity, update rate, content size, and internal system overhead stemming from accounting and dissemination of meta-information, and yields a polling schedule for different objects that will achieve global performance goals, subject to resource constraints. Corona can optimize the system for different performance goals and resource limits. In this paper, we examine two relevant goals: how to minimize update latency while ensuring that the average load on publishers is no more than what it would have been without Corona, and how to achieve a targeted update latency while minimizing bandwidth consumption. We also examine variants of these two main approaches where the load is more fairly balanced across objects.

The front-end client interface to Corona is through existing instant messaging (IM) services. Users subscribe for content by sending instant messages to a registered Corona IM handle, and receive update notifications asynchronously. Internally, Corona consists of a cloud of nodes that monitor the set of active feeds or Web pages, called channels. The Corona resource allocation algorithm determines the number of nodes designated to monitor each channel. Cooperative polling ensures that the system can detect updates quickly while no single node exceeds server-designated limits on polling frequency.
Each node dedicated to monitoring a channel has a copy of the latest version of the channel contents. A feed-specific difference engine determines whether detected changes are germane by filtering out superficial differences such as timestamps and advertisements, extracts the relevant portions that have changed, and distributes the delta-encoded changes to all internal nodes assigned to monitor the channel, which in turn distribute them to subscribed clients via IM.

We have implemented a prototype of Corona and deployed it on PlanetLab. Evaluation of this deployment shows that Corona achieves more than an order of magnitude improvement in update performance. In experiments parameterized by a real RSS workload collected at Cornell [19], spanning 60 PlanetLab nodes and involving 150,000 subscriptions for 7500 different channels, Corona clients see fresh updates in intervals of 45 seconds on average, compared to legacy RSS clients, which see a mean update interval of 15 minutes. At all times during the experiment, Corona keeps the total polling load on the content servers within the load imposed by the legacy RSS clients.

Overall, Corona is a new overlay-based publish-subscribe system for the Web that provides asynchronous notifications, fast update detection, and optimal bandwidth utilization. This paper makes three contributions: (i) it outlines the general design of a publish-subscribe system that does not require any changes to content sources, (ii) it formalizes the tradeoffs as an optimization problem and presents a novel distributed numerical solution technique for determining the allocation of bandwidth that will achieve globally targeted goals while respecting resource limits, and (iii) it presents results from extensive simulations and a live deployment that demonstrate that the system is practical.

The rest of the paper is organized as follows. The next section provides background on publish-subscribe systems and discusses other related work. Section 3 describes the architecture of Corona in detail. Implementation details are presented in Section 4, and experimental results based on simulations and deployment are described in Section 5. Finally, Section 6 summarizes our contributions and concludes.

2 Background and Related Work

Publish-subscribe systems have raised considerable interest in the research community over the years. In this section, we provide background on publish-subscribe based content distribution and summarize the current state of the art.

Publish-Subscribe Systems: The publish-subscribe paradigm consists of three components: publishers, who generate and feed the content into the system; subscribers, who specify content of their interest; and an infrastructure for matching subscriber interests with published content and delivering matched content to the subscribers. Based on the expressiveness of subscriber interests, pub-sub systems can be classified as topic-based or content-based. In topic-based systems, publishers and subscribers are connected together by predefined topics, called channels; content is published on well-advertised channels to which users subscribe to receive asynchronous updates. Content-based systems enable subscribers to express elaborate queries on the content and use sophisticated content filtering techniques to match subscriber interests with published content.

Prior research on pub-sub systems has primarily focused on the design and implementation of content filtering and event delivery mechanisms. Topic-based publish-subscribe systems have been built based on several decentralized mechanisms, such as group communication in Isis [13], shared object spaces in Linda [5], TSpace [36], and Java Spaces [16], and rendezvous points in TIBCO [35] and Herald [4]. Content-based publish-subscribe systems that use in-network content filtering and aggregation include SIENA [6], Gryphon [34], Elvin [32], and Astrolabe [37]. While the above publish-subscribe systems impose well-defined structures for the content, few systems have been proposed for semi-structured and unstructured content. YFilter [8], Quark [3], XTrie [7], and XTreeNet [11] are recent architectures for supporting complex content-based queries on semi-structured XML data. Conquer [21] and WebCQ [20] support unstructured Web content.

The fundamental drawback of the preceding publish-subscribe systems is their incompatibility with the current Web architecture. They require substantial changes in the way publishers serve content, expect subscribers to learn sophisticated query languages, or propose to lay out middle-boxes in the core of the Internet. Corona, on the other hand, interoperates with the current pull-based Web architecture, requires no changes to legacy Web servers, and provides an easy-to-use IM-based interface to the users. Optimal resource management in Corona, aimed at bounding network load, insulates Web servers from high load during flash-crowds.

Micronews Systems: Micronews feeds are short descriptions of frequently updated information, such as news stories and blog updates, in XML-based formats such as RSS [30] and Atom [1]. They are accessed via HTTP through URLs and supported by client applications and browser plug-ins called feed readers, which check the contents of micronews feeds periodically and automatically on the user's behalf and display the returned results. The micronews standards envision a publish-subscribe model of content dissemination and define XML tags such as cloud that tell clients how to receive asynchronous updates, as well as TTL, SkipHours, and SkipDays that inform clients when not to poll. Yet few content providers currently use the cloud tag to deliver asynchronous updates.

Recently, commercial services such as Bloglines, NewsGator, and Queoo have started disseminating micronews updates to users.
Corona differs fundamentally from these commercial services, which use fragile centralized servers and relentless polling to detect updates. Corona is layered on a self-organizing overlay comprised of cooperative peers that share updates, judiciously determine the amount of bandwidth consumed by polling, and can provide strong bandwidth guarantees.

FeedTree [31] is a recently proposed system for disseminating micronews that also uses a structured overlay and shares updates between peers. FeedTree nodes perform cooperative update detection in order to reduce update dissemination latencies, and Corona shares the insight with FeedTree that cooperative polling can drastically reduce update latencies. FeedTree decides on the number of nodes to dedicate to polling each channel based on heuristics. Corona's key contribution is the use of informed tradeoffs to achieve optimal resource management. This principled approach enables Corona to provide the best update performance for its users, while ensuring that content servers are lightly loaded and do not get overwhelmed due to flash-crowds or sticky traffic.

CAM [26] and WIC [25] are two techniques for allocating bandwidth for polling data sources on a single node. Similar to Corona, they perform resource allocation using analytical models for the tradeoff and numerical techniques to find near-optimal allocations. However, these techniques are limited to a single node. Corona performs resource allocation in a decentralized, cooperative environment and targets globally optimal update performance.

Overlay Networks: Corona is layered on structured overlays and leverages the underlying structure to facilitate optimal resource management. Recent years have seen a large number of structured overlays that organize the network based on rings [33, 29, 39, 23], hyper-dimensional cubes [28], butterfly structures [22], de-Bruijn graphs [17, 38], or skip-lists [14]. Corona is agnostic about the choice of the overlay and can be easily layered on any overlay with uniform node degree, including the ones listed here.

Corona's approach to a peer-to-peer resource management problem has a similar flavor to that of Beehive [27], a structured replication framework that resolves space-time tradeoff optimizations in structured overlays. Corona differs fundamentally from Beehive in three ways. First, the Beehive problem domain is limited to object replication in systems where objects have homogeneous popularity, size, and update rate properties, whereas Corona is designed for the Web environment where such properties can vary by several orders of magnitude between objects [10, 19]. Thus, Corona takes object characteristics into account during optimization. Second, the more complex optimization problem renders the Beehive solution technique, based on mathematical derivation, fundamentally unsuitable for the problem tackled by Corona. Hence, Corona employs a more general and sophisticated numerical algorithm to perform its optimizations. Finally, the resource-performance tradeoffs that arise in Corona are fundamentally different from the tradeoffs that Beehive addresses.

3 Corona

Corona (Cornell Online News Aggregator) is a topic-based publish-subscribe system for the Web. It provides asynchronous update notifications to clients while interoperating with the current pull-based architecture of the Web. URLs of Web content serve as topics or channels in Corona; users register their interest in some Web content by providing its URL and receive updates asynchronously about changes posted to that URL. Any Web object identifiable by a URL can be monitored with Corona. In the background, Corona checks for updates on registered channels by cooperatively polling the content servers from geographically distributed nodes.

Figure 1: Corona Architecture: Corona is a distributed publish-subscribe system for the Web. It detects Web updates by polling cooperatively and notifies clients through instant messaging.

We envision Corona as an infrastructure service offered by a set of widely distributed nodes. These nodes may all be part of the same administrative domain, such as Akamai, or consist of server-class nodes contributed by participating institutions. By participating in Corona, institutions can significantly reduce the network bandwidth consumed by frequent redundant polling for content updates, as well as reduce the peak loads seen at content providers that they themselves may host. Corona nodes self-organize to form a structured overlay system. We use structured overlays to organize the system, as they provide decentralization, good failure resilience, and high scalability [33, 29, 39, 28, 9, 14, 17, 23, 24, 38]. Figure 1 illustrates the overall architecture of Corona.

The key feature that enables Corona to achieve fast update detection is cooperative polling. Corona assigns multiple nodes to periodically poll the same channel and shares updates detected by any polling node. In general, n nodes polling with the same polling interval and randomly distributed polling times can detect updates n times faster if they share updates with each other. While it is tempting to take the maximum advantage of cooperative polling by having every Corona node poll for every feed, such a naïve approach is clearly unscalable and would impose substantial network load on both Corona and content servers.

Corona makes informed decisions on distributing polling tasks among nodes. The number of nodes that poll for each channel is determined based on an analysis of the fundamental tradeoff between update performance and network load. Corona poses this tradeoff as an optimization problem and obtains the optimal solution using Honeycomb, a light-weight toolkit for computing optimal performance-overhead tradeoffs in structured distributed systems. This principled approach enables Corona to efficiently resolve the tradeoff between performance and scalability.

Figure 2: Cooperative Polling in Corona: Each channel is assigned a wedge of nodes to poll the content servers and detect updates. Corona determines the optimal wedge size for each channel through analysis of the global performance-overhead tradeoff.

In this section, we provide detailed descriptions of the components of Corona's architecture, including the analytical models, optimization framework, update detection and notification mechanisms, and user interface.
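The speedup from cooperative polling described above can be checked with a small calculation. The sketch below is illustrative only and is not part of Corona: the function names are hypothetical, and it assumes evenly staggered polling offsets, which idealizes the randomly distributed polling times mentioned in the text. Under that assumption, n nodes sharing updates detect a change after τ/(2n) on average, compared with τ/2 for a single node.

```python
def mean_detection_delay(phases, tau):
    """Average wait until the next poll by any node, for an update that
    arrives at a uniformly random time within one polling interval."""
    pts = sorted(p % tau for p in phases)
    # Gaps between consecutive polls around a circle of length tau.
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(tau - pts[-1] + pts[0])
    # An update falls into a gap of length g with probability g / tau
    # and then waits g / 2 on average.
    return sum(g * g for g in gaps) / (2.0 * tau)


def staggered_phases(n, tau):
    """Evenly spaced polling offsets for n nodes (an idealization)."""
    return [k * tau / n for k in range(n)]


if __name__ == "__main__":
    tau = 900.0  # hypothetical polling interval, in seconds
    for n in (1, 4, 20):
        print(n, mean_detection_delay(staggered_phases(n, tau), tau))
```

Passing clustered offsets instead, for example `[0.0, 100.0]`, gives a longer average delay than the evenly staggered case, which is why the spread of polling times matters.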

3.1 Analytical Modeling

Our analysis-driven approach can be easily applied to any distributed system organized as a structured overlay with uniform node degree. In this paper, we describe Corona using Pastry [29] as the underlying substrate.

Pastry organizes the network into a ring by assigning identifiers from a circular numeric space to each node. The identifiers are treated as a sequence of digits of base b. In addition to neighbors along the ring, each node maintains contact with nodes that have matching prefix digits. These long-distance contacts are represented in a tabular structure called a routing table. The entry in the ith row and jth column of a node's routing table points to a node whose identifier shares i prefix digits and whose (i + 1)th digit is j. Essentially, the routing table defines a directed acyclic graph (DAG) rooted at each node, enabling a node to reach any other node in $\log_b N$ hops.

Corona assigns nodes in well-defined wedges of the Pastry ring for polling each channel. Each channel is assigned an identifier from the same circular numeric space. A wedge is a set of nodes sharing a common number of prefix digits with a channel's identifier. A channel with polling level l is polled by all nodes with at least l matching prefix digits in their identifiers. Thus, polling level 0 indicates that all the nodes in the system poll for the channel. Figure 2 illustrates the concept of polling levels in Corona.

Assigning well-defined portions of the ring to each channel enables Corona to manage polling efficiently with little overhead. The set of nodes polling for a channel can be represented by just a single number, the polling level, eliminating the expensive O(n) complexity for managing state about cooperating nodes. Moreover, this also facilitates efficient update sharing, as a wedge is a subset of the DAG rooted at each node, and all the nodes in a wedge can be reached quickly using the contacts in the routing table.

The polling level of a channel quantifies its performance-overhead tradeoff. A channel at level l has, on average, $\frac{N}{b^l}$ nodes polling it, which can cooperatively detect updates in $\frac{\tau}{2} \cdot \frac{b^l}{N}$ time on average, where τ is the polling interval. We estimate the average update detection time at a single node polling periodically at an interval τ to be $\frac{\tau}{2}$. Simultaneously, the collective load placed on the content server of the channel is $\frac{N}{b^l}$ polls per polling interval τ. Note that we do not include the propagation delay for sharing updates in this analysis because updates can be detected by comparing against any old version of the content. Hence, even if an update detected at a different node in the system is received late, the time to detect the next update at the current node does not change.

An easy way to set polling levels is to independently choose a level for each channel based on these estimates. However, such an approach involves investigating heuristics for determining the appropriate performance requirement for each channel and for dividing the total load between different channels. Moreover, it does not provide fine-grained control over the performance of the system, often causing it to operate far from optimally. The rest of this section describes how the tradeoffs can be posed as mathematical optimization problems to achieve different performance requirements.

Corona-Lite: The first performance goal we explore is minimizing the average update detection time while bounding the total network load placed on content servers. Corona-Lite improves the update performance seen by the clients while ensuring that the content servers handle a light load: no more than they would handle from the clients if the clients fetched the objects directly from the servers.

The optimization problem for Corona-Lite is defined in Table 1. The overall update performance is measured by taking an average of the update detection time of each channel weighted by the number of clients subscribed to the channels. We weigh the average using the number of subscriptions because update performance is an end user experience, and each client counts as a separate unit in the average. The target network load for this optimization is simply the total number of subscriptions in the system.

Corona-Lite clients experience the maximum benefits of cooperation. Clients of popular channels gain greater benefits than clients of less popular channels.
Yet, Corona-Lite does not suffer from “diminishing returns,” using its surplus polling capacity on less popular channels where the extra bandwidth yields higher marginal benefit. Since update detection time is inversely proportional to the number of polling nodes, a naïve heuristic-based scheme that assigns polling nodes in proportion to the number of subscribers would clearly suffer from diminishing returns. Corona, on the other hand, distributes the surplus load to other, less popular channels, improving their update detection times and achieving a better global average.
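The wedge and polling-level definitions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Corona's implementation: identifiers are modeled as equal-length strings of digits (hexadecimal here, so b = 16), and the function names and sample identifiers are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two identifier strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def polls_channel(node_id, channel_id, level):
    """A node belongs to a channel's wedge at polling level `level` if
    its identifier matches at least `level` prefix digits."""
    return shared_prefix_len(node_id, channel_id) >= level


def level_estimates(num_nodes, base, level, tau):
    """Expected wedge size N / b^l and mean detection time
    (tau / 2) * (b^l / N) for a channel at the given polling level."""
    pollers = num_nodes / base ** level
    detect_time = (tau / 2.0) * (base ** level / num_nodes)
    return pollers, detect_time


if __name__ == "__main__":
    # Hypothetical identifiers written as hex digit strings.
    print(polls_channel("a3f1", "a3c9", 2))  # two matching prefix digits
    print(polls_channel("a3f1", "a3c9", 3))
    print(level_estimates(1024, 16, 1, 1800.0))
```

Raising the level by one shrinks the expected wedge by a factor of b and lengthens the expected detection time by the same factor, which is the tradeoff the optimization in Table 1 resolves per channel.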

Corona-Lite:
    min. $\sum_{i=1}^{M} q_i \frac{b^{l_i}}{N}$   s.t.   $\sum_{i=1}^{M} s_i \frac{N}{b^{l_i}} \le \sum_{i=1}^{M} q_i$
    Minimize average update detection time, while bounding the load placed on content servers.

Corona-Fast:
    min. $\sum_{i=1}^{M} s_i \frac{N}{b^{l_i}}$   s.t.   $\sum_{i=1}^{M} q_i \frac{b^{l_i}}{N} \le T \sum_{i=1}^{M} q_i$
    Achieve a target average update detection time, while minimizing the load placed on content servers.

Corona-Fair:
    min. $\sum_{i=1}^{M} q_i \frac{\tau}{u_i} \frac{b^{l_i}}{N}$   s.t.   $\sum_{i=1}^{M} s_i \frac{N}{b^{l_i}} \le \sum_{i=1}^{M} q_i$
    Minimize average update detection time w.r.t. expected update frequency, bounding load on content servers.

Corona-Fair-Sqrt:
    min. $\sum_{i=1}^{M} q_i \frac{\sqrt{\tau}}{\sqrt{u_i}} \frac{b^{l_i}}{N}$   s.t.   $\sum_{i=1}^{M} s_i \frac{N}{b^{l_i}} \le \sum_{i=1}^{M} q_i$
    Corona-Fair with sqrt weight on the latency ratio to emphasize infrequently changing channels.

Corona-Fair-Log:
    min. $\sum_{i=1}^{M} q_i \frac{\log \tau}{\log u_i} \frac{b^{l_i}}{N}$   s.t.   $\sum_{i=1}^{M} s_i \frac{N}{b^{l_i}} \le \sum_{i=1}^{M} q_i$
    Corona-Fair with log weight on the latency ratio to emphasize infrequently changing channels.

Notation:
    $\tau$    polling interval
    $M$       number of channels
    $N$       number of nodes
    $b$       base of structured overlay
    $T$       performance target
    $l_i$     polling level of channel i
    $q_i$     number of clients for channel i
    $s_i$     content size for channel i
    $u_i$     update interval for channel i

Table 1: Performance-Overhead Tradeoffs: This table summarizes the optimization problems for different performance goals in Corona.
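To make the Corona-Lite entry of Table 1 concrete, the sketch below evaluates its objective and load constraint for a given assignment of polling levels. It only scores a candidate assignment and does not search for the optimum. The function names and the sample workload are hypothetical, chosen purely for illustration.

```python
def lite_objective(levels, q, num_nodes, base):
    """Corona-Lite objective as written in Table 1:
    sum over channels of q_i * b^(l_i) / N."""
    return sum(qi * base ** li / num_nodes for qi, li in zip(q, levels))


def polling_load(levels, s, num_nodes, base):
    """Left-hand side of the load constraint:
    sum over channels of s_i * N / b^(l_i)."""
    return sum(si * num_nodes / base ** li for si, li in zip(s, levels))


def lite_feasible(levels, q, s, num_nodes, base):
    """True if the polling load stays within the target sum of q_i."""
    return polling_load(levels, s, num_nodes, base) <= sum(q)


if __name__ == "__main__":
    # Hypothetical workload: three channels on a 256-node, base-16 overlay.
    q = [200, 20, 5]   # subscribers per channel
    s = [1, 1, 1]      # content sizes (arbitrary units)
    for levels in ([0, 1, 2], [1, 1, 2], [1, 1, 1]):
        print(levels,
              lite_feasible(levels, q, s, 256, 16),
              lite_objective(levels, q, 256, 16))
```

In this sample, polling the most popular channel from every node (level 0) already exceeds the load target on its own, so a feasible assignment has to use a smaller wedge for it.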

Corona-Fast:While Corona-Lite bounds the networkload on the content servers and minimizes update latency,the update performance it provides can vary dependingon the current workload. Corona-Fast provides stable up-date performance, which can be maintained steadily at a

NSDI ’06: 3rd Symposium on Networked Systems Design & ImplementationUSENIX Association 19

Page 6: Corona: A High Performance Publish-Subscribe System for ... A High Performance Publish-Subscribe System for the World Wide Web Venugopalan Ramasubramanian Ryan Peterson Emin G¨un

desired level through changes in the workload. Corona-Fast solves the converse of the previous optimization problem; that is, it minimizes the total network load on the content servers while meeting a target average update detection time. Corona-Fast enables us to tune the update performance of the system according to application needs. For example, a stock-tracker application may choose a target of 30 seconds to quickly detect changes to stock prices.

Corona-Fast shields legacy Web servers from sudden increases in load. A sharp increase in the number of subscribers for a channel does not trigger a corresponding increase in network load on the Web server since Corona-Fast does not increase polling after diminishing returns set in. In contrast, in legacy RSS, popularity spikes cause a significant increase in network load on content providers. Moreover, the increased load typically continues unabated in legacy RSS as subscribers forget to unsubscribe, creating “sticky” traffic. Corona-Fast protects content servers from flash-crowds and sticky traffic.

Corona-Fair: Corona-Fast and Corona-Lite do not consider the actual rate of change of content in a channel. While some Web objects are updated every few minutes, others do not change for days at a time [10, 19]. Corona-Fair incorporates the update rate of channels into the performance tradeoff in order to achieve a fairer distribution of update performance between channels. It defines a modified update performance metric as the ratio of the update detection time and the update interval of the channel, which it minimizes subject to a target load.

While the new metric accounts for the wide difference in update characteristics, it biases the performance unfavorably against channels with large update intervals. A channel that does not change for several days experiences long update detection times, even if there are many subscribers for the channel. We compensate for this bias by exploring other update performance metrics based on square root and logarithmic functions, which grow sub-linearly. A sub-linear metric dampens the tendency of the optimization algorithm to punish slow-changing yet popular feeds. Table 1 summarizes the optimization problems for different versions of Corona.
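To see why a sub-linear metric dampens the bias, the following Python sketch compares how heavily one second of detection delay is weighted on a fast-changing and a slow-changing channel. The function names and the exact forms (dividing by the update interval, by its square root, or by its logarithm) are illustrative assumptions; the precise formulations are the ones given in Table 1.

```python
import math

def linear_weight(update_interval_s):
    """Corona-Fair-style weight: delay measured relative to the update interval."""
    return 1.0 / update_interval_s

def sqrt_weight(update_interval_s):
    """Assumed square-root variant: grows sub-linearly with the update interval."""
    return 1.0 / math.sqrt(update_interval_s)

def log_weight(update_interval_s):
    """Assumed logarithmic variant; requires update_interval_s > 1."""
    return 1.0 / math.log(update_interval_s)

def disparity(weight, fast_s, slow_s):
    """How many times more a second of delay counts on the fast channel."""
    return weight(fast_s) / weight(slow_s)

FAST = 10 * 60            # hypothetical channel updated every 10 minutes
SLOW = 5 * 24 * 60 * 60   # hypothetical channel updated every 5 days

for name, weight in [("linear", linear_weight),
                     ("sqrt", sqrt_weight),
                     ("log", log_weight)]:
    print(f"{name:6s} fast/slow weight ratio: {disparity(weight, FAST, SLOW):.1f}")
```

Under these assumed forms, the linear metric weights the fast channel several hundred times more heavily than the slow one, while the square-root and logarithmic variants narrow that gap considerably.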

3.2 Decentralized Optimization

Corona determines the optimal polling levels using the Honeycomb optimization toolkit. Honeycomb provides numerical algorithms and decentralized mechanisms for solving optimization problems that can be expressed as follows:

\[ \min \sum_{i=1}^{M} f_i(l_i) \quad \text{s.t.} \quad \sum_{i=1}^{M} g_i(l_i) \le T. \]

Here, $f_i(l)$ and $g_i(l)$ can define the performance or the cost for channel $i$ as a function of the polling level $l$.

The preceding optimization problem is NP-Hard, as the polling levels only take integral values. Hence, instead of using computationally intensive techniques to find the exact solution, Honeycomb finds an approximate solution quickly in time comparable to a sorting algorithm. Honeycomb’s optimization algorithm runs in O(M log M log N) time.

The solution provided by Honeycomb is accurate and deviates from the optimal in at most one channel. Honeycomb achieves this accuracy by finding two solutions that optimize the problem with slightly altered constraints: one with a constraint $T_d \le T$ and another with constraint $T_u \ge T$. The corresponding solutions $L^*_u$ and $L^*_d$ are exactly optimal for the optimization problems with constraints $T_u$ and $T_d$ respectively, and differ in at most one channel. That is, one channel has a different polling level in $L^*_u$ than in $L^*_d$. Note that the optimal solution $L^*$ for the original problem with constraint $T$ may actually decide to allocate channels differently from $L^*_d$ and $L^*_u$. Yet, the minimum determined by $L^*$ will be bounded by the minima determined by $L^*_d$ and $L^*_u$ due to monotonicity. Honeycomb then chooses $L^*_d$ as the final solution because it satisfies the constraint $T$ strictly.

Honeycomb computes $L^*_d$ and $L^*_u$ using a Lagrange multiplier to transform the optimization problem as follows:

\[ L^* = \arg\min \sum_{i=1}^{M} f_i(l_i) - \lambda \left[ \sum_{i=1}^{M} g_i(l_i) - T \right]. \]
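For a fixed λ, the relaxed objective above is a sum of independent per-channel terms, so each channel's level can be picked on its own, and a one-dimensional search over λ then locates a solution that meets the constraint. The Python sketch below illustrates this under stated assumptions: the tradeoff functions are given as small tables `f[i][l]` and `g[i][l]`, the sign convention follows the equation (so λ ≤ 0 penalizes cost), and plain bisection stands in for the bracketing step. The function names are hypothetical, and this is not Honeycomb's implementation.

```python
def levels_for(f, g, lam):
    """For a fixed multiplier, choose each channel's level independently.

    f[i][l] and g[i][l] are the performance and cost of channel i at level l.
    """
    return [min(range(len(fi)), key=lambda l: fi[l] - lam * gi[l])
            for fi, gi in zip(f, g)]

def total(table, levels):
    """Sum a per-channel table at the chosen levels."""
    return sum(row[l] for row, l in zip(table, levels))

def solve(f, g, budget, iterations=60):
    """Approximately minimise sum f subject to sum g <= budget."""
    hi = 0.0                                  # no cost penalty at all
    if total(g, levels_for(f, g, hi)) <= budget:
        return levels_for(f, g, hi)
    lo = -1.0                                 # lam < 0 penalises cost in f - lam * g
    while total(g, levels_for(f, g, lo)) > budget:
        lo *= 2.0                             # strengthen the penalty until feasible
        if lo < -1e18:
            raise ValueError("budget cannot be met at any level")
    for _ in range(iterations):               # invariant: lo feasible, hi infeasible
        mid = (lo + hi) / 2.0
        if total(g, levels_for(f, g, mid)) <= budget:
            lo = mid
        else:
            hi = mid
    return levels_for(f, g, lo)

# Two hypothetical channels with three levels each; level 0 polls the most.
f = [[10, 20, 40], [1, 2, 4]]   # performance penalty grows with the level
g = [[4, 2, 1], [4, 2, 1]]      # cost shrinks with the level
levels = solve(f, g, budget=5)
print(levels, total(f, levels), total(g, levels))
```

In this toy instance the channel with the larger performance weight keeps the most aggressive level, and the other channel is pushed to the cheapest level so that the budget is respected.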

Honeycomb iterates over $\lambda$ and obtains the two solutions $L^*_d$ and $L^*_u$ that bracket the minimum using standard bracketing methods for function optimization in one dimension.

Two observations enable Honeycomb to speed up the optimization algorithm. First, $L^*(\lambda')$ for a single iteration can be computed by finding $\arg\min f_i(l_i) - \lambda' g_i(l_i)$ independently for each channel. This takes O(M log N) time as the number of levels is bounded by $\lceil \log N \rceil$. Second, for each channel there are only log N values of $\lambda$ that change $\arg\min f_i(l_i) - \lambda' g_i(l_i)$. Pre-computing these $\lambda$ values for each object provides a discrete iteration space of M log N $\lambda$ values. By keeping a sorted list of the $\lambda$ values, Honeycomb computes the optimal solution in O(log M) iterations. Overall, the run-time complexity of the optimization algorithm is O(M log M log N) time, including the time spent in pre-computation, sorting, and iterations.

The preceding algorithm requires the tradeoff functions $f_i(l)$ and $g_i(l)$ of all channels in the system in order to compute the global optimum. Solving the optimization problem using limited data available locally can produce highly inaccurate solutions. On the other hand, collecting the tradeoff factors for all the channels

NSDI ’06: 3rd Symposium on Networked Systems Design & Implementation USENIX Association20


at each node is clearly expensive and impractical. It is possible to gather the tradeoff data at a central node, run the optimization algorithm at a single location, and then distribute the optimal levels to peers from the central location. We avoid using a centralized infrastructure as it introduces a single point of failure in the system and has limited scalability.

Instead, Honeycomb internally aggregates coarse-grained information about global tradeoff factors. It combines channels with similar tradeoff factors into a tradeoff cluster. Each cluster summarizes the tradeoff factors for multiple channels and provides coarse-grained tradeoff information. A ratio of performance and cost factors, $f_i/g_i$, is used as a metric to combine channels. For example, channels with comparable values for $\frac{q_i}{u_i s_i}$ are combined into a cluster in Corona-Fair.

Honeycomb nodes periodically exchange the clusters with contacts in the routing table and aggregate the clusters received from the contacts. Honeycomb keeps the overhead for cluster aggregation low by limiting the number of clusters for each polling level to a constant Tradeoff Bins. Each node receives Tradeoff Bins clusters for every polling level from each contact in the routing table. Combined, these clusters summarize the tradeoff characteristics of all the channels in the system. The cluster aggregation overhead in terms of memory state as well as network bandwidth is limited by the size of the routing table, and scales with the logarithm of the system size.
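The cluster idea can be pictured with a small Python sketch. It assumes, purely for illustration, that channels are grouped by their tradeoff ratio into a fixed number of logarithmically spaced bins per polling level, and that a cluster carries only a channel count and a ratio sum. The text does not specify how Honeycomb picks cluster boundaries, so `bin_of`, its range limits, and the `Cluster` fields are hypothetical.

```python
import math
from dataclasses import dataclass

TRADEOFF_BINS = 16  # clusters kept per polling level (the prototype's setting)

@dataclass
class Cluster:
    channels: int = 0        # how many channels the cluster summarises
    ratio_sum: float = 0.0   # sum of their f/g ratios

    def add(self, ratio, channels=1):
        self.channels += channels
        self.ratio_sum += ratio * channels

    @property
    def mean_ratio(self):
        return self.ratio_sum / self.channels if self.channels else 0.0

def bin_of(ratio, lo=1e-6, hi=1e6):
    """Assumed binning: TRADEOFF_BINS equal-width bins on a log scale."""
    clipped = min(max(ratio, lo), hi)
    position = math.log(clipped / lo) / math.log(hi / lo)   # in [0, 1]
    return min(int(position * TRADEOFF_BINS), TRADEOFF_BINS - 1)

def summarise(channels):
    """channels: iterable of (polling_level, ratio) pairs known locally."""
    summary = {}
    for level, ratio in channels:
        summary.setdefault((level, bin_of(ratio)), Cluster()).add(ratio)
    return summary

def merge(summaries):
    """Combine summaries received from routing-table contacts."""
    merged = {}
    for summary in summaries:
        for key, cluster in summary.items():
            merged.setdefault(key, Cluster()).add(cluster.mean_ratio, cluster.channels)
    return merged

local = summarise([(1, 0.5), (1, 0.6), (2, 100.0)])
remote = summarise([(1, 0.55)])
combined = merge([local, remote])
print({key: (c.channels, round(c.mean_ratio, 3)) for key, c in combined.items()})
```

Because the number of bins per level is constant, the state exchanged with each contact stays bounded no matter how many channels exist. A real aggregation scheme must also avoid counting a channel twice when summaries arrive over several paths, which this sketch ignores.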

3.3 System Management

Corona is a completely decentralized system, in which nodes act independently, share load, and achieve globally optimal performance through mutual cooperation. Corona spreads load uniformly among the nodes through consistent-hashing [18]. Each channel in Corona has a unique identifier and one or more owner nodes managing it. The identifier is a content-hash of the channel’s URL, and the primary owner of a channel is the Corona node with the numerically closest identifier to the channel’s identifier. Corona adds additional owners for a channel in order to tolerate failures. These owners are the F closest neighbors of the primary owner along the ring. In the event an owner fails, a new neighbor automatically replaces it.

Owners take responsibility for managing subscriptions, polling, and updates for a channel. Owners receive subscriptions through the underlying overlay, which automatically routes all subscription requests of a channel to its owner. The owners keep state about the subscribers of a channel and send notifications to them when fresh updates are detected. In addition, owners keep track of channel-specific factors that affect the performance tradeoffs, namely the number of subscribers, the size of the content, and the interval at which servers update channel content. The last of these is estimated based on the time between updates detected by Corona.

Corona manages cooperative polling through a periodic protocol consisting of an optimization phase, a maintenance phase, and an aggregation phase. In the optimization phase, Corona nodes apply the optimization algorithm on fine-grained tradeoff data for locally polled channels and coarse-grained tradeoff clusters obtained from overlay contacts. In the maintenance phase, changes to polling levels are communicated to peer nodes in the routing table through maintenance messages. Finally, the aggregation phase enables nodes to receive new aggregates of tradeoff factors. In practice, the three phases occur concurrently at a node with aggregation data piggy-backed on maintenance messages.

Corona nodes operate independently and make decisions to increase or decrease polling levels locally. Initially, only the owner nodes at level $K = \lceil \log N \rceil$ poll for the channels. If an owner decides to lower the polling level to K − 1 (based on local optimization), it sends a message to the contacts in its routing table at row K − 1 in the next maintenance phase. As a result, a small wedge of level K − 1 nodes start polling for that channel. Subsequently, each of these nodes may independently decide to further lower the polling level of that channel. Similarly, if an owner node decides to raise the level from K − 1 to K, it asks its contact in the K − 1 wedge to stop polling.

In general, when a level i node lowers the level to i − 1 or raises the level from i − 1 back to i, it instructs its contact in row i − 1 of its routing table to start or stop polling for that channel. This control path closely follows the DAG rooted at the owner node. Nodes at level i (depth K − i) in this DAG decide whether their children at level i − 1 should poll a channel and convey these decisions periodically every maintenance interval.
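To get a feel for how many nodes each polling level brings in, the following Python sketch estimates wedge sizes. It assumes that the logarithm is taken in the overlay's base, that node identifiers are uniformly distributed, and that a level-l wedge contains the nodes sharing l prefix digits with the channel identifier, so its expected size is N divided by the base raised to l. The example values (1024 nodes, base 16) and the helper names are illustrative.

```python
def max_level(num_nodes, base):
    """Smallest K with base**K >= num_nodes, i.e. ceil(log_base(num_nodes))."""
    level, reach = 0, 1
    while reach < num_nodes:
        reach *= base
        level += 1
    return level

def expected_pollers(num_nodes, base, level):
    """Expected nodes sharing `level` prefix digits with a channel (at least the owner)."""
    return max(1.0, num_nodes / base ** level)

N, BASE = 1024, 16   # example system size and overlay base
K = max_level(N, BASE)
for level in range(K, -1, -1):
    print(f"level {level}: about {expected_pollers(N, BASE, level):g} polling nodes")
```

Each step down in level multiplies the expected number of pollers by the base, which is why a handful of levels is enough to span the range from a single owner to the whole system.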
When a node is instructed to begin polling for a channel, it waits for a random interval of time between 0 and the polling interval before the first poll. This ensures that polls for a channel at different nodes are randomly distributed over time.

Corona nodes gather current estimates of tradeoff factors in the aggregation phase. Owners monitor the number of subscribers and send out fresh estimates along with the maintenance message. Subsequent maintenance messages sent out by descendant nodes in the DAG propagate these estimates to all the nodes in the wedge. The update interval and size of a feed only change during updates and are therefore sent along with updates. Tradeoff clusters are also sent by contacts in the routing table in response to maintenance messages.

Corona inherits robustness and failure-resilience from the underlying structured overlay. If the current contact


in the routing table fails, the underlying overlay automatically replaces it with another contact. When new nodes join the system or nodes fail, Corona ensures the transfer of subscription state to new owners. A node that is no longer an owner simply erases its subscription state, and a node that becomes a new owner receives the state from other owners of the channel. Simultaneous failure of more than F adjacent nodes poses a problem for Corona, as it does for many other peer-to-peer systems; we assume that F is chosen to make such an occurrence rare. Note that clients can easily renew subscriptions should a catastrophic failure lose some subscription state.

Overall, Corona manages polling using light-weight mechanisms that impose a small, predictable overhead on the nodes and network. Its algorithms do not rely on expensive constructs such as consensus, leader election, or clock synchronization. Networking activity is limited to contacts in the nodes’ routing tables.

3.4 Update Dissemination

Updates are central to the operation of Corona; hence, we ensure that they are detected and disseminated efficiently. Corona uses monotonically increasing numbers to identify versions of content. The version numbers are based on content modification times whenever the content carries such a timestamp. For other channels, the primary owner node assigns version numbers in increasing order based on the updates it receives.

Corona nodes share updates only as deltas, the differences between old and new content, rather than the entire content. A measurement study on micronews feeds conducted at Cornell shows that the amount of change in content during an update is typically tiny. The study reports that the average update consists of 17 lines of XML, or 6.8% of the content size [19], which implies that a significant amount of bandwidth can be saved through delta-encoding.

A difference engine enables Corona to identify when a channel carries new information that needs to be disseminated to subscribed clients. The difference engine parses the HTML or XML content to discover the core content in the channel, ignoring frequently changing elements such as timestamps, counters, and advertisements. The difference engine generates a delta if it detects an update after isolating the core content. The data in a delta resembles the typical output of the POSIX ‘diff’ command: it carries the line numbers where the change occurs, the changed content, an indication of whether it is an addition, omission, or replacement, and a version number of the old content to compare against.

When a node generates a delta, it shares the update with all other nodes in the channel’s polling wedge. To achieve this, the node simply disseminates the delta along the DAG rooted at it up to a depth equal to the polling level of the channel. The dissemination along the DAG takes place using contacts in the routing table of the underlying overlay. For channels that cannot obtain a reliable modification timestamp from the server, the node detecting the update sends the delta to the primary owner, which assigns a new version number and initiates the dissemination to other nodes polling that channel. Two different nodes may detect a change “simultaneously” and send deltas to the primary owner. The primary owner always checks each incoming delta against the latest updated version of the content and ignores redundant deltas.
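A delta of the kind described in this section can be sketched with Python's standard `difflib` module. The dictionary layout, the function names, and the version check are assumptions made for illustration; Corona's actual difference engine also isolates core content before diffing, which is not modeled here.

```python
import difflib

def make_delta(old_lines, new_lines, base_version):
    """Record only the changed line ranges, tagged as replace, delete, or insert."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    ops = [(tag, i1, i2, new_lines[j1:j2])
           for tag, i1, i2, j1, j2 in matcher.get_opcodes()
           if tag != "equal"]
    return {"base_version": base_version, "ops": ops}

def apply_delta(old_lines, delta):
    """Rebuild the new content from the old content plus a delta."""
    result, cursor = [], 0
    for _tag, start, end, replacement in delta["ops"]:
        result.extend(old_lines[cursor:start])   # unchanged lines before the edit
        result.extend(replacement)               # empty for a deletion
        cursor = end
    result.extend(old_lines[cursor:])
    return result

def is_fresh(delta, current_version):
    """Primary-owner style check: ignore deltas made against a stale version."""
    return delta["base_version"] == current_version

old = ["<item>a</item>", "<item>b</item>", "<item>c</item>"]
new = ["<item>a</item>", "<item>x</item>", "<item>c</item>", "<item>d</item>"]
delta = make_delta(old, new, base_version=7)
print(delta["ops"])
print(apply_delta(old, delta) == new)
```

Only the changed ranges travel over the network, and the base version lets a receiver detect that a second delta for the same change is redundant once the first has been applied.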

3.5 User Interface

Corona employs instant messaging (IM) as its user interface. Users add Corona as a “buddy” in their favorite instant messaging system; both subscriptions and update notifications are then communicated as instant messages between the users and Corona. Users send request messages of the form “subscribe url” and “unsubscribe url” to subscribe and unsubscribe for a channel. A subscribe or unsubscribe message delivered by the IM system to Corona is routed to all the owner nodes of the channel, which update their subscription state. When a new update is detected by Corona, the current primary owner sends an instant message with the delta to all the subscribers through the IM system. If a subscriber is off-line at the time an update is generated, the IM system buffers the update and delivers it when the subscriber subsequently joins the network.

Delivering updates through instant messaging systems incurs additional latency since messages are sent through a centralized service. However, the additional latency is modest as IM systems are designed to reduce such latencies during two-way communication. Moreover, IM systems that allow peer-to-peer communication between their users, such as Skype, deliver messages in quick time.

Instant messaging enables Corona to be easily accessible to a large user population, as no computer skills other than an ability to “chat” are required, and ubiquitous IM deployment ensures that hosts behind NATs and firewalls are supported. Moreover, instant messages also guarantee the authenticity of the source of update messages to the clients, as instant messaging systems pre-authenticate Corona as the source through password verification.

4 Implementation

We have implemented a prototype of Corona as an application layered on Pastry [29], a prefix-matching structured overlay system. The implementation uses a 160-bit SHA-1 hash function to generate identifiers for both the nodes (based on their IP addresses) and channels (based on their URLs). Both the base of Pastry and the number of tradeoff clusters per polling level are set to 16.
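The identifier and ownership rules can be illustrated with a short Python sketch. It hashes node addresses and channel URLs with SHA-1, picks the node numerically closest to the channel identifier as the primary owner, and then adds the F nodes closest to the primary. Treating "closest" as distance on a circular 160-bit identifier space is an assumption, and the addresses, URL, and function names are hypothetical; the prototype relies on Pastry for this, not on code like the following.

```python
import hashlib

ID_BITS = 160
RING = 1 << ID_BITS

def identifier(text):
    """160-bit identifier derived from the SHA-1 digest of a string."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")

def ring_distance(a, b):
    """Shortest distance between two identifiers on the circular id space."""
    d = abs(a - b)
    return min(d, RING - d)

def owners(channel_url, node_addresses, f):
    """Primary owner first, then the f nodes closest to the primary on the ring."""
    channel_id = identifier(channel_url)
    ids = {addr: identifier(addr) for addr in node_addresses}
    primary = min(ids, key=lambda a: ring_distance(ids[a], channel_id))
    others = sorted((a for a in ids if a != primary),
                    key=lambda a: ring_distance(ids[a], ids[primary]))
    return [primary] + others[:f]

nodes = [f"10.0.0.{i}" for i in range(1, 21)]          # hypothetical node addresses
print(owners("http://example.com/feed.xml", nodes, f=2))
```

Because every node can evaluate the same hash, any node can work out which peers own a channel without consulting a directory.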


Prefix matching overlays occasionally create orphans, that is, channels with no owners having enough matching prefix digits. Orphans are created when the required wedge of the identifier space, corresponding to level $\lceil \log N \rceil - 1$, is empty. Corona cannot assign additional nodes to poll an orphan channel to improve its update performance. Moreover, orphans adversely affect the computation of globally optimal allocation. Corona handles orphan channels by adjusting the tradeoffs appropriately. The tradeoff factors of orphan channels are aggregated into a slack cluster, which is used to adjust the performance target prior to optimization.

Corona’s reliance on IM as an asynchronous communication mechanism poses some operational challenges. Corona interacts with IM systems using GAIM [12], an open source instant messaging client for Unix-based platforms that supports multiple IM systems including Yahoo Instant Messenger, AOL Instant Messenger, and MSN Messenger. Several IM systems have a limitation that only one instance of a user can be logged on at a time, preventing all Corona nodes from being logged on simultaneously. While we hope that IM systems will support simultaneous logins from automated users such as Corona in the near future, as they have for several chat robots, our implementation uses a centralized server to talk to IM systems as a stop-gap measure. This server acts as an intermediary for all updates sent to clients as well as subscription messages sent by clients. Also, IM systems such as Yahoo rate-limit instant messages sent by unprivileged clients. Corona’s implementation limits the rate of updates sent to clients to avoid sending updates in bursts.

Corona trusts the nodes in the system to behave correctly and generate authentic updates. However, it is possible that in a collaborative deployment, where nodes under different administrative domains are part of the Corona network, some nodes may be malicious and generate spurious updates. This problem can be easily solved if content providers are willing to publish digitally signed certificates along with their content. An alternative solution that does not require changes to servers is to use threshold-cryptography to generate a certificate for content [40, 15]. The responsibility for generating partial signatures can be shared among the owners of a channel, ensuring that rogue nodes below the threshold level cannot corrupt the system. Designing and implementing such a threshold-cryptographic scheme is, however, beyond the scope of this paper.

5 Evaluation

We evaluate the performance of Corona through large-scale simulations and wide-area experiments on PlanetLab [2]. In all our evaluations, we compare the performance of Corona to the performance of legacy RSS, a widely-used micronews syndication system. The simulations and experiments are driven by real-life RSS traces.

We collected characteristics of micronews workloads and content by passively logging user activity and actively polling RSS feeds [19]. User activity recorded between March 22 and May 3 of 2005 at the gateway of the Cornell University Computer Science Department provided a workload of 158 clients making approximately 62,000 requests for 667 different feeds. The channel popularity closely follows a Zipf distribution with exponent 0.5. The survey analyzed the update rate of micronews content by actively polling approximately 100,000 RSS feeds obtained from syndic8.com. We polled these feeds at one hour intervals for 84 hours, and subsequently selected a subset of 1000 feeds and polled them at a finer granularity of 10 minutes for five days. Comparing periodic snapshots of the feeds shows that the update interval of micronews content is widely distributed: about 10% of channels changed within an hour, while 50% of channels did not change at all during the five days of polling.

5.1 Simulations

We use tradeoff parameters based on the RSS survey in our simulations. In order to scale the workload to the larger scale of our simulations, we extrapolate the distribution of feed popularity from the workload traces and set the popularity to follow a Zipf distribution with exponent 0.5. We use a distribution for the update rates of channels obtained through active polling, setting the update interval of the channels that do not see any updates to one week.

We perform simulations for a system of 1024 nodes, 100,000 channels, and 5,000,000 subscriptions. We start each simulation with an empty state and issue all subscriptions at once before collecting performance data. We run the simulations for six hours with a polling interval of 30 minutes and maintenance interval of one hour. We study the performance of the three schemes, Corona-Lite, Corona-Fast, and Corona-Fair, and compare the performance with that of legacy RSS clients polling at the same interval of 30 minutes.
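The shape of the simulated workload can be reproduced with a few lines of Python. The sketch below spreads a fixed number of subscriptions over ranked channels in proportion to rank raised to the negative Zipf exponent; it computes expected subscriber counts rather than sampling individual subscriptions, and the function name is hypothetical.

```python
def zipf_subscriptions(num_channels, total_subscriptions, exponent):
    """Expected subscribers per channel, most popular first, under a Zipf law."""
    weights = [rank ** -exponent for rank in range(1, num_channels + 1)]
    scale = total_subscriptions / sum(weights)
    return [w * scale for w in weights]

subs = zipf_subscriptions(100_000, 5_000_000, 0.5)
print(f"most popular channel: about {subs[0]:.0f} subscribers")
print(f"median-rank channel:  about {subs[len(subs) // 2]:.0f} subscribers")
print(f"mean per channel:     {sum(subs) / len(subs):.1f} subscribers")
```

With 5,000,000 subscriptions over 100,000 channels the mean is 50 subscribers per channel, which corresponds to 50 polls per channel per 30-minute interval when every legacy client polls once per interval.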

Corona-Lite

Figures 3 and 4 show the network load and update performance, respectively, for Corona-Lite, which minimizes average update detection time while bounding the total load on content servers. The figures plot the network load, in terms of the average bandwidth load placed on content servers, and update performance, in terms of the average update detection time. Figure 3 shows that Corona-Lite stabilizes at the load imposed by legacy RSS clients. Starting from a clean slate where only owner


Figure 3: Network Load on Content Servers: Corona-Lite converges quickly to match the network load imposed by legacy RSS clients. (Plot: network load per channel in kbps versus time in hours; series: Legacy RSS, Corona Lite, Corona Fast.)

Figure 4: Average Update Detection Time: Corona-Lite provides 15-fold improvement in update detection time compared to legacy RSS clients for the same network load. (Plot: update detection time in minutes versus time in hours; series: Legacy RSS, Corona Lite, Corona Fast.)

nodes poll for each channel, Corona-Lite quickly converges to its target in two maintenance phases. The average load exceeds the target for a brief period before stabilization. This slight delay is due to nodes not having complete information about tradeoff factors of other channels in the system. However, the discrepancy is corrected automatically when aggregated global tradeoff factors are available to each node.

At the same time, Figure 4 shows that Corona-Lite achieves an average update detection time of about one minute. The update performance of Corona-Lite represents an order of magnitude improvement over the average update detection time of 15 minutes provided by legacy RSS clients. This substantial difference in performance is achieved through judicious distribution of polling load between cooperating nodes, while imposing no more load on the servers than the legacy clients.

Figures 5 and 6 show the number of polling nodes assigned by Corona-Lite to different channels and the resulting distribution of update detection times. The x-axis shows channels in reverse order of popularity. We only plot 20,000 channels for clarity. The load imposed by legacy RSS is equal to the number of clients. For Corona-Lite, three levels of polling can be identified in Figure 5: channels clustered around 100 at level 1, channels with fewer than 10 clients at level 2, and orphan channels close to the x-axis with just one owner node polling them. The sharp change in the distribution after 75,000 channels indicates the point where the optimal solution changes polling levels.

Figure 5: Number of Pollers per Channel: Corona trades off network load from popular channels to decrease update detection time of less popular channels and achieve a lower system-wide average. (Plot: number of polling nodes, log scale, versus channel rank by popularity; series: Legacy RSS, Corona Lite.)

Figure 6: Update Detection Time per Channel: Popular channels gain greater decrease in update detection time than less popular channels. (Plot: update detection time in seconds, log scale, versus channel rank by popularity; series: Legacy RSS, Corona Lite.)

Figure 5 shows that Corona-Lite favors popular channels over unpopular ones when assigning polling levels. Yet, it significantly reduces the load on servers of popular content compared to legacy clients, which impose a highly skewed load on content servers and overload servers of popular content. Corona-Lite reduces the load of the over-loaded servers and transfers the extra load to servers of less popular content to improve update performance.


Scheme              Average Update Detection Time (sec)   Average Load (polls per 30 min per channel)
Legacy-RSS          900                                   50.00
Corona-Lite         53                                    48.97
Corona-Fair         142                                   50.14
Corona-Fair-Sqrt    55                                    49.46
Corona-Fair-Log     53                                    49.43
Corona-Fast         32                                    58.75

Table 2: Performance Summary: This table provides a summary of average update detection time and network load for different versions of Corona. Overall, Corona provides significant improvement in update detection time compared to Legacy RSS, while placing the same load on servers.

The favorable behavior of Corona-Lite is due to diminishing returns caused by the inverse relation between the update detection time and the number of polling nodes. It is more beneficial to distribute the polling across many channels than to devote a large percentage of the bandwidth to polling the most popular channels. Nevertheless, load distribution in Corona-Lite respects the popularity distribution of channels: popular channels are polled by more nodes than less popular channels (see Figure 5). The upshot is that popular channels gain an order of magnitude improvement in update performance over less popular ones (see Figure 6).

Corona-Fast

Unlike Corona-Lite, Corona-Fast minimizes the total load on servers while aiming to achieve a target update detection latency. Figures 3 and 4 show the network load and update performance, respectively, for Corona-Fast. Figure 4 confirms that Corona-Fast closely meets the desired target of 30 seconds. This improvement in update detection time entails an increase in server load over Corona-Lite. Unlike Corona-Lite, whose update performance may vary depending on the workload seen by the system, Corona-Fast provides a stable average update performance. Moreover, it enables us to set the performance depending on the requirements of the application and ensures that the targeted performance is achieved with minimal load on content servers.

Corona-Fair

Finally, we examine the performance of Corona-Fair, which uses the update rates of channels to fine-tune the distribution of load. It takes advantage of the wide distribution of update intervals among channels and aims to poll frequently updated channels at a higher rate than channels with long update intervals. Figure 7 shows the distribution of update detection times achieved by Corona-Lite and Corona-Fair for different channels

Figure 7: Update Detection Time per Channel: Corona-Fair provides better update detection time for channels that change rapidly than for channels that change rarely. (Plot: update detection time in seconds, log scale, versus channel rank by update interval; series: Corona Lite, Corona Fair.)

Figure 8: Update Detection Time per Channel: Corona-Fair-Sqrt and Corona-Fair-Log fix the bias against channels that change rarely and provide better update detection time for them than Corona-Fair does. (Plot: update detection time in seconds, log scale, versus channel rank by update interval; series: Corona Fair Sqrt, Corona Fair Log.)

ranked by their update intervals. Channels with the same update intervals are further ranked by popularity. For clarity of presentation, we plot the distribution for 200 uniformly chosen channels.

Figure 7 shows that Corona-Lite achieves an unfair distribution of update detection times by ignoring update interval information. Many channels with large update intervals have short update detection times (shown in the lower-right of the graph), while some rapidly changing channels have long update detection times (shown in the upper-left of the graph). Corona-Fair fixes this unfair distribution of update detection time by using update intervals of channels to influence the choice of polling levels. Figure 7 shows that Corona-Fair has a fairer distribution of update detection times with update intervals; that is, channels with shorter update intervals have faster update detection times and vice-versa.

Corona-Fair optimizes for update performance measured as the ratio of update detection time and update interval. Thus, channels with long update intervals may also have proportionally long update detection times, leading to long wait times for clients. Section 3.1 proposed to compensate for this bias using two metrics with sub-linear growth based on the square root and logarithm of the update interval. Figure 8 shows that Corona-Fair-Sqrt and Corona-Fair-Log achieve update detection times that are fairer and lower than those of Corona-Fair. Between the two metrics, Corona-Fair-Sqrt is better than Corona-Fair-Log, which has a few channels with small update intervals but long update detection times.

Overall, the Corona-Fair schemes provide fair distributions of polling between channels with low average update detection times without exceeding bandwidth load on the servers. The average update detection time and load for different Corona-Fair schemes are shown in Table 2. The average update detection time suffers a little in Corona-Fair compared to Corona-Lite, but the modified Corona-Fair schemes provide an average performance close to that of Corona-Lite.

5.2 Deployment

We deployed Corona on a set of 60 PlanetLab nodes and measured its performance. The deployment is based on the Corona-Lite scheme, which minimizes update detection time while bounding network load. For this experiment, we use 7500 real channels providing RSS feeds obtained from www.syndic8.com. We issue 150,000 subscriptions for them based on a Zipf popularity distribution with exponent 0.5. Subscriptions are issued at a uniform rate during the first hour and a half of the experiment. The maintenance interval and the polling interval are both set to 30 minutes. We collected data for a period of six hours.

Figure 9 shows the average update detection time for the Corona deployment compared to legacy RSS. Corona decreases the average update detection time to about 45 seconds compared to 15 minutes for legacy RSS. Figure 10 shows the corresponding polling load imposed by Corona on content servers. Corona gradually increases the number of nodes polling each channel and reaches a load limit of around 4500 polls per minute. Corona’s total network load is bounded by the load imposed by legacy RSS, which averages to just above 5000 polls per minute. These graphs highlight that while imposing a load comparable to that of legacy RSS, Corona achieves a substantial improvement in update detection time.

5.3 Summary

The results from simulations and wide-area experiments confirm that Corona achieves a balance between update latency and network load. It dynamically learns the parameters of the system such as number of nodes, number of subscriptions, and tradeoff factors of all channels, and

Figure 9: Average Update Detection Time: Corona provides an order of magnitude lower update detection time compared to legacy RSS. (Plot: update detection time in seconds, log scale, versus time in hours; series: Corona, Legacy RSS.)

Figure 10: Total Polling Load on Servers: The total load generated by Corona is well below the load generated by clients using legacy RSS. (Plot: total network polls per minute versus time in hours; series: Corona, Legacy RSS.)

uses the new parameters to periodically adjust the optimal polling levels of channels and meet performance and load targets. Corona offers considerable flexibility in the kind of performance goals it can achieve. In this section, we showed three specific schemes targeting update detection time, network load, and fair distribution of load under different metrics of fairness. Measurements from the deployment showed that achieving globally optimal performance in a distributed wide-area system is practical and efficient. Overall, Corona proves to be a high performance, scalable publish-subscribe system.

6 Conclusions

This paper proposes a novel publish-subscribe architecture that is compatible with the existing pull-based architecture of the Web. Motivated by the growing demand for micronews feeds and the paucity of infrastructure to provide asynchronous notifications, we develop a unique solution that addresses the shortcomings of pull-based content dissemination and delivers a real, deployable, easy-to-use publish-subscribe system.

Many real-world applications require quick and efficient dissemination of information from data sources to clients. It is quite common for legacy data sources, such as Web pages, sensors, stock feeds, event trackers and so forth, to be deployed piecemeal, and thus to force clients to poll them manually and explicitly to receive the latest updates. As the number of data sources increases, the task of monitoring so many event sources quickly becomes overwhelming for humans. At sufficiently large scales, the task of allocating bandwidth is difficult even for computers. We can see examples of such applications in large-scale sensor networks, in investment management systems that track commodity prices, and in many adaptive distributed systems for detecting events. All of these applications pose a fundamental tension between the polling bandwidth required to achieve fast event detection and the corresponding load imposed by periodic polling.

Our unique contribution is the optimal resolution of performance-overhead tradeoffs in such event detection systems. This paper provides a general approach based on analytical modeling of the cost-performance tradeoff and mathematical optimization that enables applications to make informed, near-optimal decisions on which data sources to monitor, and with what frequency. We develop techniques to solve typical resource allocation problems that arise in distributed systems through decentralized, low-overhead mechanisms.

The techniques at the core of this system are easily applicable to any domain where a set of nodes monitor exogenous events. The Corona approach to monitoring oblivious, pull-based data sources makes it unnecessary to change the data publishing workflow, agree on new dissemination protocols, or deploy new software on data sources. This is particularly relevant when the sources to be monitored are large in number, and deploying new software is logistically difficult. For instance, large-scale Web spiders that monitor changes to Websites to incrementally update a Web index could benefit from the principled approach developed here.

Corona applies this general approach to disseminating updates to the Web, where the resource-performance tradeoff is affected by the popularity, size, and update rate of Web content and the network capacities of clients and content servers. Performance measurements based on simulations and real-life deployment show that Corona clients can achieve several orders of magnitude improvement in update latency without an increase in average load. Corona acts as a buffer between clients and servers, shielding servers from the impact of flash-crowds and sticky traffic. Our implementation is currently deployed on PlanetLab and available for public use. We hope that a backwards-compatible, high-performance, efficient publish-subscribe system will make it possible for people to easily track frequently changing content on the Web.

Acknowledgements

We would like to thank Rohan Murty for implementing an earlier prototype of the Corona system, Yee Jiun Song for his help with the system, and our shepherd Dave Andersen for his guidance and feedback. This work was supported in part by NSF CAREER grant 0546568, and TRUST (The Team for Research in Ubiquitous Secure Technology), which receives support from the National Science Foundation (CCF-0424422) and the following organizations: Cisco, ESCHER, HP, IBM, Intel, Microsoft, ORNL, Qualcomm, Pirelli, Sun, and Symantec.

References[1] Atom. Atom Syndication Format. http://www.-atomenabled.org/developers/syndication, Oct. 2005.

[2] A. Bavier, M. Bowman, B. Chun, D. Culler, S. Karlin,S. Muir, L. Peterson, T. Roscoe, T. Spalink, andM.Wawr-zoniak. Operating System Support for Planetary-ScaleNetwork Services. In Proc. of Symposium on NetworkedSystems Design and Implementation, Boston, MA, Mar.2004.

[3] C. Botev and J. Shanmugasundaram. Context SensitiveKeyword Search and Ranking for XML. In Proc. of In-ternational Workshop on Web and Databases, Baltimore,MD, June 2005.

[4] L. F. Cabera, M. B. Jones, and M. Theimer. Herald:Achieving a Global Event Notification Service. In Proc.of Workshop on Hot Topics in Operating Systems, Elmau,Germany, May 2001.

[5] N. Carriero and D. Gelernter. Linda in Context. Commu-nications of the ACM, 32(4):444–458, Apr. 1989.

[6] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. De-sign and Evaluation of a Wide-Area Event Notifica-tion Service. ACM Transactions on Computer Systems,19(3):332–383, Aug. 2001.

[7] C.-Y. Chan, P. Felber, M. Garofalakis, and R. Rastogi.Efficient Filtering of XML Documents with XPath Ex-pressions. In Proc. of International Conference on DataEngineering, San Jose, CA, Feb. 2002.

[8] Y. Diao, S. Rizvi, and M. Franklin. Towards an Internet-Scale XML Dissemination Service. In Proc. of Inter-national Conference on Very Large Databases (VLDB),Toronto, Canada, Aug. 2004.

[9] J. R. Douceur, A. Adya, W. J.Bolosky, and D. Simon. Re-claiming Space from Duplicate Files in a Serverless Dis-tributed File System. In Proc. of International Conferenceon Distributed Computing Systems, Vienna, Austria, July2002.

[10] F. Douglis, A. Feldman, B. Krishnamurthy, and J. Mogul.Rate of Change and Other Metrics: a Live Study of theWorld Wide Web. In Proc. of USENIX Symposium on

NSDI ’06: 3rd Symposium on Networked Systems Design & ImplementationUSENIX Association 27

Page 14: Corona: A High Performance Publish-Subscribe System for ... A High Performance Publish-Subscribe System for the World Wide Web Venugopalan Ramasubramanian Ryan Peterson Emin G¨un

Internet Technologies and Systems, Monterey, CA, Dec.1997.

[11] W. Fenner, M. Rabinovich, K. K. Ramakrishnan, D. Sri-vastava, and Y. Zhang. XTreeNet: Scalable OverlayNetworks for XML Content Dissemination and Query-ing (Synopsis). In Proc. of International Workshop onWeb Content Caching and Distribution, Sophia Antipo-lis, France, Sept. 2005.

[12] GAIM. A Multi-Protocol Instant Messaging Client.http://gaim.sourceforge.net, Oct. 2005.

[13] B. Glade, K. P. Birman, R. Cooper, and R. van Renesse.Light-Weight Process Groups in the ISIS System. Dis-tributed Systems Engineering, 1(1):29–36, Sept. 1993.

[14] N. Harvey, M. Jons, S. Saroiu, M. Theimer, and A. Wol-man. SkipNet: A Scalable Overlay Network with Practi-cal Locality Properties. In Proc. of USENIX Symposiumon Internet Technologies and Systems, Seattle, WA, Mar.2003.

[15] W. K. Josephson, E. G. Sirer, and F. B. Schneider. Peer-to-Peer AuthenticationWith a Distributed Single Sign-OnService. In Proc. of International Workshop on Peer-to-Peer Systems, San Diego, CA, Feb. 2004.

[16] JS-JavaSpaces Service Specification. http://www.jini.-org/nonav/standards/davis/doc/specs/html/js-spec.html,2002.

[17] F. Kaashoek and D. Karger. Koorde: A Simple Degree-Optimal Distributed Hash Table. In Proc. of InternationalWorkshop on Peer-to-Peer Systems, Berkeley, CA, Feb.2003.

[18] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin,and R. Panigraphy. Consistent Hashing and RandomTrees: Distributed Caching Protocols for Relieving HotSpots on the World Wide Web. In Proc. of ACM Sympo-sium on Theory of Computing, El Paso, TX, Apr. 1997.

[19] H. Liu, V. Ramasubramanian, and E. G. Sirer. ClientBehavior and Feed Characteristics of RSS, a Publish-Subscribe System for Web Micronews. In Proc. of ACMInternet Measurement Conference, Berkeley, CA, Oct.2005.

[20] L. Liu, C. Pu, and W. Tang. WebCQ: Detecting and De-livering Information Changes on the Web. In Proc. of theInternational Conference on Information and KnowledgeManagement, McLean, VA, Nov. 2000.

[21] L. Liu, C. Pu, W. Tang, and W. Han. CONQUER: AContinual Query System for Update Monitoring in theWWW. International Journal of Computer Systems, Sci-ence and Engineering, 14(2):99–112, 1999.

[22] D. Malkhi, M. Naor, and D. Ratajczak. Viceroy: A Scal-able and Dynamic Emulation of the Butterfly. In Proc. ofACM Symposium on Principles of Distributed Comput-ing, Monterey, CA, Aug. 2002.

[23] P. Maymounkov and D. Mazieres. Kademlia: A Peer-to-peer Information System Based on the XOR Metric. InProc. of International Workshop on Peer-to-Peer Systems,Cambridge, CA, Mar. 2002.

[24] A. Mizrak, Y. Cheng, V. Kumar, and S. Savage. Struc-tured Superpeer: Leveraging Heterogeneity to ProvideConstant-time Lookup. In Proc. of IEEE Workshop onInternet Applications, San Francisco, CA, Apr. 2003.

[25] S. Pandey, K. Dhamdere, and C. Olston. WIC: AGeneral-Purpose Algorithm for Monitoring Web Infor-mation Sources. In Proc. of the Conference on Very LargeData Bases (VLDB), Toronto, Canada, Aug. 2004.

[26] S. Pandey, K. Ramamritham, and S. Chakraborti. Mon-itoring the Dynamic Web to Respond to ContinuousQueries. In Proc. of the International World Wide WebConference, Budapest, Hungary, May 2003.

[27] V. Ramasubramanian and E. G. Sirer. Beehive: Exploit-ing Power Law Query Distributions for 0(1) Lookup Per-formance in Peer-to-Peer Overlays. In Proc. of Sympo-sium on Networked Systems Design and Implementation,San Francisco, CA, Mar. 2004.

[28] S. Ratnasamy, P. Francis, M. Handley, R. Karp, andS. Shenker. A Scalable Content-Addressable Network.In Proc. of ACM SIGCOMM, San Diego, CA, Aug. 2001.

[29] A. Rowstron and P. Druschel. Pastry: Scalable, De-centralized Object Location and Routing for Large-scalePeer-to-Peer Systems. In Proc. of IFIP/ACM Interna-tional Conference on Distributed Systems Platforms, Hei-delberg, Germany, Nov. 2001.

[30] RSS 2.0 Specifications. http://blogs.law.harvard.edu/-tech/rss, Oct. 2005.

[31] D. Sandler, A. Mislove, A. Post, and P. Druschel.FeedTree: Sharing Web Micronews with Peer-to-PeerEvent Notification. In Proc. of International Workshopon Peer-to-Peer Systems, Ithaca, NY, Feb. 2005.

[32] B. Segall, D. Arnold, J. Boot, M. Henderson, andT. Phelps. Content Based Routing with Elvin. In Proc.of AUUG2K, Canberra, Australia, June 2000.

[33] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Bal-akrishnan. Chord: A Scalable Peer-to-peer Lookup Ser-vice for Internet Applications. In Proc. of ACM SIG-COMM, San Diego, CA, Aug. 2001.

[34] R. Strom, G. Banavar, T. Chandra, M. Kaplan, K. Miller,B. Mukherjee, D. Sturman, and M. Ward. Gryphon: AnInformation Flow Based Approach to Message Broker-ing. In Proc. of International Symposium on SoftwareReliability Engineering, Paderborn, Germany, Nov. 1998.

[35] TIBCO Publish-Subscribe. http://www.tibco.com, Oct.2005.

[36] TSpaces. http://www.almaden.ibm.com/cs/TSpaces/,Oct. 2005.

[37] R. van Renesse, K. P. Birman, and W. Vogels. Astrolabe:A Robust and Scalable Technology For Distributed Sys-tems Monitoring, Management, and Data Mining. ACMTransactions on Computer Systems, 21(3), May 2003.

[38] U. Wieder and M. Naor. A Simple Fault Tolerant Dis-tributed Hash Table. In Proc. of International Workshopon Peer-to-Peer Systems, Berkeley, CA, Feb. 2003.

[39] B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, andJ. Kubiatowicz. Tapestry: A Resilient Global-scale Over-lay for Service Deployment. IEEE Journal on SelectedAreas in Communications, 22(1):41–53, Jan. 2004.

[40] L. Zhou, F. B. Schneider, and R. van Renesse. COCA: ASecure Distributed On-line Certification Authority. ACMTransactions on Computer Systems, 20(4):329–368, Nov.2002.

NSDI ’06: 3rd Symposium on Networked Systems Design & Implementation, USENIX Association