

Characterizing IPv4 Anycast Adoption and Deployment

Danilo Cicalese, Telecom ParisTech, danilo.cicalese@enst.fr
Jordan Augé, Telecom ParisTech, jordan.auge@enst.fr
Diana Joumblatt, Telecom ParisTech, joumblat@enst.fr
Timur Friedman, UPMC Sorbonne Universités, timur.friedman@lip6.fr
Dario Rossi, Telecom ParisTech, dario.rossi@enst.fr

ABSTRACT

This paper provides a comprehensive picture of IP-layer anycast adoption in the current Internet. We carry out multiple IPv4 anycast censuses, relying on latency measurements from PlanetLab. Next, we leverage our novel technique for anycast detection, enumeration, and geolocation [17] to quantify anycast adoption in the Internet. Our technique is scalable and, unlike previous efforts that are bound to exploiting DNS, is protocol-agnostic. Our results show that major Internet companies (including tier-1 ISPs, over-the-top operators, Cloud providers and equipment vendors) use anycast: we find that a broad range of TCP services are offered over anycast, the most popular of which include HTTP and HTTPS by anycast CDNs that serve websites from the top-100k Alexa list. Additionally, we complement our characterization of IPv4 anycast with a description of the challenges we faced to collect and analyze large-scale delay measurements, and the lessons learned.

CCS Concepts
• Networks → Network measurement; Network structure; • General and reference → Measurement;

Keywords
Anycast; Census; Network monitoring; Network measurement; IPv4; BGP

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

CoNEXT '15, December 01-04, 2015, Heidelberg, Germany
© 2015 ACM. ISBN 978-1-4503-3412-9/15/12...$15.00

DOI: http://dx.doi.org/10.1145/2716281.2836101

1. INTRODUCTION

Modern content-delivery networks (CDNs) employ L7-anycast, exploiting DNS and HTTP redirection techniques to direct traffic from a client to any server in a group of geographically dispersed but otherwise equivalent servers. Such redirection techniques perform load balancing among nearby replicas and map users to the closest replica, reducing user-perceived latency.

Network-level (IP) anycast [4] is another instantiation of the same principle, where a set of replicas spread across a number of locations around the world share a standard unicast IP address. BGP policies route packets sent to this address to the nearest replica according to BGP metrics, notably (though not only) the number of autonomous system (AS) hops.

L7 anycast and IP anycast are complementary. On one hand, L7 anycast allows for very dense server deployments with customized user-server mapping algorithms and complex operations to shuffle content among servers. Although this allows fine-grained control of the server selection, it also increases the management complexity [36]. On the other hand, IP anycast offers a loose control over user-server mapping, which limits the deployment density but considerably simplifies management by delegating replica selection to IP routing.

In recent years, the scientific community has made significant contributions to understand L7 anycast, e.g., to uncover deployments and geolocate points of presence (PoPs) with active measurements [15, 45–47], and characterize the performance of L7 anycast via passive measurements [5, 12]. Yet, with few exceptions [45], as these application-level deployments are diverse and as PoPs are pervasive, such efforts generally focus on a single application/player such as Akamai [46], YouTube [5, 47], Amazon [12] and Google [15]. A recent trend in the area of Internet infrastructure mapping is to exploit the edns-client-subnet DNS extension (ECS) [20] to uncover the geographical footprints of major CDNs [15, 45]. Still, given the wide design space and flexibility in L7 anycast implementations, it is hard to generalize results of L7 anycast usage, performance and geographical deployments across CDNs.

Conversely, most IP anycast studies [9, 34, 43] are limited to DNS, which has historically been the killer application of IP anycast [3, 7, 29, 37]. This paper shows that the usage of IP anycast has significantly changed in recent years. In particular, major players of the Internet ecosystem, including Internet service providers (ISPs), OTTs and manufacturers, provide a diversity of services with IP anycast (e.g., content distribution, cloud services, web hosting, web acceleration, DDoS protection). Yet, missing an Internet-scale study of IP anycast deployment, the scientific community is not up-to-date with such changes, and knowledge related to non-DNS anycast (e.g., anycast IP address ranges, the number of replicas behind each address, services provided) is anecdotal at best, which motivates our current work.

Indeed, while valuable research efforts (Sec. 2), which started with seminal work such as [30] and culminated more recently with [1, 22], focus on unicast censuses, this paper presents the first census of the use of IPv4 anycast in the Internet. First, we describe the challenges faced in designing a system able to collect and analyze Internet-scale delay measurements [17] in a short time frame (Sec. 3). Next, we discuss the results of a thorough experimental campaign, analyzing anycast adoption over multiple anycast IPv4 censuses (Sec. 4). To summarize, our main contributions are:

• We conduct and combine delay measurements from four full censuses, based on which we find about O(10^3) IP/24 subnets to be anycasted.
• We characterize the geographical footprint of IP anycast deployments, which we (conservatively) find to have O(10) replicas on average.
• We provide empirical evidence that IP anycast is used by ASes in the CAIDA top-10 rank, and by ASes serving content over HTTP and HTTPS for websites in the Alexa top-100 rank.
• We show that anycast is used to serve a large diversity of stateful services running on top of TCP (a complementary portscan finds 10,000 open ports, about 500 of which are well known).
• We describe our distributed system design, able to perform and analyze one census in under 5 hours.
• We make our census results browsable at [21].

2. METHODOLOGY OVERVIEW

In this section, we present an overview of our work (Sec. 2.1) and put it in perspective with prior research efforts (Sec. 2.2). For the sake of readability, we describe our complete workflow with the help of Fig. 1.

Figure 1: Overall workflow of the anycast census. (Labels in the figure: IPv4 hitlist, PlanetLab, Measurement (ICMP/TCP/UDP RTT latency), Blacklist, Greylist, Analysis [16] (Detection, Enumeration, Geolocation), Unicast, Anycast, Characterization, nmap portscan, Validation, Ground truth.)

Figure 2: Analysis technique: anycast detection (figure adapted from [17]).

2.1 Workflow

Measurements. We use distributed software running over PlanetLab (PL) to conduct IPv4 anycast censuses with ICMP latency measurements. Each of the O(10^2) PL vantage points (VP) receives a set of O(10^7) IP/32 targets (namely, the IPv4 hitlist provided by [31]). We consider that each IP/32 in this hitlist is representative of the corresponding IP/24 subnet, and thus cover the entire IPv4 address space (we validate this assumption in Sec. 3.1). Later on, we thoroughly justify our choices of the measurement platform (e.g., PL over RIPE, MLab, Archipelago, etc., in Sec. 3.2), software (e.g., fastping/TDMI over Zmap, in Sec. 3.3) and protocols (e.g., ICMP over TCP or UDP, in Sec. 3.4).

Analysis. The dataset collected from the census is uploaded to a central repository. We run an iterative algorithm that we recently proposed [17] to detect, enumerate and geolocate anycast replicas over the dataset. For the sake of completeness, we provide an overview of the technique, which is based on detection of speed-of-light violations [35]: as depicted in Fig. 2, the main idea is that, in case latency measurements from two vantage points toward the same target exhibit geo-inconsistency, then it is safe to assume the target to be anycast.
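The detection idea can be illustrated with a minimal sketch. It is not the implementation of [17]: the function names, the input layout (one `(latitude, longitude, rtt_ms)` tuple per vantage point) and the propagation speed of 200 km/ms (roughly the speed of light in fiber) are assumptions made for illustration only. Each RTT bounds the distance between a vantage point and the replica it reached; if two such disks cannot intersect, a single host cannot have answered both probes.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0
# Assumption: signals travel at about 2/3 of c, i.e. ~200 km per millisecond.
SPEED_KM_PER_MS = 200.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def disk_radius_km(rtt_ms, speed=SPEED_KM_PER_MS):
    """Upper bound on the VP-to-target distance: half the RTT times the speed."""
    return (rtt_ms / 2.0) * speed


def is_anycast(samples):
    """samples: list of (vp_lat, vp_lon, rtt_ms) toward the SAME target IP.

    Returns True as soon as two disks are disjoint (speed-of-light violation).
    """
    for (lat1, lon1, rtt1), (lat2, lon2, rtt2) in combinations(samples, 2):
        gap = great_circle_km(lat1, lon1, lat2, lon2)
        if gap > disk_radius_km(rtt1) + disk_radius_km(rtt2):
            return True
    return False


# Hypothetical vantage points: Paris and Tokyo both see a 10 ms RTT.
print(is_anycast([(48.8566, 2.3522, 10.0), (35.6762, 139.6503, 10.0)]))
```

A target that answers both Paris and Tokyo within 10 ms cannot be a single host, so the sketch flags it; a larger assumed speed makes the disks larger and the test more conservative.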

Figure 3: Illustration of the analysis technique (enumeration and geolocation steps).
(a) Measurement: map RTT samples to disks centered around VPs.
(b) Detect: non-overlapping disks imply a speed-of-light violation.
(c) Enumerate: solving a Maximum Independent Set (MIS) problem yields non-overlapping disks, each containing a different replica (two steps shown).
(d) Geolocate: maximum likelihood classification problem (city-level).
(e) Iterate: collapse disks around geolocated replicas until convergence.

We illustrate an execution of the technique in Fig. 3. Briefly, given a specific target IP, (a) we first map each RTT latency measurement to a disk centered around the VP that, by definition, contains the target contacted; (b) if two such disks do not intersect, as just discussed, we can infer that VPs are contacting two different replicas, as is the case for the green discs in Fig. 3(b). While observation of inconsistency among any pair leads to anycast detection, leveraging multiple observations it is possible to further enumerate such replicas. Enumeration is described in step (c): to provide a conservative estimation of the minimum number of anycast replicas, we solve a Maximum Independent Set (MIS) problem. MIS outputs a set of non-overlapping disks, each of which contains a different replica of the same target: while the MIS problem is NP-hard, we solve it using a 5-approximation algorithm that greedily operates on disks of increasing radius size, as in Fig. 3(c), and that in practice yields results that are very close to the optimum provided by a prohibitively more costly brute force solution [17]. Geolocation happens in step (d): in the smallest disk, we geolocate the replica at city-level granularity with a maximum likelihood estimator biased toward city population: actually, we find that the city population has sufficient discriminative power alone (about 75% accuracy [18]), so that our geolocation criterion boils down to picking the largest city in that disk. Finally, (e) we coalesce the disk to the classified city, which reduces disk overlap and allows iteration of the algorithm until convergence, thus increasing the recall (i.e., number of replicas discovered) along each iteration.
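The enumeration and geolocation steps can be sketched as follows. This is a hedged illustration under stated assumptions, not the code of [17]: the `Disk` and `City` types, the function names and the sample coordinates are hypothetical, and the iteration of step (e) is omitted. The greedy pass visits disks by increasing radius and keeps a disk only if it overlaps none of those already kept; geolocation then picks the most populated city inside a kept disk.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Disk:
    lat: float
    lon: float
    radius_km: float


@dataclass(frozen=True)
class City:
    name: str
    lat: float
    lon: float
    population: int


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def overlap(a, b):
    """Two disks overlap when their centers are closer than the sum of radii."""
    return great_circle_km(a.lat, a.lon, b.lat, b.lon) <= a.radius_km + b.radius_km


def enumerate_replicas(disks):
    """Greedy independent set: smallest disks first, keep only disjoint ones."""
    chosen = []
    for disk in sorted(disks, key=lambda d: d.radius_km):
        if all(not overlap(disk, kept) for kept in chosen):
            chosen.append(disk)
    return chosen


def geolocate(disk, cities):
    """Pick the most populated city inside the disk, or None if it is empty."""
    inside = [c for c in cities
              if great_circle_km(disk.lat, disk.lon, c.lat, c.lon) <= disk.radius_km]
    return max(inside, key=lambda c: c.population, default=None)


# Hypothetical disks around VPs in Paris, London and Tokyo.
disks = [Disk(48.8566, 2.3522, 500.0), Disk(51.5074, -0.1278, 600.0),
         Disk(35.6762, 139.6503, 800.0)]
cities = [City("Paris", 48.8566, 2.3522, 2_100_000),
          City("Orleans", 47.9030, 1.9093, 116_000),
          City("Tokyo", 35.6762, 139.6503, 13_900_000)]
replicas = enumerate_replicas(disks)
print(len(replicas), [geolocate(d, cities).name for d in replicas])
```

Because every kept disk is disjoint from the others, the number of kept disks is a lower bound on the number of replicas, which matches the conservative estimate described above.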

The analysis technique [17] is, of course, not an original contribution of this work, whose main aims are instead to scale up its application to an Internet-wide census on the one hand (Sec. 3), and to analyse and publish the gathered dataset on the other hand (Sec. 4).

Characterization and Validation. In addition to anycast detection, the previous steps allow us to geolocate the replicas behind each anycast IP/24. As outlined in Fig. 1, we validate the output of the geolocation step whenever a ground truth is available (as in Sec. 3.4 for CDNs such as CloudFlare and EdgeCast, complementary to the validation limited to DNS in [17]).

Finally, we provide a fine-grained characterization of IP/24 anycast services. While our detection methodology is service-agnostic, we use nmap [38] on a list of anycast IPs obtained from the census to reveal open ports and the software they run. Given that an exhaustive portscan (i.e., of the 2^16 TCP and UDP port spaces over all replicas of all anycast deployments) still incurs a prohibitive measurement cost, we restrict the measurements to TCP services (i.e., the most unexpected ones) and interesting deployments (i.e., deployments with large geographical footprints). We discuss anycast services in Sec. 4.3, finding over 10,000 open ports that map to about 500 well-known services, and fingerprinting some 30 software applications.

Figure 4: Anycast census at a glance: typical census magnitude.

Scale, Completeness and Accuracy. Although we described the anycast geolocation technique in [17], we had to overcome several challenges to run it at Internet scale and within a short timespan, for which we went through multiple re-engineering phases. We believe that a number of lessons learned (e.g., the counter-intuitive need to slow down the sending rate to complete a census) are worth sharing, and discuss them in Sec. 3.

Notice that a large number of vantage points is required to provide an accurate picture of anycast deployment, especially in terms of the number of replicas discovered around the world. Related work that focuses on O(1) targets (i.e., DNS root servers) indeed runs measurement campaigns involving from O(10^4) [10] to O(10^5) [25] vantage points to achieve ≈90% recall [25]. In our case, given the sheer size O(10^7) of our target set, we trade off completeness for scale, and possibly underestimate the number of IP-anycast replicas, as we use a mere O(10^2) vantage points.

Still, our results provide a broad, conservative, yet accurate picture of Internet anycast usage: for targets for which we have the ground truth, our city-level geolocation is accurate in about 75% of the cases, with a median error of 350 Km otherwise (Sec. 3.4).

Typical census. In this work, we perform four IPv4 censuses and analyse the results obtained from their combination. For each of the O(10^2) VPs, the magnitude of a typical census is illustrated in Fig. 4: starting from a hitlist of O(10^7) targets, less than half send a reply (Sec. 3.1). ICMP replies include O(10^5) errors, some of which relate to administratively prohibited communication: senders of these ICMP error messages are added to a greylist to avoid probing them again in future censuses (Sec. 3.3). Finally, running the anycast geolocation technique over the O(10^6) targets that generate valid ICMP echo reply messages, we discover roughly O(10^3) IP/24 anycast deployments, corresponding to approximately 0.1‰ of the whole IPv4 address space: the proverbial needle in the IPv4 haystack.

2.2 State of the art

Anycast vs Unicast. In standard IP unicast censuses, the set of targets can be split among VPs for scalability reasons. In contrast, in the anycast case, all targets should be probed by all VPs to provide an accurate map of geographical footprints. Given that the number of active VPs in PL is around 300, and that only one IP/32 target for each IP/24 subnet needs to be probed in a given anycast census, it follows that the raw amount of probe traffic is only slightly larger than that of a unicast census.
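The arithmetic behind this comparison can be checked with a short worked example. It assumes, for illustration only, that a unicast census sends one probe to each of the 2^32 IPv4 addresses, and that an anycast census sends one probe per /24 (2^24 subnets at most) from each of 300 vantage points; the function name is hypothetical.

```python
def probe_counts(num_vps=300):
    """Return (unicast probes, anycast probes, anycast/unicast ratio)."""
    unicast = 2 ** 32            # one probe per IPv4 address, targets split among VPs
    anycast = num_vps * 2 ** 24  # one probe per /24, repeated from every VP
    return unicast, anycast, anycast / unicast


unicast, anycast, ratio = probe_counts()
print(unicast, anycast, ratio)
```

Under these assumptions the ratio reduces to 300/256, i.e. about 1.17. The actual hitlist holds O(10^7) targets rather than all 2^24 subnets, so the real probe count is lower still.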

In the unicast case, the relative location of VPs with respect to targets is irrelevant. Therefore, census strategies cover extremes such as a single centralized high-end server capable of O(10^6) probes per second on a well provisioned 1 Gbps Ethernet connection, as in Zmap [22], or a highly decentralized system that exploits O(10^5) low-resource gateways, as in the illegal Carna Botnet census [1]. Our system design sits halfway between these two extremes: in particular, it uses an efficient application-level multi-threaded scanner capable of O(10^4) probes/sec, distributed over O(10^2) PlanetLab nodes.

It is also worth pointing out that an independent analysis [33] of the Carna Botnet dataset found multiple campaigns, covering a cumulative number of probes exceeding a full IPv4 census over a duration of 8 months. Additionally, [33] suggests that, due to an overlap in the target set, not all hosts were probed, neither during the two fast scan campaigns identified, nor during the whole measurement period. As such, our measurement campaign, comprising multiple anycast censuses each lasting only a few hours, constitutes an achievement per se.

Geolocation and infrastructure mapping. Unicast geolocation is a well investigated research topic. Numerous techniques based on latency measurements [23, 24, 28] and databases [41, 44] have been proposed. Yet, database techniques are not only unreliable with unicast [41], but also with anycast, since they advertise a single geolocation per IP. Similarly, latency-based techniques [23, 24, 28] use triangulation and geolocate unicast addresses at the intersection of multiple latency measurements from geographically dispersed vantage points. However, this assumption no longer necessarily holds for anycast, as depicted in Fig. 3.

L7-anycast infrastructure mapping studies [15, 45] leverage ECS requests to geolocate servers: (millions of) requests are sent with different client IPs from one VP to unveil (thousands of) unicast IP addresses corresponding to PoPs of major OTTs. However, although ECS support is becoming widespread to enhance the user online experience, it is not yet pervasive. Finally, the technique fails with alternative L7-anycast designs relying on HTTP redirection.

To the best of our knowledge, no IP-anycast geolocation technique exists other than our own: as such, no other study apart from this work deals with anycast infrastructure mapping. With respect to ECS-based techniques for L7 anycast, our technique reduces the cardinality of the problem without sacrificing geolocation accuracy (a qualitative comparison is provided in [17]). Additionally, since BGP provides a unified redirection technique, IP-anycast offers an unprecedented opportunity to broadly assess all deployments at once.

Anycast discovery and characterization. Prior work investigates different aspects related to anycast, with a focus on discovery and enumeration [25, 35] or on characterization [9–11, 19, 32, 34, 43], but, to the best of our knowledge, not on anycast census. Closest to ours [17] is the work of [25, 35]. Specifically, [17] is a service-agnostic technique for detection, enumeration and geolocation. Similarly to [17], speed-of-light violation is used in [35] (however limited to detection, and not capable of enumeration/geolocation). Conversely, [25] exploits DNS specificities (i.e., CHAOS requests) to enumerate DNS replicas (but, unlike [17], is neither capable of geolocation, nor applicable beyond DNS).

Other studies assess the performance of current IP anycast deployments, with a focus on metrics such as proximity [9, 10, 19, 34, 43], affinity [9–11, 13, 34, 43], availability [10, 32, 43] and load-balancing [10, 11]. Yet, these studies focus on DNS, which is just a piece of the current anycast puzzle.

Finally, some works study IP-anycast CDNs, such as [16, 27]. However, these focused studies add yet other useful pieces to a puzzle that remained so far incomplete, lacking the broad perspective given by an Internet-wide coverage over all prefixes and services.

3. SYSTEM DESIGN

Anycast detection relies on measuring round trip delays between a set of vantage points and a target IP address, to uncover geo-inconsistencies. Running an Internet census thus requires measurements towards millions of destinations, ideally in a short timeframe: we now describe and justify system design choices that allow us to perform multiple censuses, which we analyze later in Sec. 4. Items discussed in this section concern the selection of targets (Sec. 3.1), the measurement platform (Sec. 3.2) and software (Sec. 3.3), as well as the network protocol used (Sec. 3.4). Finally, we report considerations about the scalability of our workflow (Sec. 3.5).

3.1 Census targets

Census granularity. Unlike multicast, anycast addresses need no reservation into the IP space: as any IP address can be a candidate, this makes deployment easy but the detection of anycast addresses hard. Luckily, to avoid a significant increase in the size of routing tables, BGP standard practice [4] is to ignore or block prefixes longer (i.e., more specific) than /24. Thus, /24 is the minimum granularity for anycasted services, which is a good granularity for our census. We validate this assumption with (spot) verifications for all IP addresses on some IP/24 (belonging to EdgeCast), confirming any IP in the /24 to be equivalent for anycast detection purposes. Additionally, a /24 granularity implies that announced BGP prefixes that are less specific than /24 are tested multiple times, once for each /24 they contain: the mapping between /24 and announced prefixes is still possible a posteriori, as we do in this work. This choice is reinforced by [35], which found 88% of announced prefixes to be /24, stated that "anycast prefixes are dominated by /24", and suggested that larger prefixes may be anycast only in part, due to BGP prefix aggregation. We therefore fix the census granularity to a single target IP per /24.

Target liveness. As previously argued, any alive IP belonging to a /24 is equivalent in telling whether the whole /24 is anycast (or unicast). To identify a responsive IP address in every /24 prefix, we rely on the hitlist periodically published by [31]. The hitlist generally consists of one representative IP address for each of O(10^7) prefixes, along with a score indicative of the host liveness, computed over several measurement campaigns. When no alive IP has been observed in a /24, the hitlist contains an arbitrary address from that /24 (score ≤ −2). After covering the full hitlist with the first census, we confirm that these hosts are not reachable, and remove them to reduce the target size to 6.6 · 10^6 per VP.

Figure 5: Microsoft deployment as seen from PlanetLab (21 replicas) vs RIPE (54 replicas). Notice that PlanetLab results (white markers) are a subset of RIPE (white and black markers).

Coverage. Given our census aim, we verify how well this hitlist covers all routed /24 prefixes. We therefore obtain from CAIDA a dump of routing tables originating from both RIPE RIS and RouteViews collectors. To compare the hitlist vs the advertised prefixes, we split the latter in /24, obtaining 10,616,435 /24 prefixes, of which 10,615,563 have a representative in the hitlist (over 99.99% coverage). We additionally cross-check our observed target responsiveness with the expected recall: specifically, recent ICMP scans [48] observe 4.9 · 10^6 used /24 subnets, and our campaigns similarly capture 4.4 · 10^6 responsive subnets (90% coverage with respect to [48]).
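The coverage check can be sketched with Python's standard `ipaddress` module. The function names and the toy inputs are illustrative assumptions; the real inputs would be the routing table dump and the hitlist, whose file formats are not specified here. Announced prefixes are split into /24 subnets, and each hitlist address is mapped to its covering /24.

```python
import ipaddress


def split_into_24s(prefix):
    """Return the set of /24 networks covered by (or covering) an IPv4 prefix."""
    net = ipaddress.ip_network(prefix, strict=False)
    if net.prefixlen > 24:
        # More specific than /24: fall back to the enclosing /24.
        return {net.supernet(new_prefix=24)}
    if net.prefixlen == 24:
        return {net}
    return set(net.subnets(new_prefix=24))


def hitlist_coverage(announced_prefixes, hitlist_ips):
    """Return (routed /24s with a hitlist representative, routed /24s)."""
    routed = set()
    for prefix in announced_prefixes:
        routed |= split_into_24s(prefix)
    represented = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in hitlist_ips}
    return len(routed & represented), len(routed)


# Toy example: a /22 (four /24s) and a /24, with one /24 left unrepresented.
covered, total = hitlist_coverage(
    ["10.0.0.0/22", "192.0.2.0/24"],
    ["10.0.0.1", "10.0.1.77", "10.0.3.9", "192.0.2.5"],
)
print(covered, total, covered / total)
```

With the figures reported above, the same ratio is 10,615,563 / 10,616,435, i.e. just over 99.99%.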

3.2 Measurement dataset vs platform

Dataset. One option to avoid running a large scale measurement campaign is to exploit readily available datasets from public measurement infrastructures, yet we could not find any fitting our purpose. For instance, despite probing all /24 every 2-3 days, Archipelago [6] clusters its vantage points into three independent groups, each using random IPs selected in each /24 prefix: it follows that at most 3 monitors target each /24, with generally different IP addresses and a hit rate of about 6%. Given the low hit rate and low parallelism, such a dataset is not appropriate for our purpose, as it would not lead to a complete census, nor to an accurate geolocation footprint even in case of hits.

Platforms. There are a number of available measurement platforms in the community, each with its own advantages and limitations. Except for illustration purposes, in this paper we relied on PlanetLab (PL). While RIPE Atlas (RIPE for short) is more interesting for geographical diversity due to its scale, it offers limited control over the rate and type (cf. Sec. 3.4) of measurements, as well as their instantiation for such a large scale campaign (i.e., upload of the hitlist, probing budget). Additionally, the larger number of vantage points would mechanically increase about 20-fold the amount of probes per census with respect to PL (in case all VPs are used). Conversely, measurements in PL are limited by node availability (generally around 300 vantage points) but offer full flexibility for deploying custom software and running it at high speed (cf. Sec. 3.3).

While in this work we limit ourselves to PL, we depict, for illustration purposes, an application of our technique to measurements collected from PlanetLab vs RIPE in Fig. 5: as PL results are a subset of RIPE results, white markers indicate replicas found from both platforms, while black markers pinpoint replicas that are only found with RIPE measurements. While this example has anecdotal relevance, it suggests that an intriguing direction is to combine both platforms, e.g., by refining via RIPE the geolocation of anycast /24 detected via PL.

3.3 Measurement software

Fastping. An efficient measurement tool is needed to maximize the probing capacity of our VPs. While at first sight Zmap [22] could seem the perfect tool for such a large-scale campaign, it however exhibits a major blocking point in our setup: namely, Zmap generates raw Ethernet frames, which are very efficient in a local setup, but are not supported by the PlanetLab virtualization layer. We therefore resort to Fastping [26], a tool specialized, as the name implies, in ICMP scanning, which is deployed on each PL node. Fastping is able to send about O(10^4) probes per second: about two orders of magnitude slower than Zmap, but faster than the fastest nmap scripting engine scanner. As we will point out later concerning scalability (Sec. 3.5), in order to gather complete censuses in a few hours, we had to undergo several rounds of re-engineering, including purposely slowing down the Fastping sending rate.

Figure 6: Response rates seen by heterogeneous protocols across different targets. (Y-axis: response ratio [%], 0 to 100; x-axis: ICMP (L3), TCP-53 and TCP-80 (L4), DNS/UDP and DNS/TCP (L7); series: OpenDNS, Edgecast, Cloudflare, Microsoft.)

Figure 7: Validation with CloudFlare and EdgeCast ASes. Bars represent standard deviation among IP/24 of the AS. (Left y-axis: GT/PAI and TPR, 0 to 1; right y-axis: median error [Km], 0 to 1000; x-axis: GT/PAI, TPR, Error [Km]; series: CloudFlare, EdgeCast.)

Greylist. Additionally, Fastping adopts the usual techniques to be a good Internet measurement citizen, i.e., a signature in the payload points to its homepage. Fastping probes the target list in a randomized order to reduce intrusiveness, and implements a greylist mechanism to honor requests to stop probing administratively prohibited hosts/networks, inferred from ICMP return codes. Before running a census from O(10^2) VPs, we initially run a census from a single VP in order to build an initial blacklist. During any census, we then collect addresses generating ICMP return codes (other than echo reply) in a temporary greylist, which we later incrementally merge with the blacklist. This list has approximately O(10^5) hosts, with 98.5% added due to administrative filtering [8] (type 3, code 13), and the remainder due to communications administratively prohibited at host or network level (respectively 1.3% code 10 [14] and 0.2% code 9 [14]).
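The greylist bookkeeping described above can be sketched as follows. This is not Fastping's code: the tuple layout `(sender_ip, icmp_type, icmp_code)`, the function names and the sample addresses are assumptions for illustration. The ICMP values themselves are standard: type 0 is echo reply, and type 3 (destination unreachable) codes 9, 10 and 13 denote administratively prohibited communication.

```python
from collections import Counter

ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACHABLE = 3
# Type 3 codes signalling administrative prohibition.
ADMIN_PROHIBITED_CODES = {
    9: "network administratively prohibited",
    10: "host administratively prohibited",
    13: "communication administratively prohibited (filtering)",
}


def build_greylist(replies):
    """Collect senders of any ICMP reply other than echo reply.

    replies: iterable of (sender_ip, icmp_type, icmp_code) tuples.
    Returns (greylist, reasons), where reasons counts replies per (type, code).
    """
    greylist = set()
    reasons = Counter()
    for sender, icmp_type, icmp_code in replies:
        if icmp_type == ICMP_ECHO_REPLY:
            continue  # valid measurement, keep probing this target
        greylist.add(sender)
        reasons[(icmp_type, icmp_code)] += 1
    return greylist, reasons


def merge_into_blacklist(blacklist, greylist):
    """Incrementally extend the blacklist used by future censuses."""
    return set(blacklist) | set(greylist)


# Hypothetical replies observed during one census (documentation addresses).
replies = [
    ("192.0.2.1", 0, 0),
    ("198.51.100.7", 3, 13),
    ("198.51.100.9", 3, 13),
    ("203.0.113.4", 3, 10),
]
greylist, reasons = build_greylist(replies)
blacklist = merge_into_blacklist({"203.0.113.200"}, greylist)
print(sorted(blacklist), reasons[(ICMP_DEST_UNREACHABLE, 13)])
```

Keeping the per-code counter alongside the set makes it straightforward to report a breakdown such as the 98.5% / 1.3% / 0.2% split quoted above.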

3.4 Network protocol

Recall. ICMP has often been used (and misused) in measurement studies: especially given recent work showing that ICMP latency measurements are often not reliable [40], we thus need to confirm the validity of our protocol selection. A major motivation for ICMP measurement is given by the high recall it offers [48]. Consider indeed that TCP and UDP measurements would need an a priori knowledge (or guess) of services running on the target under test. We therefore perform a test on a reduced set of targets, performing 100 measurements with different protocols: specifically, we consider network L3 (ICMP) and transport L4 (TCP SYN/SYN-ACK pair in the three-way handshake to port 53 or 80) measurements, as well as L7 (DNS/UDP vs DNS/TCP, using dig) measurements. Fig. 6 shows that protocols other than ICMP have a binary recall: in other words, they work well only if the service is known a priori. Conversely, ICMP is the only reliable alternative, yielding high recall across all deployments, and is thus well suited for censuses.

Accuracy. While our technique relies on latency measurements, it leverages the discriminative power of side channel information (i.e., city population within disks) to cope with latency measurement noise. While we validate the accuracy of the methodology for DNS in [17], a validation for stateful TCP connections is still missing. To do so, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL: note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY (standard Server) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a superset of locations with respect to those measured from PL. We contrast the true positive rate (TPR) of our census classification against the HTTP GT in Fig. 7: in 77% of the IP/24 for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 km (287 km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest for alternative platforms such as RIPE.
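The validation above compares, replica by replica, the city returned by the census with the city found in the HTTP ground truth. The following is a minimal sketch of that comparison, not the authors' code: the function names, the dictionary-based input layout and the definition of TPR as the fraction of city-level matches are all assumptions made for illustration.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def validate(census, ground_truth, coords):
    """Return (TPR, median error in km over the misclassified replicas).

    census, ground_truth: dict mapping a replica key to a city name
    (hypothetical layout). coords: dict mapping a city name to (lat, lon).
    """
    common = [k for k in census if k in ground_truth]
    if not common:
        raise ValueError("no replica appears in both inputs")
    errors = [haversine_km(coords[census[k]], coords[ground_truth[k]])
              for k in common if census[k] != ground_truth[k]]
    tpr = (len(common) - len(errors)) / len(common)
    return tpr, (median(errors) if errors else 0.0)


# Toy input: one match and one Philadelphia/Ashburn confusion.
coords = {"Ashburn": (39.0438, -77.4874),
          "Philadelphia": (39.9526, -75.1652),
          "Paris": (48.8566, 2.3522)}
census = {"r1": "Philadelphia", "r2": "Paris"}
truth = {"r1": "Ashburn", "r2": "Paris"}
tpr, err_km = validate(census, truth, coords)
```

The error is only computed over misclassified replicas, mirroring the way the median error is reported for the misclassification cases in the text.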

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case, we rely on public information that maps 24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct except Philadelphia (while the server is located in Ashburn, 260 km or 2.6 ms worth of propagation delay away): this misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn) but, as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.
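The correspondence between distance and propagation delay quoted above can be checked with a short calculation. The sketch below assumes that signals travel at about two thirds of the speed of light in fiber and that the delay of interest is a round-trip time; both are common rules of thumb and assumptions of this example, not values stated in the paper.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # vacuum speed of light, km per ms
FIBER_FACTOR = 2.0 / 3.0               # assumed slowdown in optical fiber


def min_rtt_ms(distance_km):
    """Smallest round-trip time compatible with a given one-way distance."""
    return 2.0 * distance_km / (SPEED_OF_LIGHT_KM_PER_MS * FIBER_FACTOR)


def max_distance_km(rtt_ms):
    """Largest one-way distance compatible with a measured round-trip time."""
    return rtt_ms / 2.0 * SPEED_OF_LIGHT_KM_PER_MS * FIBER_FACTOR


ashburn_philadelphia_rtt = min_rtt_ms(260.0)
```

Under these assumptions each millisecond of RTT corresponds to roughly 100 km, which is also the radius that a latency measurement can certify around a vantage point.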

3.5 Scalability

Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that nodes must be desynchronized to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so by a randomized permutation of the target nodes, achieved via a Linear Feedback Shift Register (LFSR) with Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude¹, which we verified empirically not to trigger the above problems. Consequently, probing 6.6·10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (the longer duration is likely due to load on the PL host).

Table 1: Textual (0) vs. binary (1-4) censuses.

Census ID   Format   Size (host, total)   Analysis
0           csv      (270M, 79G)          >3 days
1-4         binary   (21M, 6G)            3 hr

[Figure 8 plot residue: x-axis completion time per VP in hours (1 to 16, log scale), y-axis CDF (0 to 1).]

Figure 8: CDF of per-vantage point completion time over all censuses.
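The LFSR-based target permutation described under "Probing rate" can be illustrated with a small generator. This is a minimal sketch under stated assumptions, not Fastping's implementation: it uses a 16-bit register with the well-known maximal-length tap mask 0xB400, so it only handles up to 65535 targets, and the function names and the idea of giving each vantage point a different seed are hypothetical.

```python
def galois_lfsr16(seed=0xACE1, taps=0xB400):
    """Yield the successive states of a 16-bit Galois LFSR, one full cycle.

    With the maximal-length mask 0xB400 the cycle visits every value in
    1..65535 exactly once before returning to the seed.
    """
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
        yield state
        if state == seed:
            return


def permuted_targets(targets, seed=0xACE1):
    """Iterate over `targets` in LFSR order, without storing a shuffled copy."""
    n = len(targets)
    if n > 0xFFFF:
        raise ValueError("this sketch is limited to a 16-bit register")
    for state in galois_lfsr16(seed):
        if state <= n:          # skip register states beyond the target list
            yield targets[state - 1]


order = list(permuted_targets(list(range(1000))))
```

Because the register only needs its current state, each node can walk the whole target list in a scrambled order with constant memory; a wider register with a suitable maximal-length mask would be needed for millions of targets.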

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged a wealth of information in textual format, amounting to 270 MB per node and 80 GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay and ICMP flag (encoding greylist return codes 9, 10 or 13 as a negative sign), for a total of about 20 MB per node and 6 GB overall per census.
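To make the idea of folding the ICMP flag into the sign of the delay concrete, here is a minimal sketch of such a record. The exact on-disk layout used in the censuses is not given in the text, so the field widths below (a 32-bit timestamp and a 32-bit float delay) are purely hypothetical, as are the function names.

```python
import math
import struct

# Hypothetical layout: little-endian uint32 timestamp (s) + float32 delay (ms).
RECORD = struct.Struct("<If")
GREYLIST_CODES = {9, 10, 13}  # ICMP codes signalled through a negative sign


def pack_record(timestamp, delay_ms, icmp_code=None):
    """Serialize one measurement; greylist replies get a negative delay."""
    signed = -delay_ms if icmp_code in GREYLIST_CODES else delay_ms
    return RECORD.pack(timestamp, signed)


def unpack_record(blob):
    """Return (timestamp, delay_ms, greylisted) from one packed record."""
    timestamp, signed = RECORD.unpack(blob)
    # copysign also detects a negative zero, which `signed < 0` would miss.
    greylisted = math.copysign(1.0, signed) < 0
    return timestamp, abs(signed), greylisted


blob = pack_record(1425168000, 12.5, icmp_code=13)
```

Reusing the sign bit means the flag costs no extra bytes per record, which is the kind of saving that matters when every vantage point logs millions of targets.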

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute-force optimal solution; at the same time, processing a census would still take days (we indeed stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs is not the same in all files, meaning that an on-the-fly sorting of about 300 lists (one per VP) containing millions of targets is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use case, it may become relevant for other applications of this technique (e.g., BGP hijacking inference, mentioned in Sec. 5).

¹ While it is possible to tune the probing rate per VP more finely, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area that is not well covered by PL.
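Since every vantage point logs targets in a different order, the per-VP lists must be aligned by target before the per-target analysis can run. One possible way to do this, sketched below, is to sort each list and then merge the sorted streams lazily; the function name and the in-memory input layout are assumptions of this example, and the authors' optimized implementation may well differ.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def group_by_target(per_vp_records):
    """Align measurements from all vantage points on the target key.

    per_vp_records: dict mapping a VP name to an iterable of
    (target, rtt_ms) pairs in arbitrary order (hypothetical layout).
    Yields (target, [(vp, rtt_ms), ...]) in increasing target order.
    """
    sorted_streams = [
        [(target, vp, rtt) for target, rtt in sorted(records)]
        for vp, records in per_vp_records.items()
    ]
    # heapq.merge consumes the already sorted streams lazily, so only one
    # record per VP has to be compared at any time.
    merged = heapq.merge(*sorted_streams)
    for target, rows in groupby(merged, key=itemgetter(0)):
        yield target, [(vp, rtt) for _, vp, rtt in rows]


toy = {"vp1": [(3, 10.0), (1, 20.0)], "vp2": [(1, 5.0), (2, 7.0)]}
aligned = list(group_by_target(toy))
```

With one sorted file per VP on disk, the same merge step could stream records instead of holding them in memory, which is what makes a continuous analysis plausible.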

4. ANYCAST /0 CENSUSES

This section presents results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1) and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance

Details about our censuses are reported in Fig. 10. Overall, 1696 IP/24 belonging to 346 ASes appear to have more than one anycast replica, while we were able to find only 897 IP/24 belonging to 100 ASes having at least 5 replicas with our technique. The plot also shows a geographical density map of anycast replicas: results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that the results reported in this paper correspond to censuses performed during March 2015: with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at the time of writing) results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative since (i) in regions with low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP anycast figure in the top-100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted² by ASes using IP anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

² For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP and disregard content that is referenced in the frontpage.

                IP/24   ASes   Cities   CC   Replicas
All             1696    346    77       38   13802
≥ 5 replicas    897     100    71       36   11598
∩ CAIDA-100     19      8      30       18   138
∩ Alexa-100k    242     15     45       29   4038

Figure 10: Anycast censuses results at a glance.

4.2 Top-100 Anycast ASes

Albeit the amount of anycast IP/24 may seem disappointing at first, in reason of its exiguous footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS, but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar plot). Service footprint is correlated to the open TCP ports in the AS (middle scatter plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter plots). Finally, a label reported on the top x-axis categorizes the main activity of the ASes from a business perspective (the category is informal and, in case of ASes with multiple services, only the most prominent is selected).

[Figure 9 plot residue: five stacked panels sharing an x-axis that lists the 100 ASes by rank and WHOIS name; from bottom to top: number of replicas, number of IPs/24 (log scale, 1 to 100), open ports (53, 80, 443, 8080), Alexa rank (log scale), CAIDA rank (log scale); the top x-axis carries the business category of each AS (e.g., CDN, DNS, ISP, Cloud, Security, Social Network, unknown).]

Figure 9: Bird's-eye view of Top-100 anycast ASes (ranked according to geographical footprint).

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet, Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric) but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn) and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services), and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers, such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS) and public DNS resolvers (e.g., Google DNS, OpenDNS), also emerge in the census.

Diversity. We report a breakdown of AS classes in Fig. 11, crisply showing that DNS now represents about one third of IP anycast activities. Plots in Fig. 9 clearly illustrate the large diversity of anycast usage under all metrics. Indeed, no correlation appears between any two metrics, illustrating the degree of freedom in anycast deployments: for instance, the geographical and IP/24 footprints are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments having a heterogeneous business model (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH), but also between deployments of the same kind (e.g., the CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443 over the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).
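The Pearson coefficient quoted for the geographical and IP/24 footprints measures linear association between two per-AS series. For readers who want to reproduce this kind of check on their own data, a minimal dependency-free sketch follows; the function name and the toy inputs are illustrative and do not come from the census.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy per-AS series: number of replicas vs. number of anycast IP/24.
replicas = [1, 2, 3, 4]
prefixes = [1, 3, 2, 4]
r = pearson(replicas, prefixes)
```

Values close to 0 indicate that knowing one footprint says little about the other, which is the sense in which the two are described as largely unrelated.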

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller with respect to L7-anycast deployments, which can exceed O(10^3) for the large providers; this is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

[Figure 11 plot residue: bar chart of the AS breakdown in % (0 to 35) per category: DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other.]

Figure 11: Breakdown of AS category (only the first category is considered).

Fig. 12 further reports the cumulative number of replicas per IP/24, depicting both results coming from the combination of censuses as well as individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius size: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
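The two steps mentioned above, combining censuses by minimum RTT and then selecting non-overlapping disks in order of increasing radius, can be sketched as follows. This is a simplified illustration in the spirit of [17], not the authors' solver: the conversion of 100 km per millisecond of RTT, the greedy selection rule and all names are assumptions of this example.

```python
import math

KM_PER_MS_RTT = 100.0  # assumed: ~2/3 of c in fiber, halved for the round trip
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def combine_min(censuses):
    """Keep, for each (vp, target) pair, the smallest RTT seen in any census."""
    best = {}
    for census in censuses:
        for key, rtt in census.items():
            if key not in best or rtt < best[key]:
                best[key] = rtt
    return best


def enumerate_replicas(vp_coords, rtts):
    """Greedy lower bound on the number of replicas behind one target.

    vp_coords: vp -> (lat, lon); rtts: vp -> minimum RTT in ms.
    Disks are visited by increasing radius and kept only if they do not
    overlap any disk kept so far, so each kept disk needs its own replica.
    """
    disks = sorted((rtt * KM_PER_MS_RTT, vp_coords[vp])
                   for vp, rtt in rtts.items())
    kept = []
    for radius, center in disks:
        if all(haversine_km(center, c) > radius + r for r, c in kept):
            kept.append((radius, center))
    return len(kept)


vps = {"paris": (48.85, 2.35), "new_york": (40.71, -74.00),
       "tokyo": (35.68, 139.69)}
merged = combine_min([{("paris", "t"): 10.0},
                      {("paris", "t"): 7.0, ("tokyo", "t"): 3.0}])
nearby = enumerate_replicas(vps, {"paris": 2.0, "new_york": 2.0, "tokyo": 2.0})
far = enumerate_replicas(vps, {"paris": 250.0, "new_york": 250.0,
                               "tokyo": 250.0})
```

Taking the minimum RTT shrinks the disks, which reduces overlaps and lets the greedy pass keep more of them; this is the intuition behind the recall gain reported for the combination of censuses.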

In this paper, we only consider results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VP, raising false positive replicas. While we have anecdotal evidence of some of these exiguous deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

[Figure 12 plot residue: x-axis number of replicas (2 to 25), y-axis CDF of number of IPs/24 (0 to 1); legend, in extraction order: Combination of censuses - 308 VPs; (13/03/15) Census 4 - 240 VPs; (04/03/15) Census 3 - 269 VPs; (02/03/15) Census 2 - 255 VPs; (27/02/15) Census 1 - 261 VPs.]

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall).

[Figure 13 plot residue: x-axis number of IPs/24 (1 to 1000, log scale), y-axis CDF of number of ASes (0 to 1), for ASes with >= 5 replicas; annotated ASes: CLOUDFLARENET,US (328), GOOGLE,US (102), EDGECAST,US (37), PROLEXIC,US (21), APPLE,US (6), TWITTER,US (3), LEVEL3,US (2), LINKEDIN,US (1).]

Figure 13: Number of IPs/24 per AS.

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS), and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet, about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse deployments (e.g., Google's 8.8.8.8 is the only address alive in the 8.8.8.0/24) and very dense ones (e.g., well over 99% of IPs are alive in most CloudFlare subnets).
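Deployment density, as defined above, is simply the fraction of addresses in a /24 that respond. A minimal sketch of how it could be computed from a list of responsive addresses is given below; the function name and input format are assumptions, and the paper itself does not carry out this analysis systematically.

```python
import ipaddress
from collections import Counter


def density_per_slash24(alive_ips):
    """Map each /24 to the fraction of its 256 addresses that are alive.

    alive_ips: iterable of dotted-quad IPv4 strings (duplicates are ignored).
    """
    counts = Counter(
        ipaddress.ip_network(f"{ip}/24", strict=False)
        for ip in set(alive_ips)
    )
    return {str(net): n / 256 for net, n in counts.items()}


sparse = density_per_slash24(["8.8.8.8"])
mixed = density_per_slash24(["10.0.0.1", "10.0.0.2", "10.0.1.1", "10.0.0.1"])
```

Passing `strict=False` lets `ip_network` accept a host address and mask it down to the enclosing /24, so no manual bit manipulation is needed.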

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, are good indicators that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. In reason of the historical use of anycast for DNS services, we believe it is important to provide an up-to-date longitudinal view across the services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing in the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan all 2^16 TCP ports at low rates. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an underestimation of the number of open TCP ports can also be the result of probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816 IPs of 81 ASes have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both top and bottom plots: notably, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.
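The effect of class imbalance on port rankings can be reproduced on toy data by counting each open port once per IP/24 and once per AS. The sketch below is illustrative only: the function name, the tuple layout of the scan results and the toy values are assumptions, not census data.

```python
from collections import Counter


def port_frequencies(scan):
    """Count open ports per AS and per IP/24.

    scan: iterable of (asn, slash24, open_ports) tuples, one per scanned
    prefix (hypothetical layout). Returns (per_as, per_prefix) Counters.
    """
    per_prefix = Counter()
    ports_by_as = {}
    for asn, _prefix, ports in scan:
        per_prefix.update(set(ports))               # one vote per prefix
        ports_by_as.setdefault(asn, set()).update(ports)
    per_as = Counter()
    for ports in ports_by_as.values():
        per_as.update(ports)                        # one vote per AS
    return per_as, per_prefix


# Toy data: AS "A" owns many prefixes that all expose an uncommon port.
toy_scan = [
    ("A", "192.0.2.0/24", [80, 2052]),
    ("A", "198.51.100.0/24", [80, 2052]),
    ("A", "203.0.113.0/24", [80, 2052]),
    ("B", "10.1.0.0/24", [53]),
    ("C", "10.2.0.0/24", [53, 80]),
]
per_as, per_prefix = port_frequencies(toy_scan)
```

In the per-prefix view port 2052 outranks port 53 only because one AS owns most prefixes, whereas the per-AS view ranks 53 above 2052; this is the kind of bias that motivates reporting per-AS statistics.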

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS. We make the following observations: (i) roughly half of the ASes have at least one open TCP port; (ii) about 10% of the ASes have at least 5 open TCP ports; and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20% of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs, including a tier-1 ISP (Tinet SpA), and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

nmap portscan statistics:

IPs/32   ASes   Ports (SSL)    Well-known   Software
812      81     10499 (185)    457          30

[Figure 14 plot residue: top-10 open TCP ports by IP/24 frequency (0 to 400), ports in plot order 80, 443, 8080, 53, 2052, 2053, 2082, 2083, 8443, 2087, with service labels http, http ssl, http, domain, http, http, http, http, http, http; top-10 open TCP ports by AS frequency (0 to 60), ports in plot order 53, 80, 443, 179, 22, 8080, 8083, 3306, 1935, 5252, with service labels domain, http, http-ssl, tcpwrapped, ssh, http, tcpwrapped, tcpwrapped, sip, movaz-ssc.]

Figure 14: Overall nmap portscan statistics and Top-10 open TCP ports (per AS and per /24).
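Statements such as "about 10% of the ASes have at least 5 open TCP ports" are read directly off a complementary CDF. A minimal sketch of how such a curve can be computed from per-AS port counts follows; the function name and the toy values are illustrative and are not the census data.

```python
def ccdf(values):
    """Return sorted (x, P[X >= x]) pairs for the distinct values observed."""
    n = len(values)
    ordered = sorted(values)
    points = []
    for i, x in enumerate(ordered):
        if i == 0 or x != ordered[i - 1]:
            # All samples from index i onwards are >= x.
            points.append((x, (n - i) / n))
    return points


# Toy per-AS counts of open TCP ports.
curve = ccdf([1, 1, 5, 20])
```

Plotting these points on log-log axes gives the kind of view used to compare ASes whose port counts span several orders of magnitude.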

Software diversity. Fig. 16 lists 30 different software packages, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software, such as popular web and DNS daemons (e.g., nginx, ISC BIND), and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

[Figure 15 plot residue: x-axis number of open TCP ports (1 to 100000, log scale), y-axis CCDF over ASes (0.01 to 1, log scale); annotated ASes: OVH,FR (10148), INCAPSULA,US (330), CLOUDFLARENET,US (20), GOOGLE,US (9), EDGECAST,US (5).]

Figure 15: Complementary CDF of the number of open TCP ports per AS.

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, the difference may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).
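The Spearman coefficient used above compares two popularity rankings rather than raw counts. A minimal dependency-free sketch is given below: it ranks both series (averaging ranks over ties) and then applies the Pearson formula to the ranks. The function names and toy inputs are illustrative and are not the data behind the reported value.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy popularity counts of four web servers in two hypothetical datasets.
rho = spearman([7, 4, 3, 1], [50, 10, 30, 5])
```

Because only the ordering matters, the coefficient is insensitive to the very different scales of a small anycast dataset and a multi-million-site ranking.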

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted, yet among the anycasters we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from software ranking in the unicast IP world.

Yet, this work only scratches the surface, and opens more questions than it is able to answer, as for instance:

IP-level CDN. Refining the active methodology by mapping content objects (and not only the frontpage) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with the results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming or discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for known unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and to cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.

[Figure 16 plot residue: bar chart of AS frequency (0 to 20) per software, grouped on the x-axis as DNS, Web, Mail, Other; software labels in plot order: ISC BIND, NLnet Labs NSD, Microsoft DNS, OpenDNS, nginx, lighttpd, Apache httpd, ECD, Microsoft HTTP, Microsoft IIS, Varnish, Apache Tomcat, bitasicv2, CFS 0213, cloudflare-nginx, cPanel httpd, thttpd, ECAcc, ECS, Google httpd, instart160, Gmail imapd, Gmail pop3d, Google gsmtp, OpenSSH, MySQL, sslstrip, Microsoft SQL, Microsoft RPC.]

Figure 16: Breakdown of software running on anycast replicas.

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6 REFERENCES[1] httpinternetcensus2012bitbucketorgpaperhtml

[2] httpw3techscomtechnologiesoverviewweb serverall

[3] J Abley A software approach to distributing requestsfor DNS service using GNU Zebra ISC BIND 9FreeBSD In Proc USENIX ATEC 2004

[4] J Abley and K Lindqvist Operation of AnycastServices RFC 4786 (Best Current Practice) 2006

[5] V K Adhikari S Jain and Z li Zhang Youtubetraffic dynamics and its interplay with a tier-1 isp Anisp perspective In ACM IMC 2010

[6] httpwwwcaidaorgprojectsark

[7] httpwwwroot-serversorg

[8] F Baker Requirements for IP Version 4 RoutersIETF RFC 1812 1995

[9] H Ballani and P Francis Towards a global IP anycastservice In Proc ACM SIGCOMM 2005

[10] H Ballani P Francis and S Ratnasamy Ameasurement-based deployment proposal for IPanycast In ACM IMC 2006

[11] B Barber M Larson and M Kosters Traffic sourceanalysis of the J root anycast instances Nanog 2006

[12] I Bermudez S Traverso M Mellia and M MunafoExploring the cloud from passive measurements TheAmazon AWS case In Proc IEEE INFOCOM 2013

[13] P Boothe and R Bush DNS Anycast Stability SomeEarly Results CAIDA 2005

[14] R Braden Requirements for Internet Hosts -Communication Layers IETF RFC 1122 1989

[15] M Calder X Fan Z Hu E Katz-BassettJ Heidemann and R Govindan Mapping theexpansion of googlersquos serving infrastructure In ACMIMC 2013

[16] M Calder A Flavel E Katz-Bassett R Mahajanand J Padhye Analyzing the performance of ananycast cdn In ACM IMC 2015

[17] D Cicalese D Joumblatt D Rossi M-O BuobJ Auge and T Friedman A fistful of pings Accurateand lightweight anycast enumeration and geolocationIn Proc IEEE INFOCOM 2015

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. Zmap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. Fastroute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829/rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters. Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.


cover the geographical footprints of major CDNs [15, 45]. Still, given the wide design space and flexibility in L7 anycast implementations, it is hard to generalize results of L7 anycast usage, performance, and geographical deployments across CDNs.

Conversely, most IP anycast studies [9, 34, 43] are limited to DNS, which has historically been the killer application of IP anycast [3, 7, 29, 37]. This paper shows that the usage of IP anycast has significantly changed in recent years. In particular, major players of the Internet ecosystem, including Internet service providers (ISPs), OTTs, and manufacturers, provide a diversity of services with IP anycast (e.g., content distribution, cloud services, web hosting, web acceleration, DDoS protection). Yet, lacking an Internet-scale study of IP anycast deployment, the scientific community is not up to date with such changes, and knowledge related to non-DNS anycast (e.g., anycast IP address ranges, the number of replicas behind each address, services provided) is anecdotal at best, which motivates our current work.

Indeed, while valuable research efforts (Sec. 2), which started with seminal work such as [30] and culminated more recently with [1, 22], focus on unicast censuses, this paper presents the first census of the use of IPv4 anycast in the Internet. First, we describe the challenges faced in designing a system able to collect and analyze Internet-scale delay measurements [17] in a short time frame (Sec. 3). Next, we discuss the results of a thorough experimental campaign, analyzing anycast adoption over multiple anycast IPv4 censuses (Sec. 4). To summarize our main contributions:

• We conduct and combine delay measurements from four full censuses, based on which we find about O(10^3) IP/24 subnets to be anycasted.

• We characterize the geographical footprint of IP anycast deployments, which we (conservatively) find to have O(10) replicas on average.

• We provide empirical evidence that IP anycast is used by ASes in the CAIDA top-10 rank, and by ASes serving content over HTTP and HTTPS for websites in the Alexa top-100 rank.

• We show that anycast is used to serve a large diversity of stateful services running on top of TCP (a complementary portscan finds 10,000 open ports, about 500 of which are well known).

• We describe our distributed system design, able to perform and analyze one census in under 5 hours.

• We make our census results browsable at [21].

2. METHODOLOGY OVERVIEW

In this section, we present an overview of our work (Sec. 2.1) and put it in perspective with prior research efforts (Sec. 2.2). For the sake of readability, we describe our complete workflow with the help of Fig. 1.

Figure 1: Overall workflow of the anycast census.

Figure 2: Analysis technique: anycast detection (figure adapted from [17]).

2.1 Workflow

Measurements. We use distributed software running over PlanetLab (PL) to conduct IPv4 anycast censuses with ICMP latency measurements. Each of the O(10^2) PL vantage points (VPs) receives a set of O(10^7) IP/32 targets (namely, the IPv4 hitlist provided by [31]). We consider that each IP/32 in this hitlist is representative of the corresponding IP/24 subnet, and thus cover the entire IPv4 address space (we validate this assumption in Sec. 3.1). Later on, we thoroughly justify our choices of the measurement platform (e.g., PL over RIPE, MLab, Archipelago, etc., in Sec. 3.2), software (e.g., fastping/TDMI over Zmap, in Sec. 3.3), and protocols (e.g., ICMP over TCP or UDP, in Sec. 3.4).

Analysis. The dataset collected from the census is uploaded to a central repository. We run an iterative algorithm that we recently proposed [17] to detect, enumerate, and geolocate anycast replicas over the dataset. For the sake of completeness, we provide an overview of the technique, which is based on the detection of speed-of-light violations [35]. As depicted in Fig. 2, the main idea is that if latency measurements from two vantage points toward the same target exhibit geo-inconsistency, then it is safe to assume the target to be anycast.
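The detection idea can be illustrated with a minimal sketch. It is not the implementation of [17]: the propagation speed constant (about 200 km per millisecond, roughly two thirds of the speed of light), the function names, and the input layout are assumptions made for illustration. Each RTT bounds the distance between a vantage point and the replica it reached; if the two bounding disks cannot intersect, a single host cannot have answered both probes.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: signals propagate at most at ~2/3 of the speed of light (~200 km/ms).
MAX_SPEED_KM_PER_MS = 200.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def disk_radius_km(rtt_ms):
    """Upper bound on the VP-to-replica distance: one-way delay times speed."""
    return (rtt_ms / 2.0) * MAX_SPEED_KM_PER_MS


def is_anycast(measurements):
    """measurements: list of (vp_location, rtt_ms) toward one target.

    Returns True if at least one pair of disks is disjoint, i.e. the two
    measurements are geo-inconsistent for a single unicast host.
    """
    for i in range(len(measurements)):
        for j in range(i + 1, len(measurements)):
            (p, rtt_p), (q, rtt_q) = measurements[i], measurements[j]
            if haversine_km(p, q) > disk_radius_km(rtt_p) + disk_radius_km(rtt_q):
                return True
    return False
```

For instance, two vantage points near Paris and Tokyo that both measure a 10 ms RTT yield two disks of 1,000 km radius whose centers are far more than 2,000 km apart, so the target would be flagged as anycast.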

We illustrate an execution of the technique in Fig. 3. Briefly, given a specific target IP, (a) we first map each RTT latency measurement to a disk centered around the VP that, by definition, contains the target contacted; (b) if two such disks do not intersect, as just discussed, we can infer that the VPs are contacting two different replicas, as is the case for the green disks in Fig. 3(b). While the observation of inconsistency among any pair leads to anycast detection, by leveraging multiple observations it is possible to further enumerate such replicas. Enumeration is described in step (c): to provide a conservative estimate of the minimum number of anycast replicas, we solve a Maximum Independent Set (MIS) problem. MIS outputs a set of non-overlapping disks, each of which contains a different replica of the same target. While the MIS problem is NP-hard, we solve it using a 5-approximation algorithm that greedily operates on disks of increasing radius, as in Fig. 3(c), and that in practice yields results that are very close to the optimum provided by a prohibitively more costly brute-force solution [17]. Geolocation happens in step (d): in the smallest disk, we geolocate the replica at city-level granularity with a maximum likelihood estimator biased toward city population. Actually, we find that the city population alone has sufficient discriminative power (about 75% accuracy [18]), so that our geolocation criterion boils down to picking the largest city in that disk. Finally, (e) we coalesce the disk to the classified city, which reduces disk overlap and allows iteration of the algorithm until convergence, thus increasing the recall (i.e., the number of replicas discovered) along each iteration.

Figure 3: Illustration of the analysis technique (enumeration and geolocation steps). (a) Measurement: map RTT samples to disks centered around VPs. (b) Detect: non-overlapping disks imply a speed-of-light violation. (c) Enumerate: solving a Maximum Independent Set (MIS) problem yields non-overlapping disks, each containing a different replica (two steps shown). (d) Geolocate: maximum likelihood classification problem (city-level). (e) Iterate: collapse disks around geolocated replicas until convergence.
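Steps (c) and (d) can be sketched as follows, under stated assumptions: disks are given as (center, radius in km) pairs, the city table is a hypothetical list of (name, location, population) tuples, and the iteration of step (e) is left out. This is only an illustration of a greedy pass over disks of increasing radius followed by a largest-city pick, not the code of [17].

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def enumerate_replicas(disks):
    """Greedy independent set: visit disks by increasing radius and keep a
    disk only if it overlaps none of the disks already kept.

    disks: list of ((lat, lon), radius_km). The result is a lower bound on
    the number of replicas, since each kept disk holds a distinct replica.
    """
    kept = []
    for center, radius in sorted(disks, key=lambda d: d[1]):
        if all(haversine_km(center, c) > radius + r for c, r in kept):
            kept.append((center, radius))
    return kept


def geolocate(disk, cities):
    """Pick the most populous city inside the disk (None if the disk is empty).

    cities: hypothetical table of (name, (lat, lon), population) tuples.
    """
    center, radius = disk
    inside = [c for c in cities if haversine_km(center, c[1]) <= radius]
    return max(inside, key=lambda c: c[2])[0] if inside else None
```

Sorting by radius first means that the tightest (most informative) disks are kept preferentially, which is also where the city-level classification is most reliable.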

The analysis technique [17] is, of course, not an original contribution of this work, whose main aims are instead to scale up its application to an Internet-wide census on the one hand (Sec. 3), and to analyse and publish the gathered dataset on the other hand (Sec. 4).

Characterization and Validation. In addition to anycast detection, the previous steps allow us to geolocate the replicas behind each anycast IP/24. As outlined in Fig. 1, we validate the output of the geolocation step whenever a ground truth is available (as in Sec. 3.4 for CDNs such as CloudFlare and EdgeCast, complementary to the validation limited to DNS in [17]).

Finally, we provide a fine-grained characterization of IP/24 anycast services. While our detection methodology is service-agnostic, we use nmap [38] on a list of anycast IPs obtained from the census to reveal open ports and the software they run. Given that an exhaustive portscan (i.e., of the 2^16 TCP and UDP port spaces over all replicas of all anycast deployments) still incurs a prohibitive measurement cost, we restrict the measurements to TCP services (i.e., the most unexpected ones) and interesting deployments (i.e., deployments with large geographical footprints). We discuss anycast services in Sec. 4.3, finding over 10,000 open ports that map to about 500 well-known services, and fingerprinting some 30 software applications.

Figure 4: Anycast census at a glance: typical census magnitude.

Scale, Completeness and Accuracy. Although we described the anycast geolocation technique in [17], we had to overcome several challenges to run it at Internet scale and within a short timespan, for which we went through multiple re-engineering phases. We believe that a number of lessons learned (e.g., the counter-intuitive need to slow down the sending rate to complete a census) are worth sharing, and discuss them in Sec. 3.

Notice that a large number of vantage points is required to provide an accurate picture of anycast deployment, especially in terms of the number of replicas discovered around the world. Related work that focuses on O(1) targets (i.e., DNS root servers) indeed runs measurement campaigns involving from O(10^4) [10] to O(10^5) [25] vantage points to achieve ≈90% recall [25]. In our case, given the sheer size O(10^7) of our target set, we trade off completeness for scale, and possibly underestimate the number of IP-anycast replicas, as we use a mere O(10^2) vantage points.

Still, our results provide a broad, conservative, yet accurate picture of Internet anycast usage: for targets for which we have the ground truth, our city-level geolocation is accurate in about 75% of the cases, with a median error of 350 km otherwise (Sec. 3.4).

Typical census. In this work, we perform four IPv4 censuses and analyse the results obtained from their combination. For each of the O(10^2) VPs, the magnitude of a typical census is illustrated in Fig. 4: starting from a hitlist of O(10^7) targets, less than half send a reply (Sec. 3.1). ICMP replies include O(10^5) errors, some of which relate to administratively prohibited communication: senders of these ICMP error messages are added to a greylist to avoid probing them again in future censuses (Sec. 3.3). Finally, running the anycast geolocation technique over the O(10^6) targets that generate valid ICMP echo reply messages, we discover roughly O(10^3) IP/24 anycast deployments, corresponding to approximately 0.1‰ of the whole IPv4 address space: the proverbial needle in the IPv4 haystack.

2.2 State of the art

Anycast vs. Unicast. In standard IP unicast censuses, the set of targets can be split among VPs for scalability reasons. In contrast, in the anycast case, all targets should be probed by all VPs to provide an accurate map of geographical footprints. Given that the number of active VPs in PL is around 300, and that only one IP/32 target for each IP/24 subnet needs to be probed in a given anycast census, it follows that the raw amount of probe traffic is only slightly larger than that of a unicast census.
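A back-of-the-envelope check of this claim, as a small sketch. It assumes an upper bound in which the unicast census probes every IPv4 address once and the anycast census probes one address in every possible /24 from each of 300 VPs (unrouted or filtered space is ignored in both cases).

```python
NUM_VPS = 300                # active PlanetLab vantage points (from the text)
SLASH24_COUNT = 2 ** 24      # number of /24 subnets in the IPv4 space
IPV4_ADDRESSES = 2 ** 32     # targets of an exhaustive unicast census

# Anycast census: every VP probes one target per /24.
anycast_probes = NUM_VPS * SLASH24_COUNT
# Unicast census: targets are split among VPs, each address is probed once.
unicast_probes = IPV4_ADDRESSES

ratio = anycast_probes / unicast_probes
print(f"anycast/unicast probe ratio: {ratio:.2f}")
```

Since 2^32 / 2^24 = 256, the ratio reduces to 300 / 256, which is why a few hundred vantage points keep the probing volume in the same ballpark as a unicast census.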

In the unicast case, the relative location of VPs with respect to targets is irrelevant. Therefore, census strategies cover extremes such as a single, centralized, high-end server capable of O(10^6) probes per second on a well-provisioned 1 Gbps Ethernet connection, as in Zmap [22], or a highly decentralized system that exploits O(10^5) low-resource gateways, as in the illegal Carna Botnet census [1]. Our system design sits halfway between these two extremes: in particular, it uses an efficient application-level multi-threaded scanner, capable of O(10^4) probes/sec, distributed over O(10^2) PlanetLab nodes.

It is also worth pointing out that an independent analysis [33] of the Carna Botnet dataset found multiple campaigns, covering a cumulative number of probes exceeding a full IPv4 census, over a duration of 8 months. Additionally, [33] suggests that, due to an overlap in the target set, not all hosts were probed, neither during the two fast scan campaigns identified nor during the whole measurement period. As such, our measurement campaign, comprising multiple anycast censuses each lasting only a few hours, constitutes an achievement per se.

Geolocation and infrastructure mapping. Unicast geolocation is a well-investigated research topic. Numerous techniques based on latency measurements [23, 24, 28] and databases [41, 44] have been proposed. Yet database techniques are not only unreliable with unicast [41], but also with anycast, since they advertise a single geolocation per IP. Similarly, latency-based techniques [23, 24, 28] use triangulation and geolocate unicast addresses at the intersection of multiple latency measurements from geographically dispersed vantage points. However, this assumption no longer necessarily holds for anycast, as depicted in Fig. 3.

L7-anycast infrastructure mapping studies [15, 45] leverage ECS requests to geolocate servers: (millions of) requests are sent with different client IPs from one VP to unveil (thousands of) unicast IP addresses corresponding to PoPs of major OTTs. However, ECS support, while becoming widespread to enhance the user online experience, is not yet pervasive. Finally, the technique fails with alternative L7-anycast designs relying on HTTP redirection.

To the best of our knowledge, no IP-anycast geolocation technique exists other than our own: as such, no other study apart from this work deals with anycast infrastructure mapping. With respect to the ECS-based technique for L7 anycast, our technique reduces the cardinality of the problem without sacrificing geolocation accuracy (a qualitative comparison is provided in [17]). Additionally, since BGP provides a unified redirection technique, IP-anycast offers an unprecedented opportunity to broadly assess all deployments at once.

Anycast discovery and characterization. Prior work investigates different aspects related to anycast, with a focus on discovery and enumeration [25, 35] or on characterization [9–11, 19, 32, 34, 43], but, to the best of our knowledge, not on anycast census. Closest to ours [17] is the work of [25, 35]. Specifically, [17] is a service-agnostic technique for detection, enumeration, and geolocation. Similarly to [17], speed-of-light violation is used in [35] (however limited to detection, and not capable of enumeration/geolocation). Conversely, [25] exploits DNS specificities (i.e., CHAOS requests) to enumerate DNS replicas (but, unlike [17], is neither capable of geolocation nor applicable beyond DNS).

Other studies assess the performance of current IP anycast deployments, with a focus on metrics such as proximity [9, 10, 19, 34, 43], affinity [9–11, 13, 34, 43], availability [10, 32, 43], and load balancing [10, 11]. Yet these studies focus on DNS, which is just a piece of the current anycast puzzle.

Finally, some works study IP-anycast CDNs, such as [16, 27]. However, these focused studies add yet other useful pieces to a puzzle that has so far remained incomplete, lacking the broad perspective given by Internet-wide coverage over all prefixes and services.

3. SYSTEM DESIGN

Anycast detection relies on measuring round-trip delays between a set of vantage points and a target IP address to uncover geo-inconsistencies. Running an Internet census thus requires measurements towards millions of destinations, ideally in a short timeframe: we now describe and justify the system design choices that allow us to perform the multiple censuses that we analyze later in Sec. 4. Items discussed in this section concern the selection of targets (Sec. 3.1), the measurement platform (Sec. 3.2) and software (Sec. 3.3), as well as the network protocol used (Sec. 3.4). Finally, we report considerations about the scalability of our workflow (Sec. 3.5).

3.1 Census targets

Census granularity. Unlike multicast, anycast addresses need no reservation in the IP space: as any IP address can be a candidate, this makes deployment easy, but the detection of anycast addresses hard. Luckily, to avoid a significant increase in the size of routing tables, BGP standard practice [4] is to ignore or block prefixes longer than /24 (i.e., covering fewer addresses than a /24). Thus, /24 is the minimum granularity for anycasted services, which is a good granularity for our census. We validate this assumption with (spot) verifications for all IP addresses on some IP/24 (belonging to EdgeCast), confirming any IP in the /24 to be equivalent for anycast detection purposes. Additionally, a /24 granularity implies that announced BGP prefixes that cover more than a /24 are tested multiple times, once per each /24 they contain: the mapping between /24 and announced prefixes is still possible a posteriori, as we do in this work. This choice is reinforced by [35], which found 88% of announced prefixes to be /24, stated that "anycast prefixes are dominated by /24", and suggested that larger prefixes may be anycast only in part due to BGP prefix aggregation. We therefore fix the census granularity to a single target IP per /24.

Target liveness. As previously argued, any alive IP belonging to a /24 is equivalent in telling whether the whole /24 is anycast (or unicast). To identify a responsive IP address in every /24 prefix, we rely on the hitlist periodically published by [31]. The hitlist generally consists of one representative IP address for each of O(10^7) prefixes, along with a score indicative of the host liveness, computed over several measurement campaigns. When no alive IP has been observed in a /24, the hitlist contains an arbitrary address from that /24 (score ≤ −2). After covering the full hitlist with the first census, we confirm that these hosts are not reachable, and remove them to reduce the target size to 6.6 · 10^6 per VP.

Figure 5: Microsoft deployment as seen from PlanetLab (21 replicas) vs. RIPE (54 replicas). Notice that PlanetLab results (white markers) are a subset of RIPE (white and black markers).

Coverage. Given our census aim, we verify how well this hitlist covers all routed /24 prefixes. We therefore obtain from CAIDA a dump of routing tables originating from both RIPE RIS and RouteViews collectors. To compare the hitlist vs. the advertised prefixes, we split the latter in /24, obtaining 10,616,435 /24 prefixes, of which 10,615,563 have a representative in the hitlist (over 99.99% coverage). We additionally cross-check our observed target responsiveness with the expected recall: specifically, recent ICMP scans [48] observe 4.9 · 10^6 used /24 subnets, and our campaigns similarly capture 4.4 · 10^6 responsive subnets (90% coverage with respect to [48]).
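The coverage check lends itself to a short sketch with Python's standard `ipaddress` module. The function names and the in-memory inputs (a list of announced prefixes and a list of hitlist addresses) are assumptions for illustration; the actual routing dumps and hitlist files have their own formats, which are not reproduced here.

```python
import ipaddress


def split_into_slash24(prefixes):
    """Expand announced prefixes into the set of /24 networks they contain.

    Prefixes more specific than /24 are skipped (an assumption of this
    sketch, in line with the /24 census granularity).
    """
    slash24s = set()
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen <= 24:
            slash24s.update(net.subnets(new_prefix=24))
    return slash24s


def hitlist_coverage(prefixes, hitlist):
    """Fraction of routed /24s that have a representative in the hitlist."""
    routed = split_into_slash24(prefixes)
    represented = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in hitlist}
    return len(routed & represented) / len(routed)


if __name__ == "__main__":
    # Documentation prefixes (RFC 5737) stand in for real announcements.
    announced = ["192.0.2.0/24", "198.51.100.0/23"]
    targets = ["192.0.2.17", "198.51.100.1"]
    print(f"coverage: {hitlist_coverage(announced, targets):.2%}")
```

With the figures reported above, the same ratio is 10,615,563 / 10,616,435, i.e. slightly above 99.99%.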

3.2 Measurement dataset vs platform

Dataset. One option to avoid running a large-scale measurement campaign is to exploit readily available datasets from public measurement infrastructures, yet we could not find any fitting our purpose. For instance, despite probing all /24 every 2-3 days, Archipelago [6] clusters its vantage points into three independent groups, each using random IPs selected in each /24 prefix: it follows that at most 3 monitors target each /24, with generally different IP addresses, and a hit rate of about 6%. Given the low hit rate and low parallelism, such a dataset is not appropriate for our purpose, as it would lead neither to a complete census nor to an accurate geolocation footprint, even in case of hits.

Platforms. There are a number of measurement platforms available in the community, each with its own advantages and limitations. Except for illustration purposes, in this paper we rely on PlanetLab (PL). While RIPE Atlas (RIPE for short) is more interesting for geographical diversity due to its scale, it offers limited control over the rate and type (cf. Sec. 3.4) of measurements, as well as over their instantiation for such a large-scale campaign (i.e., upload of the hitlist, probing budget). Additionally, the larger number of vantage points would mechanically increase about 20-fold the amount of probes per census with respect to PL (in case all VPs are used). Conversely, measurements in PL are limited by node availability (generally around 300 vantage points), but offer full flexibility for deploying custom software and running it at high speed (cf. Sec. 3.3).

While in this work we only use PL, we depict in Fig. 5, for illustration purposes, an application of our technique to measurements collected from PlanetLab vs. RIPE: as PL results are a subset of RIPE results, white markers indicate replicas found from both platforms, while black markers pinpoint replicas that are only found with RIPE measurements. While this example has anecdotal relevance, it suggests that an intriguing direction is to combine both platforms, e.g., by refining via RIPE the geolocation of anycast /24 detected via PL.

3.3 Measurement software

Fastping. An efficient measurement tool is needed to maximize the probing capacity of our VPs. While at first sight Zmap [22] could seem the perfect tool for such a large-scale campaign, it exhibits a major blocking point in our setup: namely, Zmap generates raw Ethernet frames, which are very efficient in a local setup but are not supported by the PlanetLab virtualization layer. We therefore resort to Fastping [26], a tool specialized, as the name implies, in ICMP scanning, which is deployed on each PL node. Fastping is able to send about O(10^4) probes per second: about two orders of magnitude slower than Zmap, but faster than the fastest nmap scripting engine scanner. As we will point out later concerning scalability (Sec. 3.5), in order to gather complete censuses in a few hours, we had to undergo several rounds of re-engineering, including purposely slowing down the Fastping sending rate.

Greylist. Additionally, Fastping adopts the usual techniques to be a good Internet measurement citizen, i.e., a signature in the payload points to its homepage. Fastping probes the target list in a randomized order to reduce intrusiveness, and implements a greylist mechanism to honor requests to stop probing administratively prohibited hosts/networks, inferred from ICMP return codes. Before running a census from O(10^2) VPs, we initially run a census from a single VP in order to build an initial blacklist. During any census, we then collect addresses generating ICMP return codes (other than echo reply) in a temporary greylist, which we later incrementally merge with the blacklist. This list has approximately O(10^5) hosts, with 98.5% added due to administrative filtering [8] (type 3, code 13), and the remainder due to communication administratively prohibited at host or network level (respectively 1.3%, code 10 [14], and 0.2%, code 9 [14]).

Figure 6: Response rates seen by heterogeneous protocols across different targets. [Plot: response ratio [%] (0 to 100) for ICMP (L3), TCP-53 and TCP-80 (L4), DNS/UDP and DNS/TCP (L7); series: OpenDNS, EdgeCast, CloudFlare, Microsoft.]

Figure 7: Validation with CloudFlare and EdgeCast ASes. Bars represent standard deviation among IP/24 of the AS. [Plot: GT/PAI and TPR on the left axis (0 to 1), median error [km] on the right axis (0 to 1000); series: CloudFlare, EdgeCast.]
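The greylist bookkeeping described in the Greylist paragraph can be sketched as below. The ICMP type and code values are the standard ones (type 0 is echo reply; type 3 codes 9, 10 and 13 are the administratively prohibited variants of destination unreachable); the function name and the tuple layout of a reply are assumptions, and restricting the greylist to these three codes is a simplification of this sketch.

```python
ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACHABLE = 3
# Type 3 codes: 9 = network administratively prohibited,
# 10 = host administratively prohibited, 13 = communication
# administratively prohibited (filtering).
PROHIBITED_CODES = {9, 10, 13}


def process_census(replies, blacklist):
    """Split replies into valid targets and a temporary greylist.

    replies: iterable of (source_ip, icmp_type, icmp_code) tuples
             (hypothetical layout).
    blacklist: set of addresses excluded from probing so far.
    Returns (valid_targets, greylist, merged_blacklist).
    """
    valid, greylist = set(), set()
    for source_ip, icmp_type, icmp_code in replies:
        if icmp_type == ICMP_ECHO_REPLY:
            valid.add(source_ip)
        elif icmp_type == ICMP_DEST_UNREACHABLE and icmp_code in PROHIBITED_CODES:
            greylist.add(source_ip)
    # The greylist is merged into the blacklist once the census is over,
    # so that these senders are not probed again in later censuses.
    return valid, greylist, blacklist | greylist
```

Returning a new merged set, rather than mutating the blacklist during the census, mirrors the incremental merge performed between censuses.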

3.4 Network protocol

Recall. ICMP has often been used (and misused) in measurement studies: especially given recent work showing that ICMP latency measurements are often not reliable [40], we need to confirm the validity of our protocol selection. A major motivation for ICMP measurement is the high recall it offers [48]. Consider indeed that TCP and UDP measurements would need a priori knowledge (or a guess) of the services running on the target under test. We therefore perform a test on a reduced set of targets, performing 100 measurements with different protocols: specifically, we consider network L3 (ICMP) and transport L4 (TCP SYN, SYN/ACK pair in the three-way handshake to port 53 or 80) measurements, as well as L7 (DNS/UDP vs. DNS/TCP, using dig) measurements. Fig. 6 shows that protocols other than ICMP have a binary recall: in other words, they work well only if the service is known a priori. Conversely, ICMP is the only reliable alternative, yielding high recall across all deployments, and is thus well suited for censuses.

Accuracy. While our technique relies on latency measurements, it leverages the discriminative power of side-channel information (i.e., city population within disks) to cope with latency measurement noise. While we validate the accuracy of the methodology for DNS in [17], a validation for stateful TCP connections is still missing. To provide one, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL: note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY (standard Server) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a superset of locations with respect to those measured from PL. We contrast the true positive (TPR) classification of our census vs. the HTTP GT in Fig. 7: in 77% of the IP/24 for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 km (287 km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest of alternative platforms such as RIPE.

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case, we rely on public information that maps /24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct except Philadelphia (while the server is located in Ashburn, 260 km or 2.6 ms worth of propagation delay away): this misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn) but, as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.

3.5 Scalability

Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that nodes must be desynchronized to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so with a randomized permutation of the target nodes, achieved via a Linear Feedback Shift Register (LFSR) with Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude¹, which we verified empirically not to trigger the above problems. Consequently, probing 6.6 · 10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (longer durations are likely due to load on the PL host).

Table 1: Textual (0) vs. binary (1-4) censuses.

Census ID   Format   Size (host, total)   Analysis
0           csv      (270M, 79G)          >3 days
1-4         binary   (21M, 6G)            3 hr

Figure 8: CDF of per-vantage point completion time over all censuses. [Plot: CDF (0 to 1) vs. completion time per VP [Hr], ticks at 1, 2, 4, 8, 16.]
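The LFSR-based target permutation from the probing-rate discussion can be illustrated with a minimal sketch. It uses a 16-bit Galois LFSR with the well-known maximal-length tap mask 0xB400, so it only handles up to 65,535 targets; a real census over millions of targets would need a wider register, and the register width, taps, and seeds used by Fastping are not stated in the text. Function names are hypothetical.

```python
def galois_lfsr16(seed=0xACE1):
    """Yield every non-zero 16-bit state exactly once (period 65535).

    The seed must be non-zero. Tap mask 0xB400 corresponds to the
    polynomial x^16 + x^14 + x^13 + x^11 + 1.
    """
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state
        if state == seed:
            return


def permuted_targets(targets, seed=0xACE1):
    """Visit each target exactly once in LFSR order (at most 65535 targets).

    States larger than len(targets) are skipped, so no memory is needed
    beyond the register itself.
    """
    n = len(targets)
    if n > 0xFFFF:
        raise ValueError("a 16-bit register covers at most 65535 targets")
    for state in galois_lfsr16(seed):
        if state <= n:
            yield targets[state - 1]
```

Because all non-zero states lie on a single cycle, giving each vantage point a different seed makes it start at a different offset of that cycle, which is one simple way to keep VPs from hitting the same target at the same time. As for the duration quoted above, 6.6 · 10^6 targets at 10^3 targets per second amount to 6,600 seconds, i.e. roughly 1.8 hours.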

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged, in textual format, a wealth of information amounting to 270 MB per node and 80 GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay, and ICMP flag (encoding greylist return codes 9, 10, or 13 as a negative sign), for a total of about 20 MB per node and 6 GB overall per census.
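A compact binary record of this kind can be sketched with Python's `struct` module. The layout below (a 32-bit timestamp followed by a 32-bit float that holds either the delay or, as a negative value, the ICMP code) is purely hypothetical: the text does not specify field widths, and at 8 bytes per record this sketch is larger than the per-node sizes reported above.

```python
import struct

# Hypothetical layout: little-endian uint32 timestamp (s) + float32 value.
RECORD = struct.Struct("<If")
GREYLIST_CODES = {9, 10, 13}


def pack_record(timestamp, delay_ms=None, icmp_code=None):
    """Encode one probe result; greylist codes are stored as negative values."""
    if icmp_code is not None:
        if icmp_code not in GREYLIST_CODES:
            raise ValueError("only codes 9, 10 and 13 are encoded")
        return RECORD.pack(timestamp, -float(icmp_code))
    return RECORD.pack(timestamp, delay_ms)


def unpack_record(blob):
    """Decode one record into (timestamp, delay_ms, icmp_code)."""
    timestamp, value = RECORD.unpack(blob)
    if value < 0:
        return timestamp, None, int(-value)
    return timestamp, value, None
```

Overloading the sign of the delay field avoids a separate flag byte, since a genuine round-trip delay is never negative.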

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute-force optimal solution. At the same time, processing a census would still take days (we indeed stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs is not the same in all files, meaning that an on-the-fly sorting of about 300 lists (one per VP), each containing millions of targets, is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use case, it may become relevant for other applications of this technique (e.g., BGP hijacking inference, mentioned in Sec. 5).

¹ While it is possible to more finely tune the probing rate per VP, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area which is not well covered by PL.
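The regrouping of per-VP results by target can be sketched as a k-way merge. This is only one possible approach, assumed for illustration (the optimized implementation is not described in the text): each VP's samples are sorted by target, then merged lazily so that all measurements toward the same target come out together, ready for per-target analysis.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def group_by_target(per_vp_samples):
    """Regroup per-VP measurements so each target is analysed once.

    per_vp_samples: dict mapping a VP name to a list of (target, rtt_ms)
    pairs in probing (i.e. permuted) order; targets are integers here.
    Yields (target, [(vp, rtt_ms), ...]) in increasing target order.
    """
    # Sort each VP's list on its own, tagging every sample with its VP.
    streams = [
        [(target, vp, rtt) for target, rtt in sorted(samples)]
        for vp, samples in per_vp_samples.items()
    ]
    # heapq.merge consumes the sorted streams lazily (k-way merge).
    merged = heapq.merge(*streams)
    for target, group in groupby(merged, key=itemgetter(0)):
        yield target, [(vp, rtt) for _, vp, rtt in group]
```

With about 300 streams, the merge keeps only one pending element per VP in its heap, so the memory cost of the regrouping step itself stays small once the per-VP lists are sorted.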

4. ANYCAST /0 CENSUSES

This section presents results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1), and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance

Details of our censuses are reported in Fig. 10. Overall, 1,696 IP/24 belonging to 346 ASes appear to have more than one anycast replica, while with our technique we were able to find only 897 IP/24, belonging to 100 ASes, having at least 5 replicas. The plot also shows a geographical density map of anycast replicas: results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that the results reported in this paper correspond to censuses performed during March 2015: with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at the time of writing) results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP-anycast figure in the top-100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted² by ASes using IP-anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

                 IP/24   ASes   Cities   CC   Replicas
All               1696    346       77   38      13802
≥ 5 Replicas       897    100       71   36      11598
∩ CAIDA-100         19      8       30   18        138
∩ Alexa-100k       242     15       45   29       4038

Figure 10: Anycast censuses: results at a glance.

² For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP and disregard content that is referenced in the frontpage.

4.2 Top-100 Anycast ASes
Although the number of anycast IP/24 may seem disappointing at first, because of its exiguous footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar plot). Service footprint is correlated to the open TCP ports in the AS (middle scatter plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter plots). Finally, a label reported on the top x-axis categorizes the main activity of the ASes from a business perspective (the categorization is informal and, in case of ASes with multiple services, only the most prominent is selected).

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet,

[Figure 9 plot omitted: for each of the top-100 anycast ASes (x-axis, WHOIS name capped to 12 characters), panels show, from bottom to top, the number of replicas, the number of IPs/24, the open TCP ports, the Alexa rank, the CAIDA rank, and the AS category.]

Figure 9: Bird's-eye view of Top-100 anycast ASes (ranked according to geographical footprint).

Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric) but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn) and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services) and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers, such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS) and public DNS resolvers (e.g., Google DNS, OpenDNS), also emerge in the census.

Diversity. We report a breakdown of AS classes in Fig. 11, crisply showing that DNS now represents about one third of IP anycast activities. Plots in Fig. 9 clearly illustrate the large diversity of anycast usage under all metrics. Indeed, no correlation appears between any two metrics, illustrating the degree of freedom in anycast deployments: for instance, the geographical and IP/24 footprints are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments having heterogeneous business models (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH), but also between deployments of the same kind (e.g., CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443 over the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller with respect to L7-anycast deployments that

[Figure 11 plot omitted: bar chart of the AS breakdown (y-axis, 0 to 35) per category (x-axis): DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other.]

Figure 11: Breakdown of AS category (only first category is considered).

can exceed O(10^3) for the large providers, which is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

Fig. 12 further reports the cumulative number of replicas per IP/24, depicting both results coming from the combination of censuses and the individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius size: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
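The combination rule described above (keep the minimum RTT observed for each VP and target pair) can be sketched as follows. The data layout, one dictionary per census keyed by (VP, target), and the function name `combine_censuses` are assumptions made for illustration, not the format used by the authors.

```python
def combine_censuses(censuses):
    """Combine censuses by keeping the minimum RTT per (vp, target) pair.

    censuses: iterable of dicts mapping (vp, target) -> rtt_ms
    (hypothetical layout). A pair missing from one census is still kept
    if any other census observed it, which is how recall can increase.
    """
    combined = {}
    for census in censuses:
        for pair, rtt in census.items():
            # The smallest sample is the closest to the propagation delay.
            if pair not in combined or rtt < combined[pair]:
                combined[pair] = rtt
    return combined

if __name__ == "__main__":
    c1 = {("vp1", "192.0.2.0/24"): 42.0}
    c2 = {("vp1", "192.0.2.0/24"): 35.5, ("vp2", "192.0.2.0/24"): 80.1}
    print(combine_censuses([c1, c2]))
```

Since queuing and processing delays can only inflate an RTT sample, taking the minimum shrinks the corresponding disks, which reduces overlaps before the MIS step.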

In this paper, we only consider results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VP, raising false-positive replicas. While we have anecdotal evidence of some of these exiguous deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

[Figure 12 plot omitted: x-axis, number of replicas (2 to 25); y-axis, CDF of number of IPs/24 (0 to 1). Curves: Combination of censuses (308 VPs); (13/03/15) Census 4 (240 VPs); (04/03/15) Census 3 (269 VPs); (02/03/15) Census 2 (255 VPs); (27/02/15) Census 1 (261 VPs).]

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall).

[Figure 13 plot omitted: x-axis, number of IPs/24 (log scale, 1 to 1000); y-axis, CDF of number of ASes (0 to 1), for ASes with ≥ 5 replicas. Annotated ASes: CLOUDFLARENET,US (328); GOOGLE,US (102); EDGECAST,US (37); PROLEXIC,US (21); APPLE,US (6); TWITTER,US (3); LEVEL3,US (2); LINKEDIN,US (1).]

Figure 13: Number of IPs/24 per AS.

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS) and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet, about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse (e.g., Google 8.8.8.8 is the only address alive in 8.8.8.0/24) and very dense deployments (e.g., well over 99% of IPs are alive in most CloudFlare subnets).

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, are good indicators that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services
Portscan campaign. Because of the historical use of anycast for DNS services, we believe it to be important to provide an up-to-date longitudinal view across services offered via IP-anycast, especially focusing on TCP. We provide a summary of the nmap probing in the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan at low rates all 2^16 TCP ports. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an under-estimation of the number of open TCP ports can also be the result of probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816, belonging to 81 ASes, have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.
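As an illustration of the target selection step above (one representative IP per /24), the following minimal sketch uses Python's standard `ipaddress` module. The choice of the first responsive address seen as representative is an assumption made for this example; the text does not state how the representative is picked, and the function name is hypothetical.

```python
import ipaddress

def one_representative_per_24(responsive_ips):
    """Map each /24 prefix to a single representative address.

    responsive_ips: iterable of dotted-quad IPv4 strings.
    Assumption for this sketch: the first address seen in a /24 is kept.
    """
    representatives = {}
    for ip in responsive_ips:
        # strict=False masks the host bits, e.g. 8.8.8.8/24 -> 8.8.8.0/24.
        prefix = ipaddress.ip_network(f"{ip}/24", strict=False)
        representatives.setdefault(str(prefix), ip)
    return representatives

if __name__ == "__main__":
    ips = ["8.8.8.8", "8.8.8.9", "192.0.2.1"]
    print(one_representative_per_24(ips))
```

Scanning one address per /24 keeps the campaign small (2^16 = 65536 TCP ports per target), at the price of the conservativeness noted above when addresses in the same /24 expose different ports.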

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both top and bottom plots: in particular, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.
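The class-imbalance effect described above can be reproduced on toy data. The sketch below counts port frequencies both per IP/24 and per AS; the record layout (AS name, prefix, open ports) and all values are hypothetical and chosen only to show how one AS with many prefixes dominates the per-IP/24 ranking.

```python
from collections import Counter

def port_frequencies(scan):
    """Count open-port frequencies per IP/24 and per AS.

    scan: list of (asn, prefix24, open_ports) records (hypothetical layout).
    Returns (per_prefix, per_as): two Counters mapping port -> frequency.
    """
    per_prefix = Counter()
    ports_by_as = {}
    for asn, _prefix, ports in scan:
        per_prefix.update(set(ports))            # one vote per IP/24
        ports_by_as.setdefault(asn, set()).update(ports)
    per_as = Counter()
    for ports in ports_by_as.values():
        per_as.update(ports)                     # one vote per AS
    return per_prefix, per_as

if __name__ == "__main__":
    toy = [
        ("AS_A", "198.51.100.0/24", [80, 2052]),
        ("AS_A", "198.51.101.0/24", [80, 2052]),
        ("AS_A", "198.51.102.0/24", [2052]),
        ("AS_B", "203.0.113.0/24", [53]),
        ("AS_C", "192.0.2.0/24", [53, 80]),
    ]
    per_prefix, per_as = port_frequencies(toy)
    print(per_prefix.most_common(), per_as.most_common())
```

In this toy input, port 2052 ties for first place per IP/24 but is used by a single AS, which is the kind of bias that per-AS statistics avoid.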

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS.

nmap portscan statistics

IPs/32   ASes   Ports (SSL)    Well-known   Software
812      81     10499 (185)    457          30

[Figure 14 plots omitted: top-10 open TCP ports with nmap service labels. Per-IP/24 frequency (y-axis, 0 to 400), ports 80, 443, 8080, 53, 2052, 2053, 2082, 2083, 8443, 2087; per-AS frequency (y-axis, 0 to 60), ports 53, 80, 443, 179, 22, 8080, 8083, 3306, 1935, 5252.]

Figure 14: Overall nmap portscan statistics and Top-10 open TCP ports (per AS and per /24).

We make the following observations: (i) roughly half of the ASes have at least one open TCP port; (ii) about 10% of the ASes have at least 5 open TCP ports; and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports, respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20% of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs, including a tier-1 ISP (Tinet SpA), and Google, with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software, such as popular web and DNS daemons (e.g., nginx, ISC BIND), and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted implementation to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

[Figure 15 plot omitted: x-axis, number of open TCP ports (log scale, 1 to 100000); y-axis, CCDF over ASes (log scale, 0.01 to 1). Annotated ASes: OVH,FR (10148); INCAPSULA,US (330); CLOUDFLARENET,US (20); GOOGLE,US (9); EDGECAST,US (5).]

Figure 15: Complementary CDF of the number of open TCP ports per AS.

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, differences may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).
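For readers who want to reproduce this kind of rank comparison, a minimal pure-Python sketch of the Spearman correlation follows. It is a generic textbook formulation (Pearson correlation of average ranks, so that ex aequo entries are handled), not the authors' script; the popularity counts in the usage example are made up.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks in increasing value order; ties get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    anycast_as_count = [7, 4, 4, 1]       # made-up popularity per server
    unicast_share = [30.0, 50.0, 2.0, 10.0]  # made-up unicast popularity
    print(spearman(anycast_as_count, unicast_share))
```

Working on ranks rather than raw counts makes the comparison insensitive to the very different scales of the two datasets (tens of ASes versus millions of websites).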

5. DISCUSSION
We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted, yet among the anycasters we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, and especially in terms of the offered services. Particularly, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from software ranking in the unicast IP world.

Yet, this work only scratches the surface and opens more questions than it is able to answer, as for instance:

[Figure 16 plot omitted: bar chart of AS frequency (y-axis, 0 to 20) per software implementation (x-axis), grouped into DNS, Web, Mail and Other.]

Figure 16: Breakdown of software running on anycast replicas.

IP-level CDN. Refining the active methodology by mapping content objects (and not only the frontpage) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming or discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and to cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.

Acknowledgements
We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES
[1] http://internetcensus2012.bitbucket.org/paper.html

[2] http://w3techs.com/technologies/overview/web_server/all

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. Youtube traffic dynamics and its interplay with a tier-1 isp: An isp perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark

[7] http://www.root-servers.org

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast cdn. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. Zmap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. Fastroute: A scalable load-aware anycast routing architecture for modern cdns. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829/rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From paris to tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring edns-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind akamai (travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used ipv4 space by inferring unobserved addresses. In ACM IMC, 2014.


(a) Measurement: Map RTT samples to disks centered around VPs.
(b) Detect: Non-overlapping disks imply speed-of-light violation.
(c) Enumerate: Solving a Maximum Independent Set (MIS) problem yields non-overlapping disks, each containing a different replica (two steps shown).
(d) Geolocate: Maximum likelihood classification problem (city-level).
(e) Iterate: Collapse disks around geolocated replicas until convergence.

Figure 3: Illustration of the analysis technique (enumeration and geolocation steps).

as is the case for the green discs in Fig. 3(b). While observation of inconsistency among any pair leads to anycast detection, by leveraging multiple observations it is possible to further enumerate such replicas. Enumeration is described in step (c): to provide a conservative estimation of the minimum number of anycast replicas, we solve a Maximum Independent Set (MIS) problem. MIS outputs a set of non-overlapping disks, each of which contains a different replica of the same target: while the MIS problem is NP-hard, we solve it using a 5-approximation algorithm that greedily operates on disks of increasing radius size, as in Fig. 3(c), and that in practice yields results that are very close to the optimum provided by a prohibitively more costly brute-force solution [17]. Geolocation happens in step (d): in the smallest disk, we geolocate the replica at city-level granularity with a maximum likelihood estimator biased toward city population; actually, we find that the city population alone has sufficient discriminative power (about 75% accuracy [18]), so that our geolocation criterion boils down to picking the largest city in that disk. Finally, in step (e), we coalesce the disk to the classified city, which reduces disk overlap and allows iteration of the algorithm until convergence, thus increasing the recall (i.e., the number of replicas discovered) along each iteration.
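The detection, enumeration and geolocation steps above can be sketched in a few lines. This is a simplified illustration, not the implementation of [17]: disks are (latitude, longitude, radius in km) tuples, the propagation speed used to turn an RTT into a radius (200 km/ms, roughly the speed of light in fiber) is an assumption of this sketch, the iteration of step (e) is omitted, and all function names are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def rtt_to_radius_km(rtt_ms, speed_km_per_ms=200.0):
    """Upper bound on the VP-to-replica distance (assumed propagation speed)."""
    return rtt_ms / 2 * speed_km_per_ms

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in km."""
    p1, p2 = radians(lat1), radians(lat2)
    a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def overlap(d1, d2):
    """Two disks overlap if their centers are closer than the sum of radii."""
    return haversine_km(d1[0], d1[1], d2[0], d2[1]) <= d1[2] + d2[2]

def greedy_mis(disks):
    """Greedy independent set: scan disks by increasing radius and keep each
    disk that overlaps none of those already kept. Two or more kept disks
    mean that a single host cannot explain all measurements (anycast)."""
    chosen = []
    for disk in sorted(disks, key=lambda d: d[2]):
        if all(not overlap(disk, other) for other in chosen):
            chosen.append(disk)
    return chosen

def geolocate(disk, cities):
    """Pick the most populated city inside the disk, or None if there is none.
    cities: list of (name, lat, lon, population) tuples."""
    inside = [c for c in cities if haversine_km(disk[0], disk[1], c[1], c[2]) <= disk[2]]
    return max(inside, key=lambda c: c[3])[0] if inside else None

if __name__ == "__main__":
    paris, tokyo, london = (48.86, 2.35, 300.0), (35.68, 139.69, 400.0), (51.51, -0.13, 500.0)
    replicas = greedy_mis([london, tokyo, paris])
    print(len(replicas), "replicas (lower bound)")
```

In the usage example, the London disk overlaps the smaller Paris disk and is therefore discarded, so the sketch reports two replicas: a lower bound, exactly as the conservative estimation in the text.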

The analysis technique [17] is of course not an original contribution of this work, whose main aims are instead to scale up its application to an Internet-wide census on the one hand (Sec. 3), and to analyse and publish the gathered dataset on the other hand (Sec. 4).

Characterization and Validation. In addition to anycast detection, the previous steps allow geolocating the replicas behind each anycast IP/24. As outlined in Fig. 1, we validate the output of the geolocation step whenever a ground truth is available (as in Sec. 3.4 for CDNs such as CloudFlare and EdgeCast, complementary to the validation limited to DNS in [17]).

Finally, we provide a fine-grained characterization of IP/24 anycast services. While our detection methodology is service-agnostic, we use nmap [38] on a list of anycast IPs obtained from the census to reveal open ports and the software they run. Given that an exhaustive portscan (i.e., of the 2^16 TCP and UDP port spaces over all replicas of all anycast deployments) still incurs a prohibitive measurement cost, we restrict the measurements to TCP services (i.e., the most unexpected ones) and interesting deployments (i.e., deployments with large geographical footprints). We discuss anycast services in Sec. 4.3, finding over 10,000 open ports that map to about 500 well-known services and fingerprinting some 30 software applications.

Figure 4: Anycast census at a glance: typical census magnitude.

Scale, Completeness and Accuracy. Although we described the anycast geolocation technique in [17], we had to overcome several challenges to run it at Internet scale and within a short timespan, for which we went through multiple re-engineering phases. We believe that a number of lessons learned (e.g., the counter-intuitive need to slow down the sending rate to complete a census) are worth sharing, and discuss them in Sec. 3.

Notice that a large number of vantage points is required to provide an accurate picture of anycast deployment, especially in terms of the number of replicas discovered around the world. Related work that focuses on O(1) targets (i.e., DNS root servers) indeed runs measurement campaigns involving from O(10^4) [10] to O(10^5) [25] vantage points to achieve ≈90% recall [25]. In our case, given the sheer size O(10^7) of our target set, we trade off completeness for scale and possibly underestimate the number of IP-anycast replicas, as we use a mere O(10^2) vantage points.

Still, our results provide a broad, conservative, yet accurate picture of Internet anycast usage: for targets for which we have the ground truth, our city-level geolocation is accurate in about 75% of the cases, with a median error of 350 km otherwise (Sec. 3.4).

Typical census. In this work, we perform four IPv4 censuses and analyse the results obtained from their combination. For each of the O(10^2) VPs, the magnitude of a typical census is illustrated in Fig. 4: starting from a hitlist of O(10^7) targets, less than half send a reply (Sec. 3.1). ICMP replies include O(10^5) errors, some of which relate to administratively prohibited communication: senders of these ICMP error messages are added to a greylist to avoid probing them again in future censuses (Sec. 3.3). Finally, running the anycast geolocation technique over the O(10^6) targets that generate valid ICMP echo reply messages, we discover roughly O(10^3) IP/24 anycast deployments, corresponding to approximately 0.1‰ of the whole IPv4 address space: the proverbial needle in the IPv4 haystack.

2.2 State of the art

Anycast vs Unicast. In standard IP unicast censuses, the set of targets can be split among VPs for scalability reasons. In contrast, in the anycast case all targets should be probed by all VPs to provide an accurate map of geographical footprints. Given that the number of active VPs in PL is around 300, and that only one IP/32 target for each IP/24 subnet needs to be probed in a given anycast census, it follows that the raw amount of probe traffic is only slightly larger than that of a unicast census.
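The probe-volume claim above can be checked with a back-of-the-envelope computation. The sketch below takes the 300 VPs and the one-target-per-/24 rule from the text, and assumes, for illustration only, that a unicast census sends exactly one probe to each of the 2^32 IPv4 addresses.

```python
ACTIVE_VPS = 300            # approximate number of active PlanetLab VPs (from the text)
SLASH24_COUNT = 2 ** 24     # number of /24 subnets in the IPv4 space
IPV4_ADDRESSES = 2 ** 32    # assumption: a unicast census probes every address once

# Every VP probes one target per /24 in an anycast census.
anycast_probes = ACTIVE_VPS * SLASH24_COUNT
unicast_probes = IPV4_ADDRESSES

# 300 * 2^24 / 2^32 = 300 / 256
ratio = anycast_probes / unicast_probes
print(f"anycast/unicast probe ratio: {ratio:.2f}")
```

Under these assumptions the ratio is 300/256, i.e. the anycast census sends roughly 17% more probes than the unicast one.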

In the unicast case, the relative location of VPs with respect to targets is irrelevant. Therefore, census strategies cover extremes such as a single, centralized, high-end server capable of O(10^6) probes per second on a well-provisioned 1 Gbps Ethernet connection, as in Zmap [22], or a highly decentralized system that exploits O(10^5) low-resource gateways, as in the illegal Carna Botnet census [1]. Our system design sits halfway between these two extremes: in particular, it uses an efficient application-level multi-threaded scanner capable of O(10^4) probes/sec, distributed over O(10^2) PlanetLab nodes.

It is also worth pointing out that an independent analysis [33] of the Carna Botnet dataset found multiple campaigns, covering a cumulative number of probes exceeding a full IPv4 census, over a duration of 8 months. Additionally, [33] suggests that, due to an overlap in the target set, not all hosts were probed, neither during the two fast scan campaigns identified nor during the whole measurement period. As such, our measurement campaign, comprising multiple anycast censuses each lasting only a few hours, constitutes an achievement per se.

Geolocation and infrastructure mapping. Unicast geolocation is a well-investigated research topic. Numerous techniques based on latency measurements [23, 24, 28] and databases [41, 44] have been proposed. Yet database techniques are not only unreliable with unicast [41], but also with anycast, since they advertise a single geolocation per IP. Similarly, latency-based techniques [23, 24, 28] use triangulation and geolocate unicast addresses at the intersection of multiple latency measurements from geographically dispersed vantage points. However, the assumption that such an intersection exists no longer necessarily holds for anycast, as depicted in Fig. 3.

L7-anycast infrastructure mapping studies [15, 45] leverage ECS requests to geolocate servers: (millions of) requests are sent with different client IPs from one VP to unveil (thousands of) unicast IP addresses corresponding to PoPs of major OTTs. However, ECS support, which enhances the user online experience, is becoming widespread but is not yet pervasive. Finally, the technique fails with alternative L7-anycast designs relying on HTTP redirection.

To the best of our knowledge, no IP-anycast geolocation technique exists other than our own: as such, no other study apart from this work deals with anycast infrastructure mapping. With respect to ECS-based techniques for L7 anycast, our technique reduces the cardinality of the problem without sacrificing geolocation accuracy (a qualitative comparison is provided in [17]). Additionally, since BGP provides a unified redirection technique, IP anycast offers an unprecedented opportunity to broadly assess all deployments at once.

Anycast discovery and characterization. Prior work investigates different aspects related to anycast, with a focus on discovery and enumeration [25, 35] or on characterization [9–11, 19, 32, 34, 43], but, to the best of our knowledge, not on anycast census. Closest to ours [17] is the work of [25, 35]. Specifically, [17] is a service-agnostic technique for detection, enumeration and geolocation. Similarly to [17], speed-of-light violation is used in [35] (however, limited to detection and not capable of enumeration/geolocation). Conversely, [25] exploits DNS specificities (i.e., CHAOS requests) to enumerate DNS replicas (but, unlike [17], is neither capable of geolocation nor applicable beyond DNS).

Other studies assess the performance of current IP anycast deployments, with a focus on metrics such as proximity [9, 10, 19, 34, 43], affinity [9–11, 13, 34, 43], availability [10, 32, 43], and load balancing [10, 11]. Yet these studies focus on DNS, which is just a piece of the current anycast puzzle.

Finally, some works study IP-anycast CDNs, such as [16, 27]. However, these focused studies add yet other useful pieces to a puzzle that has so far remained incomplete, lacking the broad perspective given by an Internet-wide coverage over all prefixes and services.

3. SYSTEM DESIGN

Anycast detection relies on measuring round-trip delays between a set of vantage points and a target IP address to uncover geo-inconsistencies. Running an Internet census thus requires measurements towards millions of destinations, ideally in a short timeframe: we now describe and justify system design choices that allow us to perform multiple censuses, which we analyze later in Sec. 4. Items discussed in this section concern the selection of targets (Sec. 3.1), the measurement platform (Sec. 3.2) and software (Sec. 3.3), as well as the network protocol used (Sec. 3.4). Finally, we report considerations about the scalability of our workflow (Sec. 3.5).

3.1 Census targets

Census granularity. Unlike multicast, anycast addresses need no reservation in the IP space, as any IP address can be a candidate: this makes deployment easy, but the detection of anycast addresses hard. Luckily, to avoid a significant increase in the size of routing tables, BGP standard practice [4] is to ignore or block prefixes longer (i.e., more specific) than /24. Thus, /24 is the minimum granularity for anycasted services, which is a good granularity for our census. We validate this assumption with (spot) verifications for all IP addresses on some IP/24 (belonging to EdgeCast), confirming any IP in the /24 to be equivalent for anycast detection purposes. Additionally, a /24 granularity implies that announced BGP prefixes that are shorter than /24 (i.e., that cover more addresses) are tested multiple times, once for each /24 they contain: the mapping between /24 and announced prefixes is still possible a posteriori, as we do in this work. This choice is reinforced by [35], which found 88% of announced prefixes to be /24, stated that "anycast prefixes are dominated by /24", and suggested that larger prefixes may be anycast only in part due to BGP prefix aggregation. We therefore fix the census granularity to a single target IP per /24.
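The a posteriori mapping between /24 targets and announced prefixes mentioned above can be sketched with Python's standard `ipaddress` module. The function name and the example prefixes are hypothetical, and skipping prefixes more specific than /24 is an assumption mirroring the filtering practice described in the text.

```python
import ipaddress


def slash24_targets(announced_prefixes):
    """Map each /24 subnet to the list of announced prefixes that cover it."""
    covering = {}
    for prefix in announced_prefixes:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen > 24:
            # More specific than /24: assumed to be filtered, so not a census unit.
            continue
        # A /24 yields itself; a shorter prefix yields every /24 it contains.
        for sub in net.subnets(new_prefix=24):
            covering.setdefault(sub, []).append(net)
    return covering


if __name__ == "__main__":
    # Hypothetical announcements: a /22, one of its /24s, and a filtered /25.
    table = slash24_targets(["10.0.0.0/22", "10.0.0.0/24", "10.0.4.0/25"])
    for subnet, prefixes in sorted(table.items()):
        print(subnet, "<-", [str(p) for p in prefixes])
```

Each key of the returned dictionary corresponds to one census unit, for which a single target IP would then be chosen.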

Target liveness. As previously argued, any alive IP belonging to a /24 is equivalent in telling whether the whole /24 is anycast (or unicast). To identify a responsive IP address in every /24 prefix, we rely on the hitlist periodically published by [31]. The hitlist generally consists of one representative IP address for each of O(10^7) prefixes, along with a score indicative of the host liveness, computed over several measurement campaigns. When no alive IP has been observed in a /24, the hitlist contains an arbitrary address from that /24 (score ≤ −2). After covering the full hitlist with the first census, we confirm that these hosts are not reachable, and remove them to reduce the target size to 6.6 · 10^6 per VP.

Figure 5: Microsoft deployment as seen from PlanetLab (21 replicas) vs RIPE (54 replicas). Notice that PlanetLab results (white markers) are a subset of RIPE (white and black markers).

Coverage. Given our census aim, we verify how well this hitlist covers all routed /24 prefixes. We therefore obtain from CAIDA a dump of routing tables originating from both RIPE RIS and RouteViews collectors. To compare the hitlist with the advertised prefixes, we split the latter into /24s, obtaining 10,616,435 /24 prefixes, of which 10,615,563 have a representative in the hitlist (over 99.99% coverage). We additionally cross-check our observed target responsiveness with the expected recall: specifically, recent ICMP scans [48] observe 4.9 · 10^6 used /24 subnets, and our campaigns similarly capture 4.4 · 10^6 responsive subnets (90% coverage with respect to [48]).

3.2 Measurement dataset vs platform

Dataset. One option to avoid running a large-scale measurement campaign is to exploit readily available datasets from public measurement infrastructures, yet we could not find any fitting our purpose. For instance, despite probing all /24s every 2-3 days, Archipelago [6] clusters its vantage points into three independent groups, each using random IPs selected in each /24 prefix: it follows that at most 3 monitors target each /24, with generally different IP addresses and a hit rate of about 6%. Given the low hit rate and low parallelism, such a dataset is not appropriate for our purpose, as it would lead neither to a complete census nor to an accurate geolocation footprint, even in case of hits.

Platforms. There are a number of available measurement platforms in the community, each with its own advantages and limitations. Except for illustration purposes, in this paper we rely on PlanetLab (PL). While RIPE Atlas (RIPE for short) is more interesting for geographical diversity due to its scale, it offers limited control on the rate and type (cf. Sec. 3.4) of measurements, as well as on their instantiation for such a large-scale campaign (i.e., upload of the hitlist, probing budget). Additionally, the larger number of vantage points would mechanically increase about 20-fold the amount of probes per census with respect to PL (in case all VPs are used). Conversely, measurements in PL are limited by node availability (generally around 300 vantage points), but offer full flexibility for deploying custom software and running it at high speed (cf. Sec. 3.3).

While in this work we only use PL, we depict for illustration purposes an application of our technique to measurements collected from PlanetLab vs RIPE in Fig. 5: as PL results are a subset of RIPE results, white markers indicate replicas found from both platforms, while black markers pinpoint replicas that are only found with RIPE measurements. While this example has anecdotal relevance, it suggests that an intriguing direction is to combine both platforms, e.g., by refining via RIPE the geolocation of anycast /24s detected via PL.

3.3 Measurement software

Fastping. An efficient measurement tool is needed to maximize the probing capacity of our VPs. While at first sight Zmap [22] could seem the perfect tool for such a large-scale campaign, it exhibits a major blocking point in our setup: namely, Zmap generates raw Ethernet frames, which are very efficient in a local setup but are not supported by the PlanetLab virtualization layer. We therefore resort to Fastping [26], a tool specialized, as the name implies, in ICMP scanning, which is deployed on each PL node. Fastping is able to send about O(10^4) probes per second: about two orders of magnitude slower than Zmap, but faster than the fastest nmap scripting engine scanner. As we will point out later concerning scalability (Sec. 3.5), in order to gather complete censuses in a few hours, we had to undergo several rounds of re-engineering, including purposely slowing down the Fastping sending rate.

Figure 6: Response rates seen by heterogeneous protocols across different targets. [x-axis: ICMP (L3), TCP-53 and TCP-80 (L4), DNS/UDP and DNS/TCP (L7); y-axis: response ratio [%]; series: OpenDNS, EdgeCast, CloudFlare, Microsoft]

Figure 7: Validation with CloudFlare and EdgeCast ASes. Bars represent standard deviation among IP/24 of the AS. [x-axis: GT/PAI, TPR, Error [Km]; left y-axis: GT/PAI and TPR; right y-axis: median error [Km]; series: CloudFlare, EdgeCast]

Greylist. Additionally, Fastping adopts the usual techniques to be a good Internet measurement citizen, i.e., a signature in the payload points to its homepage. Fastping probes the target list in a randomized order to reduce intrusiveness, and implements a greylist mechanism to honor requests to stop probing administratively prohibited hosts/networks, inferred from ICMP return codes. Before running a census from O(10^2) VPs, we initially run a census from a single VP in order to build an initial blacklist. During any census, we then collect addresses generating ICMP return codes (other than echo reply) in a temporary greylist, which we later incrementally merge with the blacklist. This list has approximately O(10^5) hosts, with 98.5% added due to administrative filtering [8] (type 3, code 13), and the remainder because of communications administratively prohibited at host or network level (respectively 1.3%, code 10 [14], and 0.2%, code 9 [14]).
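The greylist rule described above can be sketched as follows. This is not Fastping's code: the function names and the `(source, type, code)` reply tuples are hypothetical, and restricting the greylist to ICMP type 3 with codes 9, 10 and 13 is an interpretation of the breakdown reported in the text.

```python
ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACHABLE = 3

# ICMP destination-unreachable codes signalling an administrative prohibition.
PROHIBITED_CODES = {
    9: "network administratively prohibited",
    10: "host administratively prohibited",
    13: "communication administratively prohibited (filtering)",
}


def classify_reply(icmp_type, icmp_code):
    """Return 'valid', 'greylist' or 'error' for one ICMP reply."""
    if icmp_type == ICMP_ECHO_REPLY:
        return "valid"
    if icmp_type == ICMP_DEST_UNREACHABLE and icmp_code in PROHIBITED_CODES:
        return "greylist"
    return "error"


def update_blacklist(blacklist, replies):
    """Merge the senders of prohibition messages seen in one census into the blacklist.

    `replies` is an iterable of (source_address, icmp_type, icmp_code) tuples.
    """
    greylist = {src for src, t, c in replies if classify_reply(t, c) == "greylist"}
    return blacklist | greylist
```

Targets in the returned set would simply be skipped in subsequent censuses.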

3.4 Network protocol

Recall. ICMP has often been used (and misused) in measurement studies: especially given recent work showing that ICMP latency measurements are often not reliable [40], we need to confirm the validity of our protocol selection. A major motivation for ICMP measurement is the high recall it offers [48]. Consider indeed that TCP and UDP measurements would need a priori knowledge (or a guess) of the services running on the target under test. We therefore perform a test on a reduced set of targets, performing 100 measurements with different protocols: specifically, we consider network L3 (ICMP) and transport L4 (TCP SYN/SYN-ACK pair in the three-way handshake to port 53 or 80) measurements, as well as L7 (DNS/UDP vs DNS/TCP using dig) measurements. Fig. 6 shows that protocols other than ICMP have a binary recall: in other words, they work well only if the service is known a priori. Conversely, ICMP is the only reliable alternative, yielding high recall across all deployments, and is thus well suited for censuses.

Accuracy. While our technique relies on latency measurements, it leverages the discriminative power of side-channel information (i.e., city population within disks) to cope with latency measurement noise. While we validate the accuracy of the methodology for DNS in [17], a validation for stateful TCP connections is still missing. To fill this gap, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL: note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY (standard Server) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a superset of locations with respect to those measured from PL. We contrast the true positive (TPR) classification of our census vs the HTTP GT in Fig. 7: in 77% of the IP/24 for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 km (287 km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare, but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest of alternative platforms such as RIPE.

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case, we rely on public information that maps /24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct, except Philadelphia (while the server is located in Ashburn, 260 km or 2.6 ms worth of propagation delay away): this misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn) but, as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.

3.5 Scalability

Table 1: Textual (0) vs binary (1-4) censuses.

Census ID   Format   Size (host/total)   Analysis
0           csv      (270M, 79G)         > 3 days
1-4         binary   (21M, 6G)           3 hr

Figure 8: CDF of per-vantage point completion time over all censuses. [x-axis: completion time per VP [Hr], ticks at 1, 2, 4, 8, 16; y-axis: CDF]

Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that the nodes must be desynchronized to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so by a randomized permutation of the target nodes, achieved via a Linear Feedback Shift Register (LFSR) with Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies do aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude¹, to a rate that we verified empirically not to trigger the above problems. Consequently, probing 6.6 · 10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (longer durations are likely due to load on the PL host).
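The randomized probing order obtained from a Galois LFSR, as described above, can be sketched as follows. This is an illustration rather than Fastping's implementation: it uses a 16-bit register with the well-known maximal-length tap mask 0xB400, so it only handles up to 65,535 targets, whereas a real census of millions of targets would need a wider register. Using a different seed per VP to desynchronize nodes is also an assumption.

```python
def galois_lfsr16(seed):
    """Yield all 65,535 non-zero 16-bit states of a maximal-length Galois LFSR once."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400  # taps for x^16 + x^14 + x^13 + x^11 + 1
        yield state
        if state == seed:
            return  # full cycle completed


def probing_order(targets, seed):
    """Yield every target exactly once, in a pseudo-random order set by the seed."""
    n = len(targets)
    if n >= (1 << 16):
        raise ValueError("this 16-bit sketch supports at most 65535 targets")
    for state in galois_lfsr16(seed):
        if state <= n:  # states above n are simply skipped
            yield targets[state - 1]


if __name__ == "__main__":
    hitlist = [f"target-{i}" for i in range(10)]  # hypothetical targets
    print(list(probing_order(hitlist, seed=1)))
    print(list(probing_order(hitlist, seed=7)))
```

Because the register cycles through every non-zero state exactly once, each target is probed once per census without storing a shuffled copy of the hitlist. At the reduced rate of 10^3 targets per second quoted in the text, 6.6 · 10^6 targets take 6,600 seconds, i.e. about 1.8 hours.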

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged, in textual format, a wealth of information amounting to 270 MB per node and 80 GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay and ICMP flag (encoding greylist return codes 9, 10 or 13 as a negative sign), for a total of about 20 MB per node and 6 GB overall per census.
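The idea of folding the ICMP flag into the sign of the delay field can be illustrated with Python's `struct` module. The record layout below (a 32-bit timestamp followed by a 32-bit float) is purely hypothetical: the text does not specify field widths, and an 8-byte record is larger than what the reported 20 MB per node implies, so the actual format must be more compact.

```python
import struct

# Hypothetical layout: little-endian uint32 timestamp (s) + float32 delay (ms).
RECORD = struct.Struct("<If")
GREYLIST_CODES = {9, 10, 13}


def encode(timestamp, rtt_ms=None, icmp_code=None):
    """Pack one measurement; a greylist ICMP code is stored as a negative delay."""
    if icmp_code is not None:
        if icmp_code not in GREYLIST_CODES:
            raise ValueError("only greylist codes 9, 10 and 13 are encoded")
        value = -float(icmp_code)
    else:
        value = float(rtt_ms)
    return RECORD.pack(timestamp, value)


def decode(blob):
    """Unpack one record into (timestamp, rtt_ms or None, icmp_code or None)."""
    timestamp, value = RECORD.unpack(blob)
    if value < 0:
        return timestamp, None, int(-value)
    return timestamp, value, None
```

Since a round-trip delay is never negative, the sign bit is free to carry the error flag without a separate field.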

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute-force optimal solution; at the same time, processing a census would still take days (we indeed stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs is not the same in all files, meaning that an on-the-fly sorting of about 300 lists (one per VP) containing millions of targets is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., on about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use case, it may become relevant for other applications of this technique (e.g., BGP hijacking inference, mentioned in Sec. 5).

¹ While it is possible to tune the probing rate per VP more finely, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area that is not well covered by PL.

4. ANYCAST /0 CENSUSES

This section presents results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1), and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance

Details about our censuses are reported in Fig. 10. Overall, 1696 IP/24 belonging to 346 ASes appear to have more than one anycast replica, while we were able to find only 897 IP/24 belonging to 100 ASes having at least 5 replicas with our technique. The plot also shows a geographical density map of anycast replicas: results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that the results reported in this paper correspond to censuses performed during March 2015: with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at the time of writing) the results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with a low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP anycast figure in the top-100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted² by ASes using IP anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

² For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP, and disregard content that is referenced in the frontpage.

                IP/24   ASes   Cities   CC   Replicas
All              1696    346       77   38      13802
≥ 5 Replicas      897    100       71   36      11598
∩ CAIDA-100        19      8       30   18        138
∩ Alexa-100k      242     15       45   29       4038

Figure 10: Anycast censuses results at a glance.

4.2 Top-100 Anycast ASes

Albeit the amount of anycast IP/24 may seem disappointing at first because of its small footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS, but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar plot). Service footprint is related to the open TCP ports in the AS (middle scatter plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter plots). Finally, a label reported on the top x-axis categorizes the main activity of the ASes from a business perspective (the categorization is informal and, in case of ASes with multiple services, only the most prominent is selected).

Figure 9: Bird's eye view of Top-100 anycast ASes (ranked according to geographical footprint). [Panels, bottom to top: Replicas; IPs/24; Open ports (ticks at 53, 80, 443, 8080); Alexa rank; CAIDA rank. Bottom x-axis: AS WHOIS name, ranked 1 to 100; top x-axis: AS category]

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet, Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric) but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn) and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services) and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers, such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS) and public DNS resolvers (e.g., Google DNS, OpenDNS), also emerge in the census.

Diversity. We report a breakdown of AS classes in Fig. 11, crisply showing that DNS now represents about one third of IP anycast activities. Plots in Fig. 9 clearly illustrate the large diversity of anycast usage under all metrics. Indeed, no strong correlation appears between any two metrics, illustrating the degrees of freedom in anycast deployments: for instance, the geographical and IP/24 footprints are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments having heterogeneous business models (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH) but also between deployments of the same kind (e.g., CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443 over the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller than those of L7-anycast deployments, which can exceed O(10^3) for the large providers; this is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

Figure 11: Breakdown of AS category (only first category is considered). [x-axis: DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other; y-axis: AS breakdown [%]]

Fig. 12 further reports the cumulative distribution of the number of replicas per IP/24, depicting both results coming from the combination of censuses and individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius size: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
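The combination step described above can be sketched as follows. This is only an illustration under stated assumptions: each census is represented as a dictionary keyed by a (vantage point, target) pair, and both the data layout and the function name are hypothetical rather than the authors' actual format.

```python
def combine_censuses(censuses):
    """Keep, for every (vp, target) pair, the minimum RTT seen in any census.

    `censuses` is an iterable of dicts mapping (vp, target) -> RTT in ms.
    Pairs that appear in only some of the censuses are kept as well.
    """
    combined = {}
    for census in censuses:
        for pair, rtt_ms in census.items():
            # The minimum RTT is the sample closest to the propagation delay.
            combined[pair] = min(rtt_ms, combined.get(pair, rtt_ms))
    return combined


if __name__ == "__main__":
    # Hypothetical measurements from two censuses.
    census_1 = {("vp-paris", "192.0.2.0/24"): 12.4, ("vp-tokyo", "192.0.2.0/24"): 9.8}
    census_2 = {("vp-paris", "192.0.2.0/24"): 11.1, ("vp-sydney", "192.0.2.0/24"): 30.2}
    print(combine_censuses([census_1, census_2]))
```

Smaller RTTs translate into smaller disks around each vantage point, hence fewer overlaps and more replicas that can be told apart.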

In this paper, we only consider results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VPs, raising false positive replicas. While we have anecdotal evidence of some of these exiguous deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall). [Line plot; x-axis: number of replicas (2 to 25); y-axis: CDF of number of IPs/24; curves: combination of censuses (308 VPs), Census 4 (13/03/15, 240 VPs), Census 3 (04/03/15, 269 VPs), Census 2 (02/03/15, 255 VPs), Census 1 (27/02/15, 261 VPs).]

Figure 13: Number of IPs/24 per AS. [Plot; x-axis: number of IPs/24 (log scale, 1 to 1000); y-axis: CDF of number of ASes; annotated ASes: CloudFlare (328), Google (102), EdgeCast (37), Prolexic (21), Apple (6), Twitter (3), Level 3 (2), LinkedIn (1); legend: >= 5 replicas.]

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS), and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet, about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse (e.g., Google 8.8.8.8 is the only address alive in the 8.8.8.0/24) and very dense deployments (e.g., well over 99% of IPs are alive in most CloudFlare subnets).

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, as well as CDNs serving content in the top-100k Alexa list, are good indicators that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Because of the historical use of anycast for DNS services, we believe it is important to provide an up-to-date longitudinal view across services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing in the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan at low rates all 2^16 TCP ports. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an under-estimation of the number of open TCP ports can also be the result of probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816, belonging to 81 ASes, have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both top and bottom plots: notably, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.
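The class-imbalance effect can be illustrated with a small sketch that counts open ports both per /24 and per AS. The record layout (AS name, subnet, open ports) and the sample records are hypothetical and only serve to show how one AS owning many /24 can dominate a per-subnet ranking.

```python
from collections import Counter


def port_frequencies(scan_records):
    """Return (per_as, per_subnet) counters of open TCP ports.

    `scan_records` is an iterable of (as_name, subnet, open_ports) tuples,
    with one record per scanned /24.
    """
    per_subnet = Counter()
    ases_by_port = {}
    for as_name, _subnet, open_ports in scan_records:
        for port in set(open_ports):
            per_subnet[port] += 1  # every /24 counts once
            ases_by_port.setdefault(port, set()).add(as_name)
    # Every AS counts once per port, however many /24 it owns.
    per_as = Counter({port: len(names) for port, names in ases_by_port.items()})
    return per_as, per_subnet


if __name__ == "__main__":
    records = [
        ("AS-A", "192.0.2.0/24", [80, 443, 2052]),
        ("AS-A", "198.51.100.0/24", [80, 443, 2052]),
        ("AS-A", "203.0.113.0/24", [80, 443, 2052]),
        ("AS-B", "192.0.3.0/24", [53]),
        ("AS-C", "192.0.4.0/24", [53, 80]),
    ]
    per_as, per_subnet = port_frequencies(records)
    print(per_as.most_common(), per_subnet.most_common())
```

In this toy input, port 2052 ranks above port 53 when counting /24 but below it when counting ASes, which mirrors the bias discussed above.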

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS.

nmap portscan statistics

IPs/32   ASes   Ports (SSL)    Well-known   Software
812      81     10499 (185)    457          30

Figure 14: Overall nmap portscan statistics and top-10 open TCP ports (per AS and per /24). [Bar charts. IP/24 frequency (0 to 400), top-10 ports: 80 (http), 443 (http ssl), 8080 (http), 53 (domain), 2052 (http), 2053 (http), 2082 (http), 2083 (http), 8443 (http), 2087 (http). AS frequency (0 to 60), top-10 ports: 53 (domain), 80 (http), 443 (http-ssl), 179 (tcpwrapped), 22 (ssh), 8080 (http), 8083 (tcpwrapped), 3306 (tcpwrapped), 1935 (sip), 5252 (movaz-ssc).]

We make the following observations: (i) roughly half of the ASes have at least one open TCP port, (ii) about 10% of the ASes have at least 5 open TCP ports, and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect that the large number of ports is due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20% of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs, including a tier-1 ISP (Tinet SpA), and Google, with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software such as popular web and DNS daemons (e.g., nginx, ISC BIND) and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

Figure 15: Complementary CDF of the number of open TCP ports per AS. [Log-log plot; x-axis: number of open TCP ports (1 to 100000); y-axis: CCDF of ASes (0.01 to 1); annotated ASes: OVH (10148), Incapsula (330), CloudFlare (20), Google (9), EdgeCast (5).]

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, the difference may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted; yet, among the anycasters, we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, and especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from the software ranking in the unicast IP world.

Yet, this work only scratches the surface and opens more questions than it is able to answer, for instance:

IP-level CDN. Refining the active methodology by mapping content objects (and not only the front page) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming/discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.

Figure 16: Breakdown of software running on anycast replicas. [Bar chart; x-axis: software implementations, grouped as DNS, Web, Mail, Other; y-axis: AS frequency (0 to 20).]

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES

[1] http://internetcensus2012.bitbucket.org/paper.html

[2] http://w3techs.com/technologies/overview/web_server/all

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. YouTube traffic dynamics and its interplay with a tier-1 ISP: An ISP perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark

[7] http://www.root-servers.org

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. Zmap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. Fastroute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829, rev 4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.


estimate the number of IP-anycast replicas, as we use a mere O(10^2) vantage points.

Still, our results provide a broad, conservative, yet accurate picture of Internet anycast usage: for targets for which we have the ground truth, our city-level geolocation is accurate in about 75% of the cases, with a median error of 350 km otherwise (Sec. 3.4).

Typical census. In this work, we perform four IPv4 censuses and analyse the results obtained from their combination. For each of the O(10^2) VPs, the magnitude of a typical census is illustrated in Fig. 4: starting from a hitlist of O(10^7) targets, less than half send a reply (Sec. 3.1). ICMP replies include O(10^5) errors, some of which relate to administratively prohibited communication: senders of these ICMP error messages are added to a greylist to avoid probing them again in future censuses (Sec. 3.3). Finally, running the anycast geolocation technique over the O(10^6) targets that generate valid ICMP echo reply messages, we discover roughly O(10^3) IP/24 anycast deployments, corresponding to approximately 0.1‰ of the whole IPv4 address space: the proverbial needle in the IPv4 haystack.

2.2 State of the art

Anycast vs. Unicast. In standard IP unicast censuses, the set of targets can be split among VPs for scalability reasons. In contrast, in the anycast case, all targets should be probed by all VPs to provide an accurate map of geographical footprints. Given that the number of active VPs in PL is around 300, and that only one IP/32 target for each IP/24 subnet needs to be probed in a given anycast census, it follows that the raw amount of probe traffic is only slightly larger than that of a unicast census.
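The "only slightly larger" claim can be checked with a back-of-the-envelope computation. The sketch below assumes an upper bound where every /24 of the 32-bit address space is probed once by each of 300 VPs, and compares it with a unicast census that probes every IPv4 address once; reserved or unrouted space is deliberately ignored, and the function names are hypothetical.

```python
def anycast_census_probes(num_vps, num_slash24):
    """Every vantage point probes one representative address per /24."""
    return num_vps * num_slash24


def unicast_census_probes(address_bits=32):
    """Every address is probed once, from a single vantage point."""
    return 2 ** address_bits


if __name__ == "__main__":
    vps = 300                # about the number of active PlanetLab VPs
    all_slash24 = 2 ** 24    # upper bound: every /24 in the IPv4 space
    ratio = anycast_census_probes(vps, all_slash24) / unicast_census_probes()
    print(ratio)
```

Under these assumptions the ratio is 300/256, i.e., an anycast census sends roughly 17% more probes than a full unicast census.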

In the unicast case, the relative location of VPs with respect to targets is irrelevant. Therefore, census strategies cover extremes such as a single centralized high-end server capable of O(10^6) probes per second on a well provisioned 1 Gbps Ethernet connection, as in Zmap [22], or a highly decentralized system that exploits O(10^5) low-resource gateways, as in the illegal Carna Botnet census [1]. Our system design sits halfway between these two extremes: in particular, it uses an efficient application-level multi-threaded scanner, capable of O(10^4) probes per second, distributed over O(10^2) PlanetLab nodes.

It is also worth pointing out that an independent analysis [33] of the Carna Botnet dataset found multiple campaigns, covering a cumulative number of probes exceeding a full IPv4 census, over a duration of 8 months. Additionally, [33] suggests that, due to an overlap in the target set, not all hosts were probed, neither during the two fast scan campaigns identified, nor during the whole measurement period. As such, our measurement campaign, comprising multiple anycast censuses each lasting only a few hours, constitutes an achievement per se.

Geolocation and infrastructure mapping. Unicast geolocation is a well investigated research topic. Numerous techniques based on latency measurements [23, 24, 28] and databases [41, 44] have been proposed. Yet, database techniques are not only unreliable with unicast [41], but also with anycast, since they advertise a single geolocation per IP. Similarly, latency-based techniques [23, 24, 28] use triangulation and geolocate unicast addresses at the intersection of multiple latency measurements from geographically dispersed vantage points. However, the assumption that such an intersection exists no longer necessarily holds for anycast, as depicted in Fig. 3.
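The geo-inconsistency mentioned above can be illustrated with a minimal sketch: each latency measurement bounds the target within a disk centered on the vantage point, and two disjoint disks cannot both contain a single unicast host. The propagation speed (about two thirds of the speed of light, i.e., 200 km/ms) and all names are assumptions made for illustration; the technique of [17] is not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0
PROPAGATION_KM_PER_MS = 200.0  # assumption: roughly 2/3 of the speed of light


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def disk_radius_km(rtt_ms):
    """Largest distance a reply can come from, given a round-trip time."""
    return (rtt_ms / 2) * PROPAGATION_KM_PER_MS


def is_geo_inconsistent(measurements):
    """True if at least two latency disks do not intersect.

    `measurements` is a list of ((lat, lon), rtt_ms), one per vantage point.
    """
    for i in range(len(measurements)):
        for j in range(i + 1, len(measurements)):
            (pos_i, rtt_i), (pos_j, rtt_j) = measurements[i], measurements[j]
            gap = haversine_km(pos_i, pos_j)
            if gap > disk_radius_km(rtt_i) + disk_radius_km(rtt_j):
                return True  # no single location satisfies both constraints
    return False


if __name__ == "__main__":
    paris, tokyo = (48.8566, 2.3522), (35.6762, 139.6503)
    print(is_geo_inconsistent([(paris, 10.0), (tokyo, 10.0)]))
```

With a 10 ms RTT from both Paris and Tokyo, each disk has a radius of 1000 km while the two cities are several thousand kilometers apart, so the target must be served from at least two locations.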

L7-anycast infrastructure mapping studies [15, 45] leverage ECS requests to geolocate servers: (millions of) requests are sent with different client IPs from one VP to unveil (thousands of) unicast IP addresses corresponding to PoPs of major OTTs. However, ECS support, while becoming widespread to enhance the user online experience, is not yet pervasive. Finally, the technique fails with alternative L7-anycast designs relying on HTTP redirection.

To the best of our knowledge, no IP-anycast geolocation technique exists other than our own: as such, no other study apart from this work deals with anycast infrastructure mapping. With respect to ECS-based techniques for L7 anycast, our technique allows reducing the cardinality of the problem without sacrificing geolocation accuracy (a qualitative comparison is provided in [17]). Additionally, since BGP provides a unified redirection technique, IP anycast offers an unprecedented opportunity to broadly assess all deployments at once.

Anycast discovery and characterization. Prior work investigates different aspects related to anycast, with a focus on discovery and enumeration [25, 35] or on characterization [9–11, 19, 32, 34, 43] but, to the best of our knowledge, not on anycast census. Closest to ours [17] is the work of [25, 35]. Specifically, [17] is a service-agnostic technique for detection, enumeration and geolocation. Similarly to [17], speed-of-light violation is used in [35] (however limited to detection, and not capable of enumeration/geolocation). Conversely, [25] exploits DNS specificities (i.e., CHAOS requests) to enumerate DNS replicas (but, unlike [17], is neither capable of geolocation nor applicable beyond DNS).

Other studies assess the performance of current IP anycast deployments, with a focus on metrics such as proximity [9, 10, 19, 34, 43], affinity [9–11, 13, 34, 43], availability [10, 32, 43] and load-balancing [10, 11]. Yet, these studies focus on DNS, which is just a piece of the current anycast puzzle.

Finally, some works study IP-anycast CDNs, such as [16, 27]. However, these focused studies add yet other useful pieces to a puzzle that remained so far incomplete, lacking the broad perspective given by an Internet-wide coverage over all prefixes and services.

3. SYSTEM DESIGN

Anycast detection relies on measuring round trip delays between a set of vantage points and a target IP address to uncover geo-inconsistencies. Running an Internet census thus requires measurements towards millions of destinations, ideally in a short timeframe: we now describe and justify the system design choices that allow us to perform multiple censuses, which we analyze later in Sec. 4. Items discussed in this section concern the selection of targets (Sec. 3.1), the measurement platform (Sec. 3.2) and software (Sec. 3.3), as well as the network protocol used (Sec. 3.4). Finally, we report considerations about the scalability of our workflow (Sec. 3.5).

3.1 Census targets

Census granularity. Unlike multicast, anycast addresses need no reservation in the IP space: as any IP address can be a candidate, this makes deployment easy, but the detection of anycast addresses hard. Luckily, to avoid a significant increase in the size of routing tables, BGP standard practice [4] is to ignore or block prefixes longer than /24 (i.e., covering fewer than 256 addresses). Thus, /24 is the minimum granularity for anycasted services, which is a good granularity for our census. We validate this assumption with (spot) verifications for all IP addresses on some IP/24 (belonging to EdgeCast), confirming any IP in the /24 to be equivalent for anycast detection purposes. Additionally, a /24 granularity implies that announced BGP prefixes shorter than /24 (i.e., covering larger address blocks) are tested multiple times, once per each /24 they contain: the mapping between /24 and announced prefixes is still possible a posteriori, as we do in this work. This choice is reinforced by [35], which found 88% of announced prefixes to be /24, stated that "anycast prefixes are dominated by /24" and suggested that larger prefixes may be anycast only in part, due to BGP prefix aggregation. We therefore fix the census granularity to a single target IP per /24.
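Splitting announced prefixes into /24 targets can be sketched with the Python standard library `ipaddress` module. The helper name and the sample prefixes (taken from documentation address ranges) are hypothetical.

```python
import ipaddress


def slash24_blocks(announced_prefix):
    """Split an announced IPv4 prefix into the /24 blocks it contains.

    Prefixes longer than /24 return an empty list: by common BGP practice
    they are not propagated, so they yield no census target of their own.
    """
    network = ipaddress.ip_network(announced_prefix, strict=True)
    if network.version != 4:
        raise ValueError("only IPv4 prefixes are supported")
    if network.prefixlen > 24:
        return []
    return list(network.subnets(new_prefix=24))


if __name__ == "__main__":
    for block in slash24_blocks("198.51.100.0/22"):
        print(block)
```

Keeping the originating prefix alongside each /24 (for instance in a dictionary) makes the a posteriori mapping between /24 and announced prefixes straightforward.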

Target liveness. As previously argued, any alive IP belonging to a /24 is equivalent in telling whether the whole /24 is anycast (or unicast). To identify a responsive IP address in every /24 prefix, we rely on the hitlist periodically published by [31]. The hitlist generally consists of one representative IP address for O(10^7) prefixes, along with a score indicative of the host liveliness, computed over several measurement campaigns. When no alive IP has been observed in a /24, the hitlist contains an arbitrary address from that /24 (score ≤ −2). After covering the full hitlist with the first census, we confirm that these hosts are not reachable and remove them, to reduce the target size to 6.6 · 10^6 per VP.

Figure 5: Microsoft deployment as seen from PlanetLab (21 replicas) vs. RIPE (54 replicas). Notice that PlanetLab results (white markers) are a subset of RIPE (white and black markers).

Coverage. Given our census aim, we verify how well this hitlist covers all routed /24 prefixes. We therefore obtain from CAIDA a dump of routing tables originating from both RIPE RIS and RouteViews collectors. To compare the hitlist vs. the advertised prefixes, we split the latter in /24, obtaining 10,616,435 /24 prefixes, of which 10,615,563 have a representative in the hitlist (over 99.99% coverage). We additionally cross-check our observed target responsiveness with the expected recall: specifically, recent ICMP scans [48] observe 4.9 · 10^6 used /24 subnets, and our campaigns similarly capture 4.4 · 10^6 responsive subnets (90% coverage with respect to [48]).

3.2 Measurement dataset vs. platform

Dataset. One option to avoid running a large scale measurement campaign is to exploit readily available datasets from public measurement infrastructures; yet, we could not find any fitting our purpose. For instance, despite probing all /24 every 2-3 days, Archipelago [6] clusters its vantage points into three independent groups, each using random IPs selected in each /24 prefix: it follows that at most 3 monitors target each /24, with generally different IP addresses, and a hit rate of about 6%. Given the low hit rate and low parallelism, such a dataset is not appropriate for our purpose, as it would not lead to a complete census, nor to an accurate geolocation footprint even in case of hits.

Platforms. There are a number of available measurement platforms in the community, each with its own advantages and limitations. Except for illustration purposes, in this paper we rely on PlanetLab (PL). While RIPE Atlas (RIPE for short) is more interesting for geographical diversity due to its scale, it offers limited control on the rate and type (cf. Sec. 3.4) of measurements, as well as on their instantiation for such a large scale campaign (i.e., upload of the hitlist, probing budget). Additionally, the larger number of vantage points would mechanically increase about 20-fold the amount of probes per census with respect to PL (in case all VPs are used). Conversely, measurements in PL are limited by node availability (generally around 300 vantage points), but offer full flexibility for deploying custom software and running it at high speed (cf. Sec. 3.3).

While in this work we only use PL, we depict for illustration purposes an application of our technique to measurements collected from PlanetLab vs. RIPE in Fig. 5: as PL results are a subset of RIPE results, white markers indicate replicas found from both platforms, while black markers pinpoint replicas that are only found with RIPE measurements. While this example has anecdotal relevance, it suggests that an intriguing direction is to combine both platforms, e.g., by refining via RIPE the geolocation of anycast /24 detected via PL.

3.3 Measurement software

Fastping. An efficient measurement tool is needed to maximize the probing capacity of our VPs. While at first sight Zmap [22] could seem the perfect tool for such a large-scale campaign, it exhibits a major blocking point in our setup: namely, Zmap generates raw Ethernet frames, which are very efficient in a local setup but are not supported by the PlanetLab virtualization layer. We therefore resort to Fastping [26], a tool specialized, as the name implies, in ICMP scanning, which is deployed on each PL node. Fastping is able to send about O(10^4) probes per second: about two orders of magnitude slower than Zmap, but faster than the fastest nmap scripting engine scanner. As we will point out later concerning scalability (Sec. 3.5), in order to gather complete censuses in a few hours, we had to undergo several rounds of re-engineering, including purposely slowing down the Fastping sending rate.

Figure 6: Response rates seen by heterogeneous protocols across different targets. [Bar chart; x-axis: ICMP (L3), TCP-53 and TCP-80 (L4), DNS/UDP and DNS/TCP (L7); y-axis: response ratio [%]; targets: OpenDNS, EdgeCast, CloudFlare, Microsoft.]

Figure 7: Validation with CloudFlare and EdgeCast ASes. Bars represent standard deviation among IP/24 of the AS. [Bar chart; x-axis: GT/PAI, TPR, Error [km]; left y-axis: GT/PAI and TPR (0 to 1); right y-axis: median error [km] (0 to 1000); series: CloudFlare, EdgeCast.]

Greylist. Additionally, Fastping adopts the usual techniques to be a good Internet measurement citizen, i.e., a signature in the payload points to its homepage. Fastping probes the target list in a randomized order to reduce intrusiveness, and implements a greylist mechanism to honor requests to stop probing administratively prohibited hosts/networks, inferred from ICMP return codes. Before running a census from O(10^2) VPs, we initially run a census from a single VP in order to build an initial blacklist. During any census, we then collect addresses generating ICMP return codes (other than echo reply) in a temporary greylist, which we later incrementally merge with the blacklist. This list has approximately O(10^5) hosts, with 98.5% added due to administrative filtering [8] (type 3, code 13) and the remaining ones due to communications administratively prohibited at host or network level (respectively 1.3%, code 10 [14], and 0.2%, code 9 [14]).
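The greylist bookkeeping can be sketched as follows. This is not Fastping code: the reply representation as (source, ICMP type, ICMP code) tuples and the function names are assumptions for illustration, while the ICMP type and code values follow RFC 1122 and RFC 1812.

```python
from collections import Counter

ECHO_REPLY = 0          # ICMP type 0
DEST_UNREACHABLE = 3    # ICMP type 3
ADMIN_PROHIBITED = {
    9: "network administratively prohibited",         # RFC 1122
    10: "host administratively prohibited",           # RFC 1122
    13: "communication administratively prohibited",  # RFC 1812
}


def build_greylist(replies):
    """Collect senders of any ICMP message other than an echo reply.

    `replies` is an iterable of (source_ip, icmp_type, icmp_code) tuples.
    Returns the greylist (a set of sources) and a per-reason counter.
    """
    greylist = set()
    reasons = Counter()
    for source, icmp_type, icmp_code in replies:
        if icmp_type == ECHO_REPLY:
            continue  # valid answer, the target stays in the census
        greylist.add(source)
        if icmp_type == DEST_UNREACHABLE and icmp_code in ADMIN_PROHIBITED:
            reasons[ADMIN_PROHIBITED[icmp_code]] += 1
        else:
            reasons["other"] += 1
    return greylist, reasons


def merge_into_blacklist(blacklist, greylist):
    """Return the blacklist to use in the next census."""
    return set(blacklist) | set(greylist)
```

A census would call `build_greylist` on the replies gathered by each VP and then fold the result into the shared blacklist with `merge_into_blacklist` before the next run.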

3.4 Network protocol

Recall. ICMP has often been used (and misused) in measurement studies: especially given recent work showing that ICMP latency measurements are often not reliable [40], we need to confirm the validity of our protocol selection. A major motivation for ICMP measurement is the high recall it offers [48]. Consider indeed that TCP and UDP measurements would need an a priori knowledge (or guess) of the services running on the target under test. We therefore perform a test on a reduced set of targets, performing 100 measurements with different protocols: specifically, we consider network L3 (ICMP) and transport L4 (TCP SYN/SYN-ACK pair in the three-way handshake to port 53 or 80) measurements, as well as L7 (DNS/UDP vs. DNS/TCP using dig) measurements. Fig. 6 shows that protocols other than ICMP have a binary recall: in other words, they work well only if the service is known a priori. Conversely, ICMP is the only reliable alternative yielding high recall across all deployments, and is thus well suited for censuses.

Accuracy. While our technique relies on latency measurements, it leverages the discriminative power of side-channel information (i.e., city population within disks) to cope with latency measurement noise. While we validated the accuracy of the methodology for DNS in [17], a validation for stateful TCP connections is still missing. To fill this gap, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL. Note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY (standard Server) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a superset of locations with respect to those measured from PL. We contrast the true positive rate (TPR) of our census classification against the HTTP GT in Fig. 7: in 77% of the IP/24 for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 km (287 km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest for alternative platforms such as RIPE.
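As an illustration of how a ground-truth location might be read from the CF-RAY header mentioned above, the sketch below extracts the trailing token of the header value. It assumes the value has the form `<id>-<code>`, where the last token is a three-letter location code; this layout, the function name `replica_code_from_cf_ray`, and the dictionary input are assumptions made for the example, not details given in the paper.

```python
def replica_code_from_cf_ray(headers):
    """Return the trailing location token of a CF-RAY header, or None.

    headers: dict of HTTP response headers (e.g. parsed from `curl -I` output).
    Assumes a value such as '<hex id>-CDG'; this layout is an assumption.
    """
    for name, value in headers.items():
        if name.lower() == "cf-ray":
            # Split on the last dash: everything after it is the candidate code.
            _, sep, code = value.strip().rpartition("-")
            if sep and len(code) == 3 and code.isalpha():
                return code.upper()
    return None
```

Mapping the returned code to a city would require an external lookup table, which is deliberately left out of the sketch.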

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case, we rely on public information that maps 24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct except Philadelphia (while the server is located in Ashburn, 260 km or 2.6 ms worth of propagation delay away). This misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn) but, as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.

3.5 Scalability

Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that the nodes must desynchronize to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so by a randomized permutation of the target list, achieved via a Linear Feedback Shift Register (LFSR) with Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude (footnote 1), which we verified empirically not to trigger the above problems. Consequently, probing 6.6·10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (the longer duration is likely due to load on the PL host).

Table 1: Textual (0) vs binary (1-4) censuses
Census ID   Format   Size (host, total)   Analysis
0           csv      (270M, 79G)          >3 days
1-4         binary   (21M, 6G)            3 hr

Figure 8: CDF of per-vantage point completion time over all censuses. [x-axis: completion time per VP [Hr], log scale from 1 to 16; y-axis: CDF.]
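The LFSR-based permutation and the probing-time estimate discussed in the probing-rate paragraph can be illustrated as follows. This is a minimal sketch, not the Fastping implementation: it uses a 16-bit register with the commonly cited tap mask 0xB400 so that the full cycle is small enough to check, whereas a census over /24 prefixes would need a 24-bit register with suitable maximal-length taps. The function names and the idea of giving each vantage point its own seed are assumptions.

```python
def galois_lfsr_permutation(nbits=16, taps=0xB400, seed=1):
    """Yield the states of a Galois LFSR, starting from seed, for one full cycle.

    With a maximal-length tap mask, the cycle visits every integer in
    [1, 2**nbits - 1] exactly once, i.e. it is a permutation of the
    non-zero indices. 0xB400 is a commonly cited mask for 16 bits.
    """
    if not 0 < seed < (1 << nbits):
        raise ValueError("seed must be a non-zero value that fits in nbits")
    state = seed
    while True:
        yield state
        lsb = state & 1      # bit shifted out of the register
        state >>= 1
        if lsb:
            state ^= taps    # Galois form: apply the feedback mask in one XOR
        if state == seed:    # back at the start: the cycle is complete
            return


def census_duration_hours(num_targets, probes_per_second):
    """Time needed to probe num_targets at a constant probing rate."""
    return num_targets / probes_per_second / 3600.0
```

With the figures quoted in the text, `census_duration_hours(6.6e6, 1e3)` evaluates to about 1.83 hours, consistent with "less than two hours". Starting each vantage point from a different seed rotates the same cycle, so that vantage points reach a given target at different times.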

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged, in textual format, a wealth of information amounting to 270 MB per node and 80 GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay and ICMP flag (encoding greylist return codes 9, 10 or 13 as a negative sign), for a total of about 20 MB per node and 6 GB overall per census.
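To make the idea of a stripped-down binary record concrete, the sketch below packs a timestamp and a delay into a fixed-size record and reuses the sign of the delay as the ICMP greylist flag, as the text describes. The actual on-disk layout is not given in the paper: the 8-byte `<If` layout (unsigned 32-bit timestamp, 32-bit float delay in milliseconds) and the function names are purely hypothetical.

```python
import math
import struct

# Hypothetical layout: little-endian uint32 timestamp + float32 delay (ms).
RECORD = struct.Struct("<If")


def pack_sample(timestamp, delay_ms, greylisted=False):
    """Encode one probe result; a negative delay marks an ICMP greylist code."""
    signed = -abs(delay_ms) if greylisted else abs(delay_ms)
    return RECORD.pack(timestamp, signed)


def unpack_sample(blob):
    """Decode a record into (timestamp, delay_ms, greylisted)."""
    timestamp, signed = RECORD.unpack(blob)
    # copysign also recognizes -0.0, so a zero delay can still carry the flag.
    greylisted = math.copysign(1.0, signed) < 0
    return timestamp, abs(signed), greylisted
```

Folding the flag into the sign bit avoids a separate field, which is the kind of saving that matters when every vantage point logs millions of samples.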

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute-force optimal solution. At the same time, processing a census would still take days (we indeed stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs is not the same in all files, meaning that an on-the-fly sorting of about 300 lists (one per VP) containing millions of targets is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use case, it may become relevant for other applications of this technique (e.g., BGP hijacking inference, mentioned in Sec. 5).

1 While it is possible to tune the probing rate per VP more finely, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area that is not well covered by PL.

4. ANYCAST /0 CENSUSES

This section presents results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1) and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance

Details about our censuses are reported in Fig. 10. Overall, 1696 IP/24 belonging to 346 ASes appear to have more than one anycast replica, while we were able to find only 897 IP/24 belonging to 100 ASes having at least 5 replicas with our technique. The plot also shows a geographical density map of anycast replicas; results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that results reported in this paper correspond to censuses performed during March 2015; with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at time of writing) results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; and (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP anycast figure in the top 100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted (footnote 2) by ASes using IP anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

2 For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP, and disregard content that is referenced in the frontpage.

                 IP/24   ASes   Cities   CC   Replicas
All              1696    346    77       38   13802
≥ 5 Replicas     897     100    71       36   11598
∩ CAIDA-100      19      8      30       18   138
∩ Alexa-100k     242     15     45       29   4038

Figure 10: Anycast censuses results at a glance.

4.2 Top-100 Anycast ASes

Although the number of anycast IP/24 may seem disappointing at first, owing to its exiguous footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS, but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar plot). Service footprint is expressed by the open TCP ports in the AS (middle scatter plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter plots). Finally, a label reported on the top x-axis categorizes the main activity of the AS from a business perspective (the category is informal and, for ASes with multiple services, only the most prominent is selected).

Figure 9: Bird's eye view of Top-100 anycast ASes (ranked according to geographical footprint). [Panels, bottom to top: replicas per AS (log scale, 1 to 100); IPs/24 per AS; open ports (53, 80, 443, 8080); Alexa rank (log scale); CAIDA rank (log scale). Bottom x-axis: rank and WHOIS name of the 100 ASes; top x-axis: AS category.]

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet, Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric), but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn) and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services), and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers, such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS) and public DNS resolvers (e.g., Google DNS, OpenDNS), also emerge in the census.

Diversity. We report a breakdown of AS classes in Fig. 11, crisply showing that DNS now represents about one third of IP anycast activities. Plots in Fig. 9 clearly illustrate the large diversity of anycast usage under all metrics. Indeed, no strong correlation appears between any two metrics, illustrating the degrees of freedom in anycast deployments: for instance, the geographical and IP/24 footprints are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments having heterogeneous business models (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH), but also between deployments of the same kind (e.g., CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443, out of the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).
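For readers who want to reproduce this kind of check on their own per-AS data, the Pearson coefficient quoted above can be computed as sketched below. The function name and the toy inputs in the usage note are hypothetical; the census data itself is not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two metrics around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Here `xs` could hold the number of replicas per AS and `ys` the number of anycast IP/24 per AS, in the same AS order; values near 0 indicate unrelated footprints, values near ±1 a strong linear relation.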

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller than those of L7-anycast deployments, which can exceed O(10^3) for the large providers; this is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

Figure 11: Breakdown of AS category (only first category is considered). [Bar chart; y-axis: AS breakdown [%], 0 to 35; x-axis: DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other.]

Fig. 12 further reports the cumulative number of replicas per IP/24, depicting both results coming from the combination of censuses and individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius size: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
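The combination step and the role of disk ordering can be illustrated with a small sketch. It is not the solver of [17]: the greedy selection of non-overlapping disks by increasing radius is a simple heuristic for the maximum independent set (MIS), used here only to show why smaller RTTs (smaller disks) tend to reveal more replicas. The conversion of 100 km per millisecond of RTT (roughly two thirds of the speed of light, halved for the round trip), the function names, and the data layouts are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
KM_PER_MS = 100.0  # assumed: ~2/3 c in fiber, halved because the RTT covers both ways


def combine_min_rtt(censuses):
    """Merge censuses ({(vp, target): rtt_ms}) keeping the minimum RTT per pair."""
    best = {}
    for census in censuses:
        for pair, rtt in census.items():
            if pair not in best or rtt < best[pair]:
                best[pair] = rtt
    return best


def rtt_to_radius_km(rtt_ms):
    """Upper bound on the VP-to-replica distance implied by an RTT."""
    return rtt_ms * KM_PER_MS


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def greedy_disjoint_disks(disks):
    """Pick mutually non-overlapping disks, smallest radius first.

    disks: list of (lat, lon, radius_km) centered on the vantage points.
    Each selected disk must contain a distinct replica, so the number of
    selected disks is a lower bound on the number of anycast replicas.
    """
    chosen = []
    for disk in sorted(disks, key=lambda d: d[2]):
        if all(haversine_km(disk[:2], c[:2]) > disk[2] + c[2] for c in chosen):
            chosen.append(disk)
    return chosen
```

If `greedy_disjoint_disks` returns two or more disks for the same target, no single location can explain all measurements, which is the geometric signature of anycast that the technique relies on.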

In this paper, we consider only results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VP, raising false positive replicas. While we have anecdotal evidence of some of these exiguous deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall). [x-axis: number of replicas, 2 to 25; y-axis: CDF of number of IPs/24; curves: combination of censuses (308 VPs), Census 4 (13/03/15, 240 VPs), Census 3 (04/03/15, 269 VPs), Census 2 (02/03/15, 255 VPs), Census 1 (27/02/15, 261 VPs).]

Figure 13: Number of IPs/24 per AS. [x-axis: number of IPs/24, log scale from 1 to 1000; y-axis: CDF of number of ASes with >= 5 replicas; annotated ASes: CLOUDFLARENET,US (328), GOOGLE,US (102), EDGECAST,US (37), PROLEXIC,US (21), APPLE,US (6), TWITTER,US (3), LEVEL3,US (2), LINKEDIN,US (1).]

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS), and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse deployments (e.g., Google's 8.8.8.8 is the only address alive in 8.8.8.0/24) and very dense ones (e.g., well over 99% of IPs are alive in most CloudFlare subnets).

Importance. The presence of ASes ranking among the top 100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, are good indicators that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Owing to the historical use of anycast for DNS services, we believe it is important to provide an up-to-date longitudinal view across services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing at the top of Fig. 14. We test all anycast /24 of the top-100 ASes, picking a single representative IP per /24; we scan at low rates all 2^16 TCP ports. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an underestimation of the number of open TCP ports can also be the result of probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816 of 81 ASes have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both top and bottom plots; in particular, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.
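The bias described above is easy to reproduce on toy data. The sketch below counts how often each open port appears per /24 and per AS; the `(asn, prefix24, open_ports)` record layout and the function name are assumptions made for the illustration, and the toy input in the usage note is hypothetical, not census data.

```python
from collections import Counter


def port_frequencies(scan):
    """Count open ports per AS and per /24.

    scan: iterable of (asn, prefix24, open_ports) records, one per scanned /24.
    Returns (per_as, per_prefix): in per_as a port counts once per AS,
    however many /24 of that AS expose it; in per_prefix it counts once per /24.
    """
    per_prefix = Counter()
    ports_by_as = {}
    for asn, _prefix, ports in scan:
        per_prefix.update(set(ports))
        ports_by_as.setdefault(asn, set()).update(ports)
    per_as = Counter()
    for ports in ports_by_as.values():
        per_as.update(ports)
    return per_as, per_prefix
```

For example, if one AS owns three /24 that all expose ports 80 and 2052, port 2052 reaches a count of 3 in the per-/24 ranking but only 1 in the per-AS ranking, which is the effect attributed to the CloudFlare AS in the text.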

Figure 14: Overall nmap portscan statistics and Top-10 open TCP ports (per AS and per /24). [Summary table: IPs/32: 812; ASes: 81; Ports: 10499 (SSL: 185); Well-known: 457; Software: 30. Top-10 ports by IP/24 frequency: 80 (http), 443 (http ssl), 8080 (http), 53 (domain), 2052, 2053, 2082, 2083, 8443, 2087 (all http). Top-10 ports by AS frequency: 53 (domain), 80 (http), 443 (http-ssl), 179 (tcpwrapped), 22 (ssh), 8080 (http), 8083 (tcpwrapped), 3306 (tcpwrapped), 1935 (sip), 5252 (movaz-ssc).]

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS. We make the following observations: (i) roughly half of the ASes have at least one open TCP port; (ii) about 10% of the ASes have at least 5 open TCP ports; and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20% of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs, including a tier-1 ISP (Tinet SpA), and Google, with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software, such as popular web and DNS daemons (e.g., nginx, ISC BIND), and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

Figure 15: Complementary CDF of the number of open TCP ports per AS. [x-axis: number of open TCP ports, log scale from 1 to 100000; y-axis: CCDF over ASes, log scale from 0.01 to 1; annotated ASes: OVH,FR (10148), INCAPSULA,US (330), CLOUDFLARENET,US (20), GOOGLE,US (9), EDGECAST,US (5).]

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with the web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, the difference may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of remote access and RPC services (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted; yet, among the anycasters, we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, and especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10 000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from the software ranking in the unicast IP world.

Figure 16: Breakdown of software running on anycast replicas. [Bar chart; y-axis: AS frequency, 0 to 20; x-axis: software names grouped into DNS, Web, Mail and Other.]

Yet, this work only scratches the surface and opens more questions than it is able to answer, as for instance:

IP-level CDN. Refining the active methodology by mapping content objects (and not only the frontpage) from the Alexa-100k would be needed to gain a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing them over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with the results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming or discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and to cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES
[1] http://internetcensus2012.bitbucket.org/paper.html

[2] http://w3techs.com/technologies/overview/web_server/all

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. YouTube traffic dynamics and its interplay with a tier-1 ISP: An ISP perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark/

[7] http://www.root-servers.org/

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. Zmap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. Fastroute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.


27]. However, these focused studies add yet other useful pieces to a puzzle that has so far remained incomplete, lacking the broad perspective given by an Internet-wide coverage over all prefixes and services.

3. SYSTEM DESIGN
Anycast detection relies on measuring round-trip delays between a set of vantage points and a target IP address to uncover geo-inconsistencies. Running an Internet census thus requires measurements towards millions of destinations, ideally in a short timeframe: we now describe and justify the system design choices that allow us to perform multiple censuses, which we analyze later in Sec. 4. Items discussed in this section concern the selection of targets (Sec. 3.1), the measurement platform (Sec. 3.2) and software (Sec. 3.3), as well as the network protocol used (Sec. 3.4). Finally, we report considerations about the scalability of our workflow (Sec. 3.5).
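To make the notion of geo-inconsistency concrete, the following is a minimal sketch, not the implementation of [17]. It assumes that each RTT bounds the distance between a vantage point and the replica that answered it, using an assumed one-way propagation speed of 200 km/ms in fiber; the function names and this constant are illustrative choices. Two disks that do not intersect cannot contain the same host, which reveals anycast; greedily picking pairwise-disjoint disks by increasing radius then gives a lower bound on the number of replicas.

```python
import math

# Assumption: one-way propagation speed in fiber, roughly 2/3 of c.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def disk(vp_location, rtt_ms):
    """Disk (center, radius_km) that must contain the replica seen by this VP."""
    return vp_location, (rtt_ms / 2.0) * FIBER_KM_PER_MS


def disjoint(d1, d2):
    """True if the two disks cannot contain a common point."""
    return haversine_km(d1[0], d2[0]) > d1[1] + d2[1]


def is_anycast(disks):
    """A single host cannot lie in two disjoint disks: that is a geo-inconsistency."""
    return any(disjoint(a, b) for i, a in enumerate(disks) for b in disks[i + 1:])


def replica_lower_bound(disks):
    """Greedy set of pairwise-disjoint disks, smallest radius first."""
    chosen = []
    for d in sorted(disks, key=lambda d: d[1]):
        if all(disjoint(d, c) for c in chosen):
            chosen.append(d)
    return len(chosen)
```

For instance, three vantage points in Paris, New York and Tokyo that each measure a 10 ms RTT towards the same address yield three disjoint 1000 km disks, hence at least three replicas under these assumptions.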

3.1 Census targets
Census granularity. Unlike multicast, anycast addresses need no reservation in the IP space, as any IP address can be a candidate: this makes deployment easy, but the detection of anycast addresses hard. Luckily, to avoid a significant increase in the size of routing tables, BGP standard practice [4] is to ignore or block prefixes longer (i.e., more specific) than /24. Thus, /24 is the minimum granularity for anycasted services, which is a good granularity for our census. We validate this assumption with (spot) verifications for all IP addresses on some IP/24 (belonging to EdgeCast), confirming any IP in the /24 to be equivalent for anycast detection purposes. Additionally, a /24 granularity implies that announced BGP prefixes that are shorter than /24 (i.e., that cover more addresses) are tested multiple times, once per each /24 they contain: the mapping between /24 and announced prefixes is still possible a posteriori, as we do in this work. This choice is reinforced by [35], which found 88% of announced prefixes to be /24, stated that "anycast prefixes are dominated by /24", and suggested that larger prefixes may be anycast only in part, due to BGP prefix aggregation. We therefore fix the census granularity to a single target IP per /24.
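The split of announced prefixes into /24 targets, and the a posteriori mapping back to the announcement, can be sketched with the Python standard library as follows. This is an illustrative sketch only: the helper names are hypothetical, and keeping the most specific covering announcement is an assumption made here, not a rule stated in the text.

```python
import ipaddress


def slash24s(prefix):
    """Return the /24 subnets covered by an announced IPv4 prefix.

    Prefixes more specific than /24 are ignored, mirroring common BGP filtering.
    """
    net = ipaddress.ip_network(prefix, strict=True)
    if net.version != 4 or net.prefixlen > 24:
        return []
    return list(net.subnets(new_prefix=24))


def map_slash24_to_prefix(announced):
    """Map each /24 to the most specific announced prefix that contains it."""
    mapping = {}
    for prefix in announced:
        net = ipaddress.ip_network(prefix, strict=True)
        for sub in slash24s(prefix):
            previous = mapping.get(sub)
            if previous is None or net.prefixlen > previous.prefixlen:
                mapping[sub] = net
    return mapping
```

A /22 announcement therefore contributes four census targets, each of which can later be attributed to the same announced prefix.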

Target liveness. As previously argued, any alive IP belonging to a /24 is equivalent in telling whether the whole /24 is anycast (or unicast). To identify a responsive IP address in every /24 prefix, we rely on the hitlist periodically published by [31]. The hitlist generally consists of one representative IP address for each of O(10^7) prefixes, along with a score indicative of the host liveness, computed over several measurement campaigns. When no alive IP has been observed in a /24, the hitlist contains an arbitrary address from that /24 (score ≤ −2). After covering the full hitlist with the first census, we confirm that these hosts are not reachable and remove them, reducing the target size to 6.6·10^6 per VP.

Figure 5: Microsoft deployment as seen from PlanetLab (21 replicas) vs. RIPE (54 replicas). Notice that PlanetLab results (white markers) are a subset of RIPE (white and black markers).

Coverage. Given our census aim, we verify how well this hitlist covers all routed /24 prefixes. We therefore obtain from CAIDA a dump of routing tables originating from both RIPE RIS and RouteViews collectors. To compare the hitlist vs. the advertised prefixes, we split the latter into /24, obtaining 10616435 /24 prefixes, of which 10615563 have a representative in the hitlist (over 99.99% coverage). We additionally cross-check our observed target responsiveness with the expected recall: specifically, recent ICMP scans [48] observe 4.9·10^6 used /24 subnets, and our campaigns similarly capture 4.4·10^6 responsive subnets (90% coverage with respect to [48]).

3.2 Measurement dataset vs. platform
Dataset. One option to avoid running a large-scale measurement campaign is to exploit readily available datasets from public measurement infrastructures; yet we could not find any fitting our purpose. For instance, despite probing all /24 every 2-3 days, Archipelago [6] clusters its vantage points into three independent groups, each using random IPs selected in each /24 prefix: it follows that at most 3 monitors target each /24, with generally different IP addresses and a hit rate of about 6%. Given the low hit rate and low parallelism, such a dataset is not appropriate for our purpose, as it would lead neither to a complete census nor to an accurate geolocation footprint, even in case of hits.

Platforms. There are a number of available measurement platforms in the community, each with its own advantages and limitations. Except for illustration purposes, in this paper we rely on PlanetLab (PL). While RIPE Atlas (RIPE for short) is more interesting for geographical diversity due to its scale, it offers limited control over the rate and type (cf. Sec. 3.4) of measurements, as well as over their instantiation for such a large-scale campaign (i.e., upload of the hitlist, probing budget). Additionally, the larger number of vantage points would mechanically increase about 20-fold the amount of probes per census with respect to PL (in case all VPs are used). Conversely, measurements in PL are limited by node availability (generally around 300 vantage points), but offer full flexibility for deploying custom software and running it at high speed (cf. Sec. 3.3).

While in this work we only use PL, we depict for illustration purposes an application of our technique to measurements collected from PlanetLab vs. RIPE in Fig. 5: as PL results are a subset of RIPE results, white markers indicate replicas found from both platforms, while black markers pinpoint replicas that are only found with RIPE measurements. While this example has only anecdotal relevance, it suggests that an intriguing direction is to combine both platforms, e.g., by refining via RIPE the geolocation of anycast /24 detected via PL.

3.3 Measurement software
Fastping. An efficient measurement tool is needed to maximize the probing capacity of our VPs. While at first sight Zmap [22] could seem the perfect tool for such a large-scale campaign, it exhibits a major blocking point in our setup: namely, Zmap generates raw Ethernet frames, which are very efficient in a local setup but are not supported by the PlanetLab virtualization layer. We therefore resort to Fastping [26], a tool specialized, as the name implies, in ICMP scanning, which is deployed on each PL node. Fastping is able to send about O(10^4) probes per second: about two orders of magnitude slower than Zmap, but faster than the fastest nmap scripting engine scanner. As we will point out later concerning scalability (Sec. 3.5), in order to gather complete censuses in a few hours we had to undergo several rounds of re-engineering, including purposely slowing down the Fastping sending rate.

[Figure 6 plot: x-axis: protocol (ICMP at L3; TCP-53 and TCP-80 at L4; DNS/UDP and DNS/TCP at L7); y-axis: Response ratio [%], 0 to 100; one series per target: OpenDNS, Edgecast, Cloudflare, Microsoft.]
Figure 6: Response rates seen by heterogeneous protocols across different targets.

[Figure 7 plot: x-axis: GT/PAI, TPR, Error [Km]; left y-axis: GT/PAI and TPR, 0 to 1; right y-axis: Median error [Km], 0 to 1000; series: CloudFlare, EdgeCast.]
Figure 7: Validation with CloudFlare and EdgeCast ASes. Bars represent standard deviation among IP/24 of the AS.

Greylist. Additionally, Fastping adopts the usual techniques to be a good Internet measurement citizen: a signature in the payload points to its homepage, Fastping probes the target list in a randomized order to reduce intrusiveness, and it implements a greylist mechanism to honor requests to stop probing administratively prohibited hosts/networks, inferred from ICMP return codes. Before running a census from O(10^2) VPs, we initially run a census from a single VP in order to build an initial blacklist. During any census, we then collect addresses generating ICMP return codes (other than echo reply) in a temporary greylist, which we later incrementally merge with the blacklist. This list has approximately O(10^5) hosts, with 98.5% added due to administrative filtering [8] (type 3, code 13) and the remainder due to communications administratively prohibited at host or network level (respectively 1.3%, code 10 [14], and 0.2%, code 9 [14]).
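The greylist mechanism can be illustrated with a short sketch. It is not Fastping's code: the data layout (tuples of address, ICMP type, ICMP code) and the function names are assumptions made for illustration. Only the ICMP type 3 codes 9, 10 and 13 come from the text.

```python
# ICMP "destination unreachable" is type 3; codes 9 and 10 (RFC 1122) and
# code 13 (RFC 1812) signal administratively prohibited communication.
DEST_UNREACHABLE = 3
ADMIN_PROHIBITED_CODES = {9, 10, 13}


def update_greylist(greylist, replies):
    """Add to the greylist every address that answered with a prohibition code.

    `replies` is an iterable of (address, icmp_type, icmp_code) tuples
    (hypothetical layout).
    """
    for address, icmp_type, icmp_code in replies:
        if icmp_type == DEST_UNREACHABLE and icmp_code in ADMIN_PROHIBITED_CODES:
            greylist.add(address)
    return greylist


def next_census_targets(targets, blacklist, greylist):
    """Merge the temporary greylist into the blacklist and filter the targets."""
    blacklist |= greylist
    greylist.clear()
    return [t for t in targets if t not in blacklist]
```

Running `next_census_targets` between censuses keeps the blacklist growing incrementally, so that hosts that asked not to be probed are skipped by all vantage points in later runs.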

3.4 Network protocol
Recall. ICMP has often been used (and misused) in measurement studies: especially given recent work showing that ICMP latency measurements are often not reliable [40], we need to confirm the validity of our protocol selection. A major motivation for ICMP measurement is the high recall it offers [48]: consider indeed that TCP and UDP measurements would need a priori knowledge (or a guess) of the services running on the target under test. We therefore perform a test on a reduced set of targets, performing 100 measurements with different protocols: specifically, we consider network L3 (ICMP) and transport L4 (TCP SYN/SYN-ACK pair in the three-way handshake to port 53 or 80) measurements, as well as L7 (DNS/UDP vs. DNS/TCP, using dig) measurements. Fig. 6 shows that protocols other than ICMP have a binary recall: in other words, they work well only if the service is known a priori. Conversely, ICMP is the only reliable alternative, yielding high recall across all deployments, and is thus well suited for censuses.

Accuracy. While our technique relies on latency measurements, it leverages the discriminative power of side-channel information (i.e., city population within disks) to cope with latency measurement noise. While we validate the accuracy of the methodology for DNS in [17], a validation for stateful TCP connections is still missing. To provide one, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL: note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY (standard Server) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a superset of locations with respect to those measured from PL. We contrast the true positive rate (TPR) of the classification of our census vs. the HTTP GT in Fig. 7: in 77% of the IP/24 for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 km (287 km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest of alternative platforms such as RIPE.
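As an illustration of how such a ground truth could be extracted, the sketch below parses a CF-RAY header value. It assumes the commonly observed layout in which the value ends with a dash followed by a three-letter airport-style code identifying the serving location; this layout, the function name and the tiny code-to-city table are assumptions for illustration, not details given in this paper.

```python
# Hypothetical, deliberately tiny lookup table from location code to city.
CODE_TO_CITY = {"CDG": "Paris", "SJC": "San Jose", "NRT": "Tokyo"}


def cf_ray_location(headers):
    """Return the location code carried by a CF-RAY header, or None.

    `headers` is a dict of HTTP response headers; lookup is case-insensitive.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("cf-ray")
    if not value:
        return None
    _ray_id, sep, code = value.strip().rpartition("-")
    if not sep or len(code) != 3 or not code.isalpha():
        return None
    return code.upper()


def cf_ray_city(headers):
    """Map the location code to a city name when it is in the lookup table."""
    return CODE_TO_CITY.get(cf_ray_location(headers))
```

Collecting this value from every vantage point for a given IP/24 gives the set of replicas actually reached from the platform, which is what the census output is compared against.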

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case, we rely on public information that maps /24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct except Philadelphia (while the server is located in Ashburn, 260 km or 2.6 ms worth of propagation delay away): this misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn) but, as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.

Table 1: Textual (0) vs. binary (1-4) censuses.
Census ID   Format   Size (host, total)   Analysis
0           csv      (270M, 79G)          >3 days
1-4         binary   (21M, 6G)            3 hr

[Figure 8 plot: x-axis: Completion time per VP [Hr] (1, 2, 4, 8, 16); y-axis: CDF, 0 to 1.]
Figure 8: CDF of per-vantage point completion time over all censuses.

3.5 Scalability
Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that nodes must be desynchronized to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so by a randomized permutation of the targets, achieved via a Linear Feedback Shift Register (LFSR) with Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies do aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude¹, which we verified empirically not to trigger the above problems. Consequently, probing 6.6·10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (longer durations are likely due to load on the PL host).
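The LFSR-based permutation of targets mentioned under "Probing rate" can be sketched as follows. This is a minimal illustration rather than Fastping's code: it uses the textbook 16-bit Galois LFSR with toggle mask 0xB400, which cycles through all 65535 non-zero states, so it only covers lists of up to 65535 targets; a census-sized list would need a wider register with a suitable maximal-length polynomial. The function names and the use of a per-VP seed are assumptions.

```python
def galois_lfsr(seed, taps=0xB400, nbits=16):
    """Yield every state of one full cycle of a Galois LFSR, starting at `seed`."""
    start = seed & ((1 << nbits) - 1)
    if start == 0:
        raise ValueError("seed must be non-zero")
    state = start
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps  # apply the feedback polynomial
        if state == start:
            return  # full cycle completed


def permuted_targets(targets, seed, nbits=16):
    """Visit each target exactly once, in the pseudo-random order of the LFSR.

    States larger than len(targets) are skipped; a different seed per vantage
    point starts the walk at a different position of the cycle.
    """
    n = len(targets)
    if n > (1 << nbits) - 1:
        raise ValueError("register too narrow for this target list")
    for state in galois_lfsr(seed, nbits=nbits):
        if state <= n:
            yield targets[state - 1]
```

Because the LFSR needs only its current state, each vantage point can walk the whole target list in a scattered order without storing a shuffled copy of it.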

¹ While it is possible to more finely tune the probing rate per VP, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area that is not well covered by PL.

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged in textual format a wealth of information, amounting to 270 MB per node and 80 GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay and ICMP flag (encoding greylist return codes 9, 10 or 13 as a negative sign), for a total of about 20 MB per node and 6 GB overall per census.

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute-force optimal solution: at the same time, processing a census would still take days (we indeed stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs is not the same in all files, meaning that an on-the-fly sorting of about 300 lists (one per VP) containing millions of targets is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use case, it may become relevant for other applications of this technique (e.g., BGP hijacking inference, mentioned in Sec. 5).
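The two ideas above, a compact binary record and a per-target merge of the per-VP files, can be sketched as follows. The record layout is hypothetical: the text only states that a record holds a timestamp, a delay and an ICMP flag with prohibition codes encoded as negative values, so the explicit target index, the field widths and the 0.1 ms delay unit are assumptions made for illustration.

```python
import heapq
import struct
from itertools import groupby

# Hypothetical layout: target index, timestamp (s), signed value.
# value >= 0: delay in units of 0.1 ms; value < 0: minus the ICMP code.
RECORD = struct.Struct("<IIh")


def encode(target, timestamp, delay_ms=None, icmp_code=None):
    """Pack one measurement into a fixed-size 10-byte record."""
    value = -icmp_code if icmp_code is not None else round(delay_ms * 10)
    return RECORD.pack(target, timestamp, value)


def decode(blob):
    """Yield (target, timestamp, delay_ms, icmp_code) tuples from packed bytes."""
    for target, timestamp, value in RECORD.iter_unpack(blob):
        if value < 0:
            yield target, timestamp, None, -value
        else:
            yield target, timestamp, value / 10.0, None


def merge_by_target(per_vp_records):
    """Group valid delays by target across VPs.

    `per_vp_records` maps a VP name to decoded records in arbitrary (LFSR)
    order; each list is sorted by target, then all lists are merged lazily.
    """
    streams = [sorted((r[0], vp, r[2]) for r in records if r[2] is not None)
               for vp, records in per_vp_records.items()]
    for target, group in groupby(heapq.merge(*streams), key=lambda item: item[0]):
        yield target, [(vp, delay) for _, vp, delay in group]
```

With records grouped per target, the detection step only ever needs the measurements for one target in memory at a time, which is one plausible way to keep the analysis within the census timescale.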

4. ANYCAST /0 CENSUSES
This section presents results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1) and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance
Details about our censuses are reported in Fig. 10. Overall, 1696 IP/24 belonging to 346 ASes appear to have more than one anycast replica, while with our technique we were able to find only 897 IP/24, belonging to 100 ASes, having at least 5 replicas. The plot also shows a geographical density map of anycast replicas: results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that the results reported in this paper correspond to censuses performed during March 2015: with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at the time of writing) results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP anycast figure in the top-100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted² by ASes using IP anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

                 IP/24   ASes   Cities   CC   Replicas
All               1696    346       77   38      13802
≥ 5 Replicas       897    100       71   36      11598
∩ CAIDA-100         19      8       30   18        138
∩ Alexa-100k       242     15       45   29       4038

Figure 10: Anycast censuses results at a glance.

² For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP, and disregard content that is referenced in the frontpage.

4.2 Top-100 Anycast ASes
Although the number of anycast IP/24 may seem disappointing at first, given its small footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS, but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar plot). Service footprint is correlated to the open TCP ports in the AS (middle scatter plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter plots). Finally, a label reported on the top x-axis categorizes the main activity of the ASes from a business perspective (the categorization is informal and, in case of ASes with multiple services, only the most prominent is selected).

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet, Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric), but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn) and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services), and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS), and public DNS resolvers (e.g., Google DNS, OpenDNS) also emerge in the census.

[Figure 9 plot: x-axis: the 100 ASes, numbered 1 to 100 and labeled with their WHOIS name; panels, bottom to top: Replicas (bar plot), IPs/24 (bar plot, log scale), Open ports (scatter plot; 53, 80, 443, 8080), Alexa rank and CAIDA rank (scatter plots, log scale); top x-axis: AS category (e.g., CDN, DNS, ISP, ISP-tier1, Cloud, Security, Social Network, unknown).]
Figure 9: Bird's eye view of Top-100 anycast ASes (ranked according to geographical footprint).

Diversity. We report a breakdown of AS classes in Fig. 11, crisply showing that DNS now represents about one third of IP anycast activities. Plots in Fig. 9 clearly illustrate the large diversity of anycast usage under all metrics. Indeed, no clear correlation appears between any two metrics, illustrating the degree of freedom in anycast deployments: for instance, the geographical and IP/24 footprints are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments having heterogeneous business models (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH), but also between deployments of the same kind (e.g., CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443 over the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller with respect to L7-anycast deployments, which can exceed O(10^3) for the large providers; this is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

[Figure 11 plot: bar chart; x-axis: AS category (DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other); y-axis: AS breakdown [%], 0 to 35.]
Figure 11: Breakdown of AS category (only first category is considered).

Fig. 12 further reports the cumulative distribution of the number of replicas per IP/24, depicting both results coming from the combination of censuses and individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius size: intuitively, the smaller the latency, the lower the number of overlaps and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
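The combination step described above amounts to a per-pair minimum, which the following sketch illustrates. The dictionary layout keyed by (VP, target) and the function name are assumptions made for illustration.

```python
def combine_censuses(censuses):
    """Keep, for every (vp, target) pair, the minimum RTT seen in any census.

    `censuses` is an iterable of dicts mapping (vp, target) to an RTT in ms,
    with None for unanswered probes. The minimum is the sample least affected
    by queuing, hence closest to the propagation delay.
    """
    combined = {}
    for census in censuses:
        for pair, rtt in census.items():
            if rtt is None:
                continue
            if pair not in combined or rtt < combined[pair]:
                combined[pair] = rtt
    return combined
```

Smaller RTTs translate into smaller disks, so fewer disks overlap and more replicas can be told apart, which is consistent with the higher recall reported for the combination.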

In this paper we only consider results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VP, raising false positive replicas. While we have anecdotal evidence of some of these small deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

[Figure 12 plot: x-axis: Number of replicas (2 to 25); y-axis: CDF of number of IPs/24, 0 to 1; curves: Combination of censuses (308 VPs); Census 4 (13/03/15, 240 VPs); Census 3 (04/03/15, 269 VPs); Census 2 (02/03/15, 255 VPs); Census 1 (27/02/15, 261 VPs).]
Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall).

[Figure 13 plot: x-axis, number of IPs/24 (log scale, 1 to 1000); y-axis, CDF of the number of ASes (0 to 1), for ASes with ≥ 5 replicas. Annotated ASes: CLOUDFLARENET,US (328), GOOGLE,US (102), EDGECAST,US (37), PROLEXIC,US (21), APPLE,US (6), TWITTER,US (3), LEVEL3,US (2), LINKEDIN,US (1).]

Figure 13: Number of IPs/24 per AS.

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS) and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half of the ASes have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet, about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse deployments (e.g., Google 8.8.8.8 is the only address alive in 8.8.8.0/24) and very dense ones (e.g., well over 99% of IPs are alive in most CloudFlare subnets).

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, are good indicators that anycast is used for popular and important services. Considering CDNs, which are after DNS the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Given the historical use of anycast for DNS services, we believe it is important to provide an up-to-date longitudinal view across services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing at the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan all 2^16 TCP ports at low rate. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an underestimation of the number of open TCP ports can also result from probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816 IPs of 81 ASes have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both top and bottom plots: in particular, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.
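The bias described above is easy to reproduce on a toy example. The sketch below counts open ports either per AS or per IP/24; the record layout, the function name and the sample data are hypothetical and only serve to show how one AS with many /24s can dominate the per-/24 ranking.

```python
from collections import Counter

def port_frequency(scan, unit):
    """Count in how many ASes (unit="as") or /24s (unit="ip24") a port is open.

    scan: iterable of (asn, prefix24, port) records, one per open port.
    """
    seen = set()
    for asn, prefix24, port in scan:
        key = asn if unit == "as" else prefix24
        seen.add((key, port))  # each AS (or /24) counts once per port
    return Counter(port for _, port in seen)

# Hypothetical scan: AS "A" owns three /24s, ASes "B" and "C" one each.
SCAN = [
    ("A", "192.0.2.0/24", 80), ("A", "192.0.2.0/24", 2052),
    ("A", "198.51.100.0/24", 80), ("A", "198.51.100.0/24", 2052),
    ("A", "203.0.113.0/24", 80), ("A", "203.0.113.0/24", 2052),
    ("B", "10.1.0.0/24", 53), ("B", "10.1.0.0/24", 80),
    ("C", "10.2.0.0/24", 53),
]
PER_AS = port_frequency(SCAN, "as")
PER_IP24 = port_frequency(SCAN, "ip24")
```

In this toy data, port 2052 is open in a single AS, yet it ranks above port 53 when counting per /24, which is the kind of distortion that motivates per-AS statistics.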

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS.

nmap portscan statistics

IPs/32   ASes   Ports (SSL)    Well-known   Software
812      81     10499 (185)    457          30

[Figure 14 plots. IP/24 frequency (y-axis, 0 to 400) of the top-10 open TCP ports: 80, 443, 8080, 53, 2052, 2053, 2082, 2083, 8443, 2087; service labels in the same order: http, http ssl, http, domain, http, http, http, http, http, http. AS frequency (y-axis, 0 to 60) of the top-10 open TCP ports: 53, 80, 443, 179, 22, 8080, 8083, 3306, 1935, 5252; service labels in the same order: domain, http, http-ssl, tcpwrapped, ssh, http, tcpwrapped, tcpwrapped, sip, movaz-ssc.]

Figure 14: Overall nmap portscan statistics and top-10 open TCP ports (per AS and per /24).

We make the following observations: (i) roughly half of the ASes have at least one open TCP port, (ii) about 10% of the ASes have at least 5 open TCP ports, and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20 of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs, including a tier-1 ISP (Tinet SpA) and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software such as popular web and DNS daemons (e.g., nginx, ISC BIND) and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67) nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

[Figure 15 plot: x-axis, number of open TCP ports (log scale, 1 to 100000); y-axis, CCDF over ASes (log scale, 0.01 to 1). Annotated ASes: OVH,FR (10148), INCAPSULA,US (330), CLOUDFLARENET,US (20), GOOGLE,US (9), EDGECAST,US (5).]

Figure 15: Complementary CDF of the number of open TCP ports per AS.

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, differences may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).
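For readers unfamiliar with the statistic, the following sketch shows one common way to compute a Spearman rank correlation when ties are present (as with the ex aequo servers above): ties receive the average rank, and the Pearson correlation is then taken over the ranks. The function names are illustrative, and this is not necessarily the exact procedure used to obtain the 0.38 figure.

```python
import math

def average_ranks(values):
    """Rank values in decreasing order (1 = largest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A call such as `spearman(anycast_as_counts, unicast_shares)` takes two equally long lists, one popularity value per web server; a result near 1 would mean the two rankings agree, while a low value indicates that they differ.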

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique implemented with an efficient and scalable system design. In the spirit of the open source and open data movements, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted, yet among the anycasters we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from software ranking in the unicast IP world.

Yet, this work only scratches the surface and opens more questions than it is able to answer, as for instance:

[Figure 16 plot: y-axis, AS frequency (0 to 20); x-axis, software implementations grouped under the category labels DNS, Web, Mail, Other. Software in plot order, as extracted: ISC BIND, NLnet Labs NSD, Microsoft DNS, OpenDNS, nginx, lighttpd, Apache httpd, ECD, Microsoft HTTP, Microsoft IIS, Varnish, Apache Tomcat, bitasicv2, CFS 0213, cloudflare-nginx, cPanel httpd, thttpd, ECAcc, ECS, Google httpd, instart160, Gmail imapd, Gmail pop3d, Google gsmtp, OpenSSH, MySQL, sslstrip, Microsoft SQL, Microsoft RPC.]

Figure 16: Breakdown of software running on anycast replicas.

IP-level CDN. Refining the active methodology by mapping content objects (and not only the frontpage) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming/discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and to cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES

[1] http://internetcensus2012.bitbucket.org/paper.html
[2] http://w3techs.com/technologies/overview/web_server/all
[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.
[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.
[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. YouTube traffic dynamics and its interplay with a tier-1 ISP: An ISP perspective. In ACM IMC, 2010.
[6] http://www.caida.org/projects/ark/
[7] http://www.root-servers.org/
[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.
[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.
[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.
[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.
[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.
[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.
[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.
[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.
[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.
[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.
[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.
[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.
[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04
[21] http://www.enst.fr/~drossi/anycast
[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. Zmap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.
[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.
[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.
[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.
[26] http://www.ict-mplane.eu/public/fastping
[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. Fastroute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.
[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of internet hosts. In ACM IMC, 2004.
[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.
[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible internet. In ACM IMC, 2008.
[31] Internet addresses hitlist dataset (20140829rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).
[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.
[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.
[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.
[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.
[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.
[37] K. Miller. Deploying IP anycast. Nanog, 2003.
[38] https://nmap.org
[39] https://www.opendns.com/data-center-locations/
[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.
[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.
[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.
[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.
[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.
[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.
[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.
[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.
[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.


ments as well as their instantiation for such a large-scale campaign (i.e., upload of the hitlist, probing budget). Additionally, the larger number of vantage points would mechanically increase about 20-fold the amount of probes per census with respect to PL (in case all VPs are used). Conversely, measurements in PL are limited by node availability (generally around 300 vantage points), but offer full flexibility for deploying custom software and running it at high speed (cf. Sec. 3.3).

While in this work we only use PL, we depict for illustration purposes an application of our technique to measurements collected from PlanetLab vs. RIPE in Fig. 5: as PL results are a subset of RIPE results, white markers indicate replicas found from both platforms, while black markers pinpoint replicas that are only found with RIPE measurements. While this example has anecdotal relevance, it suggests that an intriguing direction is to combine both platforms, e.g., by refining via RIPE the geolocation of anycast /24 detected via PL.

3.3 Measurement software

Fastping. An efficient measurement tool is needed to maximize the probing capacity of our VPs. While at first sight Zmap [22] could seem the perfect tool for such a large-scale campaign, it exhibits a major blocking point in our setup: namely, Zmap generates raw Ethernet frames, which are very efficient in a local setup but are not supported by the PlanetLab virtualization layer. We therefore resort to Fastping [26], a tool specialized, as the name implies, in ICMP scanning, which is deployed on each PL node. Fastping is able to send about O(10^4) probes per second: about two orders of magnitude slower than Zmap, but faster than the fastest nmap scripting engine scanner. As we will point out later concerning scalability (Sec. 3.5), in order to gather complete censuses in a few hours we had to undergo several rounds of re-engineering, including purposely slowing down the Fastping sending rate.

Greylist. Additionally, Fastping adopts the usual techniques to be a good Internet measurement citizen: a signature in the payload points to its homepage, Fastping probes the target list in a randomized order to reduce intrusiveness, and it implements a greylist mechanism to honor requests to stop probing administratively prohibited hosts/networks, inferred from ICMP return codes. Before running a census from O(10^2) VPs, we initially run a census from a single VP in order to build an initial blacklist. During any census, we then collect addresses generating ICMP return codes (other than echo reply) in a temporary greylist, which we later incrementally merge with the blacklist. This list has approximately O(10^5) hosts, with 98.5% added due to administrative filtering [8] (type 3, code 13) and the remaining because of communications administratively prohibited at network or host levels (respectively 1.3% code 10 [14] and 0.2% code 9 [14]).

[Figure 6 plot: x-axis, probing protocol grouped by layer: L3 (ICMP), L4 (TCP-53, TCP-80), L7 (DNS/UDP, DNS/TCP); y-axis, response ratio [%] (0 to 100). Series: OpenDNS, Edgecast, Cloudflare, Microsoft.]

Figure 6: Response rates seen by heterogeneous protocols across different targets.

[Figure 7 plot: bars for GT/PAI and TPR (left y-axis, 0 to 1) and for the median error [Km] (right y-axis, 0 to 1000). Series: CloudFlare, EdgeCast.]

Figure 7: Validation with CloudFlare and EdgeCast ASes. Bars represent standard deviation among IP/24 of the AS.
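The greylist logic described above can be sketched as follows. This is not Fastping's code: the function names and data structures are hypothetical, and the sketch assumes that only ICMP destination-unreachable replies (type 3) with the three codes quoted in the text (9, 10 and 13) cause an address to be greylisted.

```python
ICMP_DEST_UNREACHABLE = 3

# ICMP type 3 codes that signal administrative prohibition (RFC 1122, RFC 1812).
PROHIBITED_CODES = {
    9: "destination network administratively prohibited",
    10: "destination host administratively prohibited",
    13: "communication administratively prohibited",
}

def update_greylist(greylist, replies):
    """Record sources that answered with an administrative prohibition.

    greylist: dict mapping address -> ICMP code (updated in place).
    replies:  iterable of (source_address, icmp_type, icmp_code).
    """
    for address, icmp_type, icmp_code in replies:
        if icmp_type == ICMP_DEST_UNREACHABLE and icmp_code in PROHIBITED_CODES:
            greylist[address] = icmp_code
    return greylist

def merge_into_blacklist(blacklist, greylist):
    """Move all greylisted addresses into the blacklist set after a census."""
    blacklist |= set(greylist)
    greylist.clear()
    return blacklist
```

Keeping the code alongside the address makes it possible to report a breakdown per code afterwards, such as the one given in the text.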

3.4 Network protocol

Recall. ICMP has often been used (and misused) in measurement studies: especially given recent work showing that ICMP latency measurements are often not reliable [40], we need to confirm the validity of our protocol selection. A major motivation for ICMP measurement is given by the high recall it offers [48]. Consider indeed that TCP and UDP measurements would need an a priori knowledge (or guess) of the services running on the target under test. We therefore perform a test on a reduced set of targets, performing 100 measurements with different protocols: specifically, we consider network L3 (ICMP) and transport L4 (TCP SYN/SYN-ACK pair in the three-way handshake to port 53 or 80) measurements, as well as L7 (DNS/UDP vs. DNS/TCP using dig) measurements. Fig. 6 shows that protocols other than ICMP have a binary recall: in other words, they work well only if the service is known a priori. Conversely, ICMP is the only reliable alternative, yielding high recall across all deployments, and is thus well suited for censuses.

Accuracy. While our technique relies on latency measurements, it leverages the discriminative power of side channel information (i.e., the population of cities within disks) to cope with latency measurement noise. While we validate the accuracy of the methodology for DNS in [17], a validation for stateful TCP connections is still missing. To do so, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL: note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY (standard Server) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a superset of locations with respect to those measured from PL. We contrast the true positive (TPR) classification of our census vs. the HTTP GT in Fig. 7: in 77% of the IP/24 for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 Km (287 Km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest for alternative platforms such as RIPE.

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case, we rely on public information that maps /24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct except Philadelphia (while the server is located in Ashburn, 260 km or 2.6 ms worth of propagation delay away): this misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn) but, as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.
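The correspondence between 260 km and roughly 2.6 ms can be checked with a short calculation. The sketch below assumes that the delay is a round-trip value and that signals propagate at about 2/3 of the speed of light in vacuum; both are assumptions made here for illustration, not statements from the paper.

```python
C_KM_PER_MS = 299.792458  # speed of light in vacuum, in km per millisecond

def rtt_propagation_ms(distance_km, velocity_factor=2 / 3):
    """Round-trip propagation delay over distance_km, in milliseconds."""
    one_way_ms = distance_km / (C_KM_PER_MS * velocity_factor)
    return 2 * one_way_ms

ASHBURN_PHILADELPHIA_MS = rtt_propagation_ms(260.0)
```

Under these assumptions the round trip comes out close to 2.6 ms, so a latency-only method has very little margin to tell the two cities apart.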

3.5 Scalability

Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that nodes must be desynchronized to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so by a randomized permutation of the target nodes, achieved

Table 1: Textual (0) vs. binary (1-4) censuses

Census ID   Format   Size (host, total)   Analysis
0           csv      (270M, 79G)          >3 days
1-4         binary   (21M, 6G)            3 hr

[Figure 8 plot: x-axis, completion time per VP [Hr] (log scale, 1 to 16); y-axis, CDF (0 to 1).]

Figure 8: CDF of per-vantage point completion time over all censuses.

via a Linear Feedback Shift Register (LFSR) with Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies do aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude (see footnote 1), which we verified empirically not to trigger the above problems. Consequently, probing 6.6 · 10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (the longer duration is likely due to load on the PL host).
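To make the permutation idea concrete, here is a minimal sketch of a Galois LFSR used to visit a target list in a pseudo-random order without storing a shuffled copy. It is not Fastping's implementation: the 16-bit feedback mask 0xB400 is a commonly cited maximal-length choice, used here only to keep the example small (a wider register would be needed for a full census), and giving each VP a different seed is an assumed way of desynchronizing them. The last function reproduces the duration estimate from the text.

```python
def galois_lfsr(seed, mask):
    """Yield the states of a right-shifting Galois LFSR for one full cycle.

    The generator stops when the register returns to the seed, so with a
    maximal-length mask on n bits it yields each value in 1 .. 2^n - 1 once.
    """
    if seed == 0:
        raise ValueError("the all-zero state never changes")
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= mask  # apply the feedback taps
        if state == seed:
            return

def probe_order(targets, seed=1, mask=0xB400):
    """Visit every target once, in LFSR order (at most 65535 targets here)."""
    for state in galois_lfsr(seed, mask):
        index = state - 1  # states start at 1, list indices at 0
        if index < len(targets):
            yield targets[index]

def census_duration_hours(num_targets, targets_per_second):
    """Time needed to probe num_targets at a constant rate, in hours."""
    return num_targets / targets_per_second / 3600
```

With different seeds, two VPs walk the same cycle from different starting points, so they are unlikely to probe the same target at the same moment.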

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged in textual format a wealth of information, amounting to 270MB per node and 80GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay and ICMP flag (encoding greylist return codes 9, 10 or 13 as a negative sign), for a total of about 20MB per node and 6GB overall per census.
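The paper does not give the exact record layout, so the following is only a hypothetical illustration of the idea: a fixed-size binary record holding a timestamp and a delay, where a negative delay value carries the ICMP return code instead of a latency. The field widths (32-bit unsigned timestamp, 32-bit float) and the function names are assumptions.

```python
import struct

# Hypothetical layout: little-endian uint32 timestamp + float32 delay (ms).
RECORD = struct.Struct("<If")

def encode(timestamp, delay_ms=None, icmp_code=None):
    """Pack one measurement; a greylist ICMP code is stored as a negative delay."""
    value = -float(icmp_code) if icmp_code is not None else float(delay_ms)
    return RECORD.pack(timestamp, value)

def decode(blob):
    """Return (timestamp, delay_ms, icmp_code); exactly one of the last two is None."""
    timestamp, value = RECORD.unpack(blob)
    if value < 0:
        return timestamp, None, int(-value)
    return timestamp, value, None
```

Each record in this sketch takes 8 bytes, compared with the dozens of characters a textual line with the same fields would need; the format used for the censuses is evidently more compact still.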

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute force optimal solution: at the same time, processing a census would still take days (we indeed have

Footnote 1: While it is possible to more finely tune the probing rate per VP, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area which is not well covered by PL.

stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs is not the same in all files, meaning that an on-the-fly sorting of about 300 lists (one per VP) containing millions of targets is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use-case, it may become relevant for other applications of this technique (e.g., BGP hijacking inference, mentioned in Sec. 5).
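One plausible way to regroup measurements per target, when every VP file lists targets in a different order, is to sort each file by target and then merge the sorted streams. The sketch below illustrates this with the standard library; the input layout and the function name are assumptions, and the optimized implementation mentioned above may work quite differently.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def per_target_view(vp_files):
    """Regroup per-VP measurement lists into per-target lists.

    vp_files: dict mapping vp_name -> list of (target, rtt_ms), in any order.
    Yields (target, [(vp_name, rtt_ms), ...]) by increasing target.
    """
    # Sort each VP's records by target, tagging every record with its VP.
    streams = [
        [(target, vp, rtt) for target, rtt in sorted(records)]
        for vp, records in vp_files.items()
    ]
    # Merge the sorted streams lazily, then group consecutive equal targets.
    merged = heapq.merge(*streams)
    for target, group in groupby(merged, key=itemgetter(0)):
        yield target, [(vp, rtt) for _, vp, rtt in group]
```

Because `heapq.merge` consumes its inputs lazily, the same pattern also works when each sorted stream is read from disk rather than held in memory.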

4. ANYCAST /0 CENSUSES

This section presents results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1) and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance

Details about our censuses are reported in Fig. 10. Overall, 1696 IP/24 belonging to 346 ASes appear to have more than one anycast replica, while we were able to find only 897 IP/24 belonging to 100 ASes having at least 5 replicas with our technique. The plot also shows a geographical density map of anycast replicas: results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that results reported in this paper correspond to censuses performed during March 2015: with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at the time of writing) results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP anycast figure in the top-100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted (see footnote 2) by ASes using IP anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

                 IP/24   ASes   Cities   CC   Replicas
All              1696    346    77       38   13802
≥ 5 Replicas     897     100    71       36   11598
∩ CAIDA-100      19      8      30       18   138
∩ Alexa-100k     242     15     45       29   4038

Figure 10: Anycast censuses results at a glance.

Footnote 2: For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP, and disregard content that is referenced in the frontpage.

4.2 Top-100 Anycast ASes

Albeit the amount of anycast IP/24 may seem deceiving at first, because of its exiguous footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS, but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar plot). Service footprint is correlated to the open TCP ports in the AS (middle scatter plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter plots). Finally, a label reported on the top x-axis categorizes the main activity of the ASes from a business perspective (the categorization is informal and, in case of ASes with multiple services, only the most prominent is selected).

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet

[Figure 9 plot residue: per-AS panels for the top-100 anycast ASes, indexed 1 to 100 along the x-axis by WHOIS name. Recoverable panel labels: Replicas (0 to 20), IPs/24 (log scale, 1 to 100), Open ports (53, 80, 443, 8080), Alexa rank (log scale, 100 to 100k), and one further panel with log-scale ticks from 1 to 100k whose label is not recoverable here.]

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00

CD

ND

NS

ISP

CD

NS

ocia

l N

etw

ork

DN

SD

NS

DN

SC

lou

dD

NS

DN

SC

DN

Clo

ud

CD

NS

ecu

rity

CD

ND

NS

Secu

rity

DN

SD

NS

DN

Su

nkn

ow

nC

DN

CD

NC

DN

DN

SC

lou

dD

NS

CD

Nu

nkn

ow

nD

NS

DN

SC

lou

d m

essag

ing

Web

Po

rtal

DN

SC

lou

du

nkn

ow

nD

NS

Blo

gg

ing

ISP

-tie

r1IS

PC

lou

dIS

PIS

P-t

ier1

CD

NS

ocia

l N

etw

ork

CD

ND

NS

DN

SD

NS

ISP

DN

SC

DN

DN

SD

NS

On

lin

e M

ark

eti

ng

Clo

ud

CD

NW

eb

Po

rtal

CD

NB

ackb

on

e N

etw

ork

un

kn

ow

nD

NS

Clo

ud

ISP

-tie

r1IS

PD

NS

ISP

-tie

r1C

lou

dD

NS

DN

SV

ideo

Co

nfe

ren

cin

gD

NS

DN

SS

ecu

rity

DN

SC

lou

dW

eb

An

aly

tics

ISP

-tie

r1D

NS

ISP

DN

SD

NS

Clo

ud

messag

ing

DN

SD

NS

Clo

ud

So

cia

l N

etw

ork

Clo

ud

AD

tech

no

log

yC

DN

CD

NC

lou

dC

DN

CD

NS

ecu

rity

Web

An

aly

tics

Tele

co

m V

en

do

rC

lou

dD

NS

Clo

ud

un

kn

ow

n

Ca

ida

ra

nk

Figure 9 Birdrsquos eye view of Top-100 anycast ASes (ranked according to geographical footprint)

Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric) but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn), and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services), and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS), and public DNS resolvers (e.g., Google DNS, OpenDNS) also emerge in the census.

Diversity. We report a breakdown of AS classes in Fig. 11, which shows that DNS now represents about one third of IP anycast activities. The plots in Fig. 9 illustrate the large diversity of anycast usage under all metrics. Indeed, no correlation appears between any two metrics, illustrating the degree of freedom in anycast deployments: for instance, the geographical and IP/24 footprints are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments with heterogeneous business models (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH), but also between deployments of the same kind (e.g., the CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443 over the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller than those of L7-anycast deployments, which can exceed O(10^3) for the large providers; this is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

[Figure 11 plot. Bar chart; y-axis: AS breakdown [%], from 0 to 35; categories: DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other.]

Figure 11: Breakdown of AS category (only first category is considered)

Fig. 12 further reports the cumulative number of replicas per IP/24, depicting both the results from the combination of censuses and the individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by taking the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination than in the average individual census.
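The combination step described above can be sketched as follows. This is a minimal illustration, not the implementation used for the censuses: it assumes each census is available as a dictionary keyed by a (VP, target) pair with the RTT in milliseconds as value, and the function name, VP names and prefixes are hypothetical.

```python
def combine_censuses(censuses):
    """Keep, for every (vantage point, target) pair, the minimum RTT seen in any census."""
    best = {}
    for census in censuses:
        for pair, rtt_ms in census.items():
            # A lower RTT is closer to the pure propagation delay.
            if pair not in best or rtt_ms < best[pair]:
                best[pair] = rtt_ms
    return best


# Hypothetical measurements (documentation prefix, made-up RTTs).
census_1 = {("vp-paris", "192.0.2.0/24"): 12.4, ("vp-tokyo", "192.0.2.0/24"): 9.8}
census_2 = {
    ("vp-paris", "192.0.2.0/24"): 11.1,
    ("vp-tokyo", "192.0.2.0/24"): 10.5,
    ("vp-lima", "192.0.2.0/24"): 30.2,
}
combined = combine_censuses([census_1, census_2])
```

Pairs seen in only one census are kept as they are, so the combination can only add measurements and shrink latency disks, which is consistent with the recall gain reported above.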

In this paper we consider only results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs but could also be tied to the wrong geolocation of some VP, raising false positive replicas. While we have anecdotal evidence that some of these small deployments are anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

[Figure 12 plot. x-axis: number of replicas (2 to 25); y-axis: CDF of number of IPs/24 (0 to 1). Curves: combination of censuses (308 VPs); Census 4 (13/03/15, 240 VPs); Census 3 (04/03/15, 269 VPs); Census 2 (02/03/15, 255 VPs); Census 1 (27/02/15, 261 VPs).]

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall)

[Figure 13 plot. x-axis: number of IPs/24 (log scale, 1 to 1000); y-axis: CDF of number of ASes (0 to 1), for ASes with >= 5 replicas. Annotated ASes: CloudFlare (328), Google (102), EdgeCast (37), Prolexic (21), Apple (6), Twitter (3), Level3 (2), LinkedIn (1).]

Figure 13: Number of IPs/24 per AS

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS), and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services, and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), given the above discussion about diversity it is not surprising that we were able to identify both very sparse deployments (e.g., Google 8.8.8.8 is the only address alive in the 8.8.8.0/24) and very dense ones (e.g., well over 99% of IPs are alive in most CloudFlare subnets).

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, are good indicators that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Given the historical use of anycast for DNS services, we believe it is important to provide an up-to-date view across the services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing at the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan all 2^16 TCP ports at low rate. Our results are conservative in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an under-estimation of the number of open TCP ports can also result from probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816, belonging to 81 ASes, have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased by class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both plots; in particular, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.
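The effect of class imbalance can be reproduced on a toy dataset. The sketch below is only illustrative: the scan layout (one tuple per probed /24 with its AS and open ports), the AS names and the port lists are all hypothetical, and the function is not taken from the census tooling.

```python
from collections import Counter


def port_frequency(scan, per_as):
    """Count each open port once per AS (per_as=True) or once per /24 (per_as=False)."""
    counts = Counter()
    if per_as:
        # Merge all /24 of the same AS first, so a large AS votes only once per port.
        ports_by_as = {}
        for asn, _prefix, ports in scan:
            ports_by_as.setdefault(asn, set()).update(ports)
        for ports in ports_by_as.values():
            counts.update(ports)
    else:
        for _asn, _prefix, ports in scan:
            counts.update(set(ports))
    return counts


# Hypothetical scan: AS-A owns three /24 with identical ports, AS-B and AS-C one each.
scan = [
    ("AS-A", "198.51.100.0/24", [80, 443, 2052]),
    ("AS-A", "198.51.101.0/24", [80, 443, 2052]),
    ("AS-A", "198.51.102.0/24", [80, 443, 2052]),
    ("AS-B", "203.0.113.0/24", [53]),
    ("AS-C", "192.0.2.0/24", [53, 80]),
]
by_as = port_frequency(scan, per_as=True)
by_prefix = port_frequency(scan, per_as=False)
```

In this toy case port 2052 is counted three times per /24 but only once per AS, which mirrors how a single AS with many /24 can dominate the per-/24 ranking.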

nmap portscan statistics
IPs/32   ASes   Ports (SSL)    Well-known   Software
812      81     10499 (185)    457          30

[Figure 14 plots. Two bar charts of the top-10 open TCP ports, each annotated with the nmap service label. IP/24 frequency (y-axis 0 to 400): ports 80, 443, 8080, 53, 2052, 2053, 2082, 2083, 8443, 2087. AS frequency (y-axis 0 to 60): ports 53, 80, 443, 179, 22, 8080, 8083, 3306, 1935, 5252.]

Figure 14: Overall nmap portscan statistics and Top-10 open TCP ports (per AS and per /24)

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS. We make the following observations: (i) roughly half of the ASes have at least one open TCP port, (ii) about 10 of the ASes have at least 5 open TCP ports, and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20 of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs including a tier-1 ISP (Tinet SpA), and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software such as popular web and DNS daemons (e.g., nginx, ISC BIND) as well as proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

[Figure 15 plot. x-axis: number of open TCP ports (log scale, 1 to 100000); y-axis: CCDF over ASes (log scale, 0.01 to 1). Annotated ASes: OVH (10148), Incapsula (330), CloudFlare (20), Google (9), EdgeCast (5).]

Figure 15: Complementary CDF of the number of open TCP ports per AS

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex æquo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, differences may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL/Microsoft SQL).
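For readers unfamiliar with the rank correlation used in this comparison, the following sketch computes a Spearman coefficient as the Pearson correlation of average ranks. It is a generic textbook formulation, not the authors' code; only the three anycast AS counts (7, 4, 4) come from the text, while the unicast shares are hypothetical placeholders.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Anycast AS counts for nginx, Apache httpd, lighttpd (from the text above).
anycast_as_count = [7, 4, 4]
# Hypothetical unicast popularity shares for the same three servers.
unicast_share = [20.0, 55.0, 1.0]
rho = spearman(anycast_as_count, unicast_share)
```

The same `pearson` helper applies directly to raw values, as for the footprint correlation quoted in Sec. 4.2.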

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted, yet among the anycasters we recognize major players of the Internet ecosystem, including top-ranking ISPs and popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from the software ranking in the unicast IP world.

[Figure 16 plot. Bar chart; y-axis: AS frequency (0 to 20); bars grouped as DNS, Web, Mail, Other. Software on the x-axis: ISC BIND, NLnet Labs NSD, Microsoft DNS, OpenDNS, nginx, lighttpd, Apache httpd, ECD, Microsoft HTTP, Microsoft IIS, Varnish, Apache Tomcat, bitasicv2, CFS 0213, cloudflare-nginx, cPanel httpd, thttpd, ECAcc/ECS, Google httpd, instart, Gmail imapd, Gmail pop3d, Google gsmtp, OpenSSH, MySQL, sslstrip, Microsoft SQL, Microsoft RPC.]

Figure 16: Breakdown of software running on anycast replicas

Yet this work only scratches the surface and opens more questions than it is able to answer, for instance:

IP-level CDN. Refining the active methodology by mapping content objects (and not only the frontpage) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow us to track the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with the results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming/discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and to cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES

[1] http://internetcensus2012.bitbucket.org/paper.html

[2] http://w3techs.com/technologies/overview/web_server/all

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. YouTube traffic dynamics and its interplay with a tier-1 ISP: An ISP perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark

[7] http://www.root-servers.org

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. FastRoute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829/rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (Travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.


Accuracy. While our technique relies on latency measurements, it leverages the discriminative power of side channel information (i.e., city population within disks) to cope with latency measurement noise. While we validate the accuracy of the methodology for DNS in [17], a validation for stateful TCP connections is still missing. To do so, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL; note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY (standard Server) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a superset of the locations measured from PL. We contrast the true positive (TPR) classification of our census against the HTTP GT in Fig. 7: in 77% of the IP/24 for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 km (287 km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest of alternative platforms such as RIPE.
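As an illustration of how such a ground truth could be extracted, the sketch below parses a CF-RAY header value. It rests on assumptions not stated in the text: that the value ends with a three-letter airport-style location code after the last hyphen, and that a lookup table from codes to cities is available (the two-entry table, the function name and the sample header value are hypothetical). Fetching the headers themselves, e.g. with curl, is left out.

```python
# Hypothetical lookup table from airport-style codes to city names.
CODE_TO_CITY = {"CDG": "Paris", "SJC": "San Jose"}


def cf_ray_location(headers):
    """Return the location code found at the end of a CF-RAY header, or None."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("cf-ray")
    if value is None:
        return None
    _ray_id, sep, code = value.strip().rpartition("-")
    # Assumption: the suffix is exactly three letters.
    if not sep or len(code) != 3 or not code.isalpha():
        return None
    return code.upper()


sample_headers = {"Server": "cloudflare-nginx", "CF-RAY": "230b030023ae2822-SJC"}
code = cf_ray_location(sample_headers)
city = CODE_TO_CITY.get(code)
```

Responses without the header, or with an unexpected suffix, yield `None` and would simply be excluded from the ground truth.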

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case, we rely on public information that maps 24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct except Philadelphia (while the server is located in Ashburn, 260 km or 2.6 ms worth of propagation delay away): this misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn), but as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.

3.5 Scalability

Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that each node must desynchronize to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so by a randomized permutation of the target nodes, achieved via a Linear Feedback Shift Register (LFSR) with Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies do aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude¹, which we verified empirically not to trigger the above problems. Consequently, probing 6.6·10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (the longer duration is likely due to load on the PL host).

[Figure 8 plot. x-axis: completion time per VP [Hr] (log scale, 1 to 16); y-axis: CDF (0 to 1).]

Figure 8: CDF of per-vantage point completion time over all censuses

Table 1: Textual (0) vs. binary (1-4) censuses
Census ID   Format   Size (host / total)   Analysis
0           csv      270M / 79G            >3 days
1-4         binary   21M / 6G              3 hr
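To make the LFSR-based permutation concrete, here is a minimal sketch of a Galois LFSR used to visit target indices in a pseudo-random order. The register width and taps are assumptions made for brevity: a 16-bit register with tap mask 0xB400 (a well-known maximal-length configuration), whereas the text does not state the parameters used by Fastping, and a census of millions of targets would need a wider register. The function name and seed are hypothetical.

```python
def lfsr_permutation(n_targets, seed=0xACE1):
    """Yield every index in 0..n_targets-1 exactly once, in LFSR order (n_targets <= 65535)."""
    if not 0 < n_targets <= 0xFFFF:
        raise ValueError("n_targets must be in 1..65535 for a 16-bit register")
    if not 0 < seed <= 0xFFFF:
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        # The register cycles through all 65535 non-zero states;
        # states above n_targets are skipped.
        if state <= n_targets:
            yield state - 1
        # One Galois step: shift right, apply the tap mask when the dropped bit is 1.
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        if state == seed:
            return


order = list(lfsr_permutation(1000))
other_order = list(lfsr_permutation(1000, seed=0x0001))
```

Each vantage point could start from a different seed, so that all of them walk the same cycle from different offsets and do not hit the same target at the same time. As a sanity check on the timing quoted above, 6.6·10^6 targets at 10^3 targets per second amount to 6600 s, i.e., about 1.8 hours.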

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged a wealth of information in textual format, amounting to 270 MB per node and 80 GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay and ICMP flag (encoding greylist return codes 9, 10 or 13 as a negative sign), for a total of about 20 MB per node and 6 GB overall per census.
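The idea of folding the ICMP flag into the sign of the delay can be sketched with Python's `struct` module. The field widths below (a 32-bit timestamp and a 32-bit float delay) are assumptions chosen for readability; they are not the actual record layout and do not reproduce the per-node sizes reported in Tab. 1.

```python
import math
import struct

# Assumed layout: little-endian uint32 timestamp (s) + float32 signed delay (ms).
RECORD = struct.Struct("<If")
GREYLIST_CODES = {9, 10, 13}  # ICMP return codes flagged in the text


def encode(timestamp, delay_ms, icmp_code=None):
    """Pack one measurement; a greylist ICMP code is stored as a negative delay."""
    signed = -delay_ms if icmp_code in GREYLIST_CODES else delay_ms
    return RECORD.pack(timestamp, signed)


def decode(blob):
    """Unpack one measurement into (timestamp, delay_ms, greylisted)."""
    timestamp, signed = RECORD.unpack(blob)
    # copysign also detects a negative zero delay.
    greylisted = math.copysign(1.0, signed) < 0
    return timestamp, abs(signed), greylisted


blob = encode(1425254400, 12.5, icmp_code=13)
record = decode(blob)
```

Because the delay is never negative in practice, the sign bit carries the flag at no extra storage cost.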

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute force optimal solution; at the same time, processing a census would still take days (we indeed stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs is not the same in all files, meaning that an on-the-fly sorting of about 300 lists (one per VP) containing millions of targets is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use-case, it may become relevant for other applications of this technique (e.g., BGP hijacking inference, mentioned in Sec. 5).

¹ While it is possible to tune the probing rate per VP more finely, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area that is not well covered by PL.

4. ANYCAST /0 CENSUSES

This section presents results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1) and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance

Details about our censuses are reported in Fig. 10. Overall, 1696 IP/24 belonging to 346 ASes appear to have more than one anycast replica, while we were able to find only 897 IP/24 belonging to 100 ASes having at least 5 replicas with our technique. The plot also shows a geographical density map of anycast replicas: results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that results reported in this paper correspond to censuses performed during March 2015: with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at the time of writing) results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with a low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP anycast figure in the top-100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted² by ASes using IP anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

² For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP and disregard content that is referenced in the frontpage.

                 IP/24   ASes   Cities   CC   Replicas
All              1696    346    77       38   13802
≥ 5 Replicas     897     100    71       36   11598
∩ CAIDA-100      19      8      30       18   138
∩ Alexa-100k     242     15     45       29   4038

Figure 10: Anycast censuses results at a glance

4.2 Top-100 Anycast ASes

Although the number of anycast IP/24 may seem disappointing at first, because of its small footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar-plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar-plot). The service footprint is tied to the open TCP ports in the AS (middle scatter-plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter-plots). Finally, a label reported on the top x-axis categorizes the main activity of the ASes from a business perspective (the categorization is informal and, in case of ASes with multiple services, only the most prominent is selected).

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet

0

10

20

1 C

LO

UD

FL

AR

EN

ET

2 IS

C-A

SU

S3 H

UR

RIC

AN

EU

S4 C

DN

ET

WO

RK

SU

S-

5 F

AC

EB

OO

KU

S6 C

OM

MU

NIT

YD

NS

7 X

GT

LD

US

8 L

-RO

OT

US

9 M

ICR

OS

OF

TU

S10 I-R

OO

TS

E11 V

ER

ISIG

N-I

NC

12 L

LN

WU

S13 A

RY

AK

A-A

RIN

14 A

PP

LE

-EN

GIN

E15 C

ED

EX

ISU

S16 H

IGH

WIN

DS

3U

17 N

ET

NO

D-I

XS

E18 O

PE

ND

NS

US

19 W

OO

DY

NE

T-1

U20 L

GT

LD

US

21 L

IEC

HT

EN

ST

EI

22 F

AS

TL

YU

S23 C

AC

HE

NE

TW

OR

K24 IN

ST

AR

TU

S25 D

NS

CA

ST

-AS

U26 G

OO

GL

EU

S27 E

DG

EC

AS

T-I

R

28 U

MD

NE

TU

S29 D

YN

DN

SU

S30 N

SO

NE

US

31 E

AS

YL

INK

4U

S32 Y

AH

OO

-AN

2U

S33 U

LT

RA

DN

SU

S34 O

VH

FR

35 L

IEC

HT

EN

ST

EI

36 A

S-A

FIL

IAS

1

37 A

UT

OM

AT

TIC

U38 T

INE

T-B

AC

KB

O39 A

BO

VE

NE

T-C

US

40 A

MA

ZO

N-0

2U

S41 C

WG

B42 L

EV

EL

3U

S43 E

DG

EC

AS

TU

S44 T

WIT

TE

R-N

ET

W45 IN

CA

PS

UL

AU

S46 A

GT

LD

US

47 A

US

RE

GIS

TR

Y-

48 C

EN

TR

AL

NIC

-A49 C

OG

EN

T-2

149

50 H

GT

LD

US

51 H

IGH

WIN

DS

4U

52 K

-RO

OT

-SE

RV

E53 N

ET

RIP

LE

X01

54 O

MN

ITU

RE

US

55 S

OF

TL

AY

ER

US

56 W

AN

GS

U-U

SU

S57 Y

AH

OO

-FC

US

58 B

ITG

RA

VIT

YU

59 A

BIL

EN

EU

S60 A

DV

AN

-CA

ST

U61 A

SA

TTL

DSE

62 A

S-Q

UA

DR

AN

ET

63 A

S6453U

S64 A

TT

EU

65 C

EN

TR

AL

NIC

-A66 C

EN

TU

RY

LIN

K-

67 C

ON

EX

IM-A

S-A

68 E

GT

LD

US

69 K

GT

LD

US

70 M

NS

-AS

NO

71 N

ICA

TA

T72 V

ITA

L-D

NS

US

73 W

HS

-AN

YC

AS

T-

74 Z

GT

LD

US

75 IN

TE

RN

AP

-BL

K76 N

ET

AP

P-A

NY

CA

77 S

PR

INT

LIN

KU

78 A

US

RE

GIS

TR

Y-

79 C

EN

TU

RY

LIN

K-

80 D

NS

IMP

LE

US

81 D

YN

-HC

US

82 E

AS

YL

INK

2U

S83 E

DN

SC

A84 E

SG

OB

-AN

YC

AS

85 H

OM

EP

L-A

SP

L86 L

INK

ED

INU

S87 M

AS

ER

GY

US

88 M

ED

IAM

AT

H-I

N89 M

II-2

GB

90 M

II-X

PC

US

91 P

EE

R1U

S92 P

HH

-AS

DE

93 P

RE

TE

CS

CA

94 P

RO

LE

XIC

US

95 Q

UA

NT

CA

ST

US

96 R

IMB

LA

CK

BE

RR

97 S

UP

ER

NE

TW

OR

K98 U

NO

VA

-1C

A99 V

OX

ILIT

YR

O100 Z

VO

NK

OV

A-A

S

R

ep

lica

s

1

10

100

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00

IP

s2

4

5380

443

8080

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00Op

en

po

rts

100

1k

10k

100k

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00A

lexa

ra

nk

110

1001k

10k100k

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00

CD

ND

NS

ISP

CD

NS

ocia

l N

etw

ork

DN

SD

NS

DN

SC

lou

dD

NS

DN

SC

DN

Clo

ud

CD

NS

ecu

rity

CD

ND

NS

Secu

rity

DN

SD

NS

DN

Su

nkn

ow

nC

DN

CD

NC

DN

DN

SC

lou

dD

NS

CD

Nu

nkn

ow

nD

NS

DN

SC

lou

d m

essag

ing

Web

Po

rtal

DN

SC

lou

du

nkn

ow

nD

NS

Blo

gg

ing

ISP

-tie

r1IS

PC

lou

dIS

PIS

P-t

ier1

CD

NS

ocia

l N

etw

ork

CD

ND

NS

DN

SD

NS

ISP

DN

SC

DN

DN

SD

NS

On

lin

e M

ark

eti

ng

Clo

ud

CD

NW

eb

Po

rtal

CD

NB

ackb

on

e N

etw

ork

un

kn

ow

nD

NS

Clo

ud

ISP

-tie

r1IS

PD

NS

ISP

-tie

r1C

lou

dD

NS

DN

SV

ideo

Co

nfe

ren

cin

gD

NS

DN

SS

ecu

rity

DN

SC

lou

dW

eb

An

aly

tics

ISP

-tie

r1D

NS

ISP

DN

SD

NS

Clo

ud

messag

ing

DN

SD

NS

Clo

ud

So

cia

l N

etw

ork

Clo

ud

AD

tech

no

log

yC

DN

CD

NC

lou

dC

DN

CD

NS

ecu

rity

Web

An

aly

tics

Tele

co

m V

en

do

rC

lou

dD

NS

Clo

ud

un

kn

ow

n

Ca

ida

ra

nk

Figure 9 Birdrsquos eye view of Top-100 anycast ASes (ranked according to geographical footprint)

Sprint TATA Communications Qwest Level 3 Hur-ricane Electrics) but also a rather large spectrum ofOTTs such as CDNs (eg CloudFlare EdgeCast) host-ing (eg OVH) and cloud providers (eg MicrosoftAmazon Web Services) social networks (eg TwitterFacebook LinkedIn) and security companies that pro-vide mitigation services against DDoS attacks (eg Pro-lexic OpenDNS) The list also includes manufactur-ers (eg Apple RIM) Web registrars (eg Verisignnicat) virtual roaming and virtual meeting services(Media Network Services) blogging platforms (Automat-tic a publishing company hosting wordPresscom) cloudmessaging (EASYLINK2 owned by ATampT Services)and web analytics (OMNITURE owned by Adobe Sys-tems) Of course DNS-related service providers suchas root and top-level domain servers (eg ISCF-rootCommunityDNS) DNS service management (eg Ul-traDNS DynDNS) and public DNS resolvers (eg GoogleDNS OpenDNS) also emerge in the censusDiversity We report a breakdown of AS classes inFig 11 crisply showing that DNS now represents aboutone third of IP anycast activities Plots in Fig 9 clearly

illustrate the large diversity of anycast usage under allmetrics Indeed no correlation appear between any twometrics illustrating the degree of freedom in anycast de-ployments for instance the geographical footprint andIP24 footprints are largely unrelated (Pearson corre-lation coefficient of 035) Additionally the number ofopen ports and the specific port values vary not onlyacross deployments having an heterogeneous businessmodel (eg we observe from a minimum of 1 openport for DNS to O(104) open ports for OVH) but alsobetween deployments of the same kind (eg CloudFlareand EdgeCast CDNs have in common only port 53 80and 443 over the set of 22 open ports they are usingwith CloudFlare using 4times more ports than EdgeCast)

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller than those of L7-anycast deployments, which can exceed O(10^3) for the large providers; this is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

Figure 11: Breakdown of AS category (only first category is considered). [Bar chart not reproduced: x-axis, categories DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other; y-axis, AS breakdown [%], from 0 to 35.]

Fig. 12 further reports the cumulative number of replicas per IP/24, depicting both results coming from the combination of censuses and individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius size: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
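The combination step described above can be sketched as follows. This is a minimal illustration, assuming each census is available as a mapping from a (VP, target) pair to an RTT in milliseconds; the function name `combine_censuses` and the data layout are hypothetical and not taken from our measurement software.

```python
def combine_censuses(censuses):
    """Merge censuses by keeping the minimum RTT per (vp, target) pair.

    Each census is assumed to be a dict {(vp, target): rtt_ms}.
    Pairs missing from some census are still kept if seen in any other.
    """
    combined = {}
    for census in censuses:
        for pair, rtt_ms in census.items():
            # The smallest RTT is the closest estimate of propagation delay.
            if pair not in combined or rtt_ms < combined[pair]:
                combined[pair] = rtt_ms
    return combined


# Made-up measurements for illustration only (not census data).
census_1 = {("vp-paris", "192.0.2.0/24"): 31.0, ("vp-tokyo", "192.0.2.0/24"): 12.5}
census_2 = {("vp-paris", "192.0.2.0/24"): 28.4, ("vp-nyc", "192.0.2.0/24"): 9.1}
combined = combine_censuses([census_1, census_2])
```

Since a smaller RTT yields a smaller disk, the combined dataset can only shrink the disks handed to the MIS solver, which is consistent with the recall gain observed in the combination.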

In this paper, we only consider results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VP, raising false-positive replicas. While we have anecdotal evidence of some of these exiguous deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall). [Plot not reproduced: x-axis, number of replicas (2 to 25); y-axis, CDF of number of IPs/24; curves: combination of censuses (308 VPs), Census 4 (13/03/15, 240 VPs), Census 3 (04/03/15, 269 VPs), Census 2 (02/03/15, 255 VPs), Census 1 (27/02/15, 261 VPs).]

Figure 13: Number of IPs/24 per AS. [Plot not reproduced: x-axis, number of IPs/24 (1 to 1000, log scale); y-axis, CDF of number of ASes with ≥ 5 replicas; annotated ASes: CLOUDFLARENET,US (328), GOOGLE,US (102), EDGECAST,US (37), PROLEXIC,US (21), APPLE,US (6), TWITTER,US (3), LEVEL3,US (2), LINKEDIN,US (1).]

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS), and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet, about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse deployments (e.g., Google's 8.8.8.8 is the only address alive in 8.8.8.0/24) and very dense ones (e.g., well over 99% of IPs are alive in most CloudFlare subnets).
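The notion of deployment density can be made concrete with a short sketch. It assumes a list of responsive IPv4 addresses is already available (how they were probed is out of scope here), and it counts all 256 addresses of a /24 in the denominator, which is a simplifying assumption; the function name `density_per_slash24` is hypothetical.

```python
import ipaddress
from collections import Counter


def density_per_slash24(alive_ips):
    """Fraction of the 256 addresses of each /24 that appear in alive_ips.

    alive_ips: iterable of dotted-quad IPv4 strings; duplicates are ignored.
    """
    counts = Counter(
        # strict=False lets us derive the covering /24 from a host address.
        ipaddress.ip_network(f"{ip}/24", strict=False)
        for ip in set(alive_ips)
    )
    return {str(net): n / 256 for net, n in counts.items()}


# A sparse deployment: a single alive address in its /24.
sparse = density_per_slash24(["8.8.8.8"])
```

A dense deployment would instead show a value close to 1 for each of its /24 prefixes.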

Importance. The presence of ASes ranking among the top 100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, is a good indicator that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites, respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Given the historical use of anycast for DNS services, we believe it important to provide an up-to-date longitudinal view across the services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing at the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan all 2^16 TCP ports at low rates. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an underestimation of the number of open TCP ports can also result from probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816 IPs of 81 ASes have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which run over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear in both top and bottom plots: in particular, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top 10. We thus focus on per-AS statistics in the following.
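The effect of class imbalance on port popularity can be illustrated with a small sketch. It assumes scan results are available as (AS, /24 prefix, open ports) tuples; this layout, the function name `port_frequencies`, and the sample values are hypothetical and only meant to show why the two counting schemes diverge.

```python
from collections import Counter


def port_frequencies(scan):
    """Count port popularity per AS and per /24.

    scan: iterable of (asn, prefix24, open_ports) tuples, one per scanned /24.
    Returns (per_as, per_prefix) Counters keyed by port number.
    """
    per_prefix = Counter()
    ports_by_as = {}
    for asn, _prefix, ports in scan:
        per_prefix.update(set(ports))  # each /24 votes once per port
        ports_by_as.setdefault(asn, set()).update(ports)
    per_as = Counter()
    for ports in ports_by_as.values():
        per_as.update(ports)  # each AS votes once per port
    return per_as, per_prefix


# Made-up scan: one AS owns three /24, two ASes own one /24 each.
scan = [
    ("AS-A", "198.51.100.0/24", [80, 443, 2052]),
    ("AS-A", "198.51.101.0/24", [80, 443, 2052]),
    ("AS-A", "198.51.102.0/24", [80, 2052]),
    ("AS-B", "203.0.113.0/24", [53]),
    ("AS-C", "192.0.2.0/24", [53, 80]),
]
per_as, per_prefix = port_frequencies(scan)
```

In this toy input, port 2052 is open on three /24 but belongs to a single AS, so it ranks high per /24 and low per AS, which mirrors the bias discussed above.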

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS. We make the following observations: (i) roughly half of the ASes have at least one open TCP port, (ii) about 10% of the ASes have at least 5 open TCP ports, and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports, respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20% of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs, including a tier-1 ISP (Tinet SpA), and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

nmap portscan statistics
IPs/32   ASes   Ports (SSL)    Well known   Software
812      81     10499 (185)    457          30

[Plots not reproduced: top-10 open TCP ports by IP/24 frequency (80 http, 443 http ssl, 8080 http, 53 domain, 2052 http, 2053 http, 2082 http, 2083 http, 8443 http, 2087 http) and by AS frequency (53 domain, 80 http, 443 http-ssl, 179 tcpwrapped, 22 ssh, 8080 http, 8083 tcpwrapped, 3306 tcpwrapped, 1935 sip, 5252 movaz-ssc).]

Figure 14: Overall nmap portscan statistics and Top-10 open TCP ports (per AS and per /24).
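For readers who wish to reproduce a curve such as the one in Fig. 15 on their own data, the empirical complementary CDF can be computed as in the sketch below. The convention P[X ≥ x] is an assumption made here for illustration (some authors use the strict inequality), and the function name `ccdf` is hypothetical.

```python
def ccdf(values):
    """Empirical complementary CDF as sorted (x, P[X >= x]) points."""
    n = len(values)
    if n == 0:
        return []
    return [
        (x, sum(1 for v in values if v >= x) / n)
        for x in sorted(set(values))
    ]


# Made-up number of open TCP ports for four ASes (not census data).
points = ccdf([1, 1, 2, 4])
# points == [(1, 1.0), (2, 0.5), (4, 0.25)]
```

Plotting these points on log-log axes gives the kind of view used to compare heavy-tailed per-AS port counts.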

Software diversity. Fig. 16 lists 30 different software packages, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software, such as popular web and DNS daemons (e.g., nginx, ISC BIND), and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

Figure 15: Complementary CDF of the number of open TCP ports per AS. [Plot not reproduced: x-axis, number of open TCP ports (1 to 100000, log scale); y-axis, CCDF over ASes (0.01 to 1, log scale); annotated ASes: OVH,FR (10148), INCAPSULA,US (330), CLOUDFLARENET,US (20), GOOGLE,US (9), EDGECAST,US (5).]

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, differences may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).
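The rank comparison mentioned above relies on the Spearman correlation, i.e., the Pearson correlation computed on ranks. The following is a minimal, dependency-free sketch that handles ties with average ranks; the helper names `ranks` and `spearman` are hypothetical, and the sketch assumes two equal-length sequences with at least two distinct values each.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

For instance, `spearman(anycast_counts, unicast_counts)` would take, for the same ordered list of web servers, the number of ASes using each server in the anycast dataset and its popularity in the unicast ranking; both argument names are placeholders.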

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted; yet, among the anycasters we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, and especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from the software ranking in the unicast IP world.

Figure 16: Breakdown of software running on anycast replicas. [Bar chart not reproduced: y-axis, AS frequency (0 to 20); x-axis, software grouped as DNS, Web, Mail and Other, listing in order: ISC BIND, NLnet Labs NSD, Microsoft DNS, OpenDNS, nginx, lighttpd, Apache httpd, ECD, Microsoft HTTP, Microsoft IIS, Varnish, Apache Tomcat, bitasicv2, CFS 0213, cloudflare-nginx, cPanel httpd, thttpd, ECAcc, ECS, Google httpd, instart160, Gmail imapd, Gmail pop3d, Google gsmtp, OpenSSH, MySQL, sslstrip, Microsoft SQL, Microsoft RPC.]

Yet, this work only scratches the surface and opens more questions than it is able to answer, as for instance:

IP-level CDN. Refining the active methodology by mapping content objects (and not only the front page) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming or discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and to cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.
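The geo-inconsistency test underlying both anycast detection and the BGP hijacking use case can be sketched as follows. This is a simplified illustration, not our implementation: it assumes each measurement is a vantage point position (latitude, longitude in degrees) with an RTT in milliseconds, and it assumes a propagation speed of 200 km/ms (roughly two thirds of the speed of light); the constant and all function names are hypothetical choices.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
KM_PER_MS = 200.0  # assumed propagation speed in fibre, not a measured value


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def disk_radius_km(rtt_ms):
    """Upper bound on the VP-to-target distance: half the RTT at KM_PER_MS."""
    return rtt_ms / 2 * KM_PER_MS


def geo_inconsistent(m1, m2):
    """True if the latency disks of two measurements cannot intersect.

    Each measurement is ((lat, lon), rtt_ms). Disjoint disks mean that a
    single physical host cannot have answered both vantage points.
    """
    (vp1, rtt1), (vp2, rtt2) = m1, m2
    return great_circle_km(vp1, vp2) > disk_radius_km(rtt1) + disk_radius_km(rtt2)


paris, tokyo = (48.86, 2.35), (35.68, 139.69)
# Both VPs see a 10 ms RTT: no single location is within 1000 km of both.
flag = geo_inconsistent((paris, 10.0), (tokyo, 10.0))
```

For a prefix known to be unicast, a true result for any pair of vantage points would be a reason to raise an alarm and cross-check with other data sources.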

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES

[1] http://internetcensus2012.bitbucket.org/paper.html.

[2] http://w3techs.com/technologies/overview/web_server/all.

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. YouTube traffic dynamics and its interplay with a tier-1 ISP: An ISP perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark/.

[7] http://www.root-servers.org/.

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04.

[21] http://www.enst.fr/~drossi/anycast.

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping.

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. FastRoute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of Internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible Internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829 rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An Internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org/.

[39] https://www.opendns.com/data-center-locations/.

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (Travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.


stopped processing the complete Census-0 after 3 daysof CPU time where textual format additionally led toslow processing due to disk fragmentation) Moreoverdue to LFSR the order of the target IPs in all filesis not the same meaning that an on-the-fly sorting ofabout 300 lists (one per VP) containing millions targetsis needed We therefore optimized our implementationwhich currently runs in under three hours ie aboutthe same timescale of the census duration so that inprinciple we could perform a continuous analysis Whilethis is not interesting for the anycast characterizationuse-case it may become relevant for other applicationsof this technique (eg BGP hijacking inference men-tioned in Sec 5)

4 ANYCAST 0 CENSUSESThis section presents results of the first Internet-wide

anycast study We start by aggregated statistics (Sec 41)and then incrementally refine the picture by providinga birdrsquos-eye view of the most interesting deployments(Sec 42) over which we perform an additional portscancampaign to reveal their running services (Sec 43)

41 At a glanceDetails about the of our censuses are reported in Fig 10Overall 1696 IP24 belonging to 346 ASes appear tohave more than one anycast replica while we were ableto find only 897 IP24 belonging to 100 ASes havingat least 5 replicas with our technique The plot alsoshows a geographical density map of anycast replicasresults of our censuses are available for browsing at [21]offering per-deployment (as in Fig 5) or aggregated (asin Fig 10) visualizations Notice that results reportedin this paper correspond to censuses performed duringMarch 2015 with later censuses we observed small butinteresting changes in the anycast landscape While weplan to run a continuous service please be advised that(at time of writing) results at [21] refer to the censusesdescribed in this paper

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with low presence of PlanetLab VPs we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas, but they will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list to cross-check how many ASes using IP anycast figure in the top-100: results tabulated in Fig. 10 show that 19 IP/24 of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank to cross-check how many webpages in the top-100k rank are hosted² by ASes using IP anycast: here again, we find 242 IP/24 of 15 ASes that are among the major players of the Web.

                 IP/24   ASes   Cities   CC   Replicas
All               1696    346       77   38      13802
≥ 5 Replicas       897    100       71   36      11598
∩ CAIDA-100         19      8       30   18        138
∩ Alexa-100k       242     15       45   29       4038

Figure 10: Anycast censuses results at a glance.

² For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP, and disregard content that is referenced in the frontpage.

4.2 Top-100 Anycast ASes

Although the number of anycast IP/24 may seem disappointing at first, because of its small footprint, it is nevertheless very rich, revealing silver needles in the haystack. From the very coarse cross-check of CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS, but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar plot, with standard deviation across IP/24 belonging to the same AS), additionally reporting the number of anycast IP/24 for that AS (middle bar plot). Service footprint is correlated to the open TCP ports in the AS (middle scatter plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks, respectively (top scatter plots). Finally, a label reported on the top x-axis categorizes the main activity of the ASes from a business perspective (the categorization is informal and, in case of ASes with multiple services, only the most prominent is selected).

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet,

Figure 9: Bird's-eye view of Top-100 anycast ASes (ranked according to geographical footprint). Panels: replicas per AS, IPs/24 per AS, open ports, Alexa rank and CAIDA rank; the bottom x-axis lists the 100 AS names, the top x-axis their category.

Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric), but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn) and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services) and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers, such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS) and public DNS resolvers (e.g., Google DNS, OpenDNS), also emerge in the census.

Diversity. We report a breakdown of AS classes in Fig. 11, crisply showing that DNS now represents about one third of IP anycast activities. Plots in Fig. 9 clearly illustrate the large diversity of anycast usage under all metrics. Indeed, no strong correlation appears between any two metrics, illustrating the degree of freedom in anycast deployments: for instance, the geographical and IP/24 footprints are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments having heterogeneous business models (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH), but also between deployments of the same kind (e.g., CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443 over the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).
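To make the correlation statement concrete, the following minimal sketch computes a Pearson coefficient between two per-AS metrics. The function name `pearson` and the sample values are illustrative assumptions; the values are not census data and this is not the code used to obtain the 0.35 figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-AS footprints (made-up values for illustration only).
replicas_per_as = [3, 1, 4, 1, 5, 9, 2, 6]
ip24_per_as = [2, 7, 1, 8, 2, 8, 1, 8]
r = pearson(replicas_per_as, ip24_per_as)
print(f"Pearson r = {r:.2f}")
```

A coefficient close to 0 indicates that the two footprints vary largely independently, while values close to +1 or -1 would indicate a strong linear relation.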

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), championed by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller with respect to L7-anycast deployments, which can exceed O(10^3) for the large providers; this is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

Figure 11: Breakdown of AS category (only the first category is considered). x-axis: DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other; y-axis: AS breakdown [%].

Fig. 12 further reports the cumulative distribution of the number of replicas per IP/24, depicting both results coming from the combination of censuses, as well as individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius size: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT latency that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
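The combination step and the radius-ordered selection can be sketched as follows. This is only an illustration under stated assumptions, not the solver of [17]: the function names, the VP coordinates, the RTT values, and the conversion of about 100 km of disk radius per millisecond of RTT (roughly 200 km/ms in fiber, halved for the round trip) are all hypothetical choices made for the example.

```python
import math

# Assumption: about 100 km of disk radius per millisecond of RTT.
KM_PER_RTT_MS = 100.0


def combine_min_rtt(censuses):
    """Keep, for each (vp, target) pair, the minimum RTT seen in any census."""
    best = {}
    for census in censuses:
        for pair, rtt_ms in census.items():
            if pair not in best or rtt_ms < best[pair]:
                best[pair] = rtt_ms
    return best


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def greedy_disjoint_disks(vp_coords, rtts_ms):
    """Greedy MIS approximation for one target: scan the VP disks by
    increasing radius and keep each disk that overlaps no disk kept so far.
    The number of kept disks is a lower bound on the number of replicas."""
    disks = sorted((rtt * KM_PER_RTT_MS, vp) for vp, rtt in rtts_ms.items())
    kept = []
    for radius, vp in disks:
        if all(haversine_km(vp_coords[vp], vp_coords[other]) >= radius + r
               for r, other in kept):
            kept.append((radius, vp))
    return [vp for _, vp in kept]


# Hypothetical vantage points and two hypothetical censuses of one target.
coords = {"paris": (48.86, 2.35), "tokyo": (35.68, 139.69), "nyc": (40.71, -74.01)}
census_1 = {("paris", "t"): 12.0, ("tokyo", "t"): 9.0, ("nyc", "t"): 80.0}
census_2 = {("paris", "t"): 8.0, ("tokyo", "t"): 15.0, ("nyc", "t"): 75.0}
combined = combine_min_rtt([census_1, census_2])
per_vp = {vp: rtt for (vp, target), rtt in combined.items() if target == "t"}
print(greedy_disjoint_disks(coords, per_vp))
```

Taking the minimum shrinks every disk, so fewer disks overlap and more of them survive the greedy scan, which is the intuition behind the recall gain reported above.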

In this paper, we only consider results from the combination, but remark that results are quite consistent across censuses (notice that curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VP, raising false positive replicas. While we have anecdotal evidence of some of these exiguous deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall). x-axis: number of replicas; y-axis: CDF of number of IPs/24. Curves: combination of censuses (308 VPs); Census 4, 13/03/15 (240 VPs); Census 3, 04/03/15 (269 VPs); Census 2, 02/03/15 (255 VPs); Census 1, 27/02/15 (261 VPs).

Figure 13: Number of IPs/24 per AS. x-axis: number of IPs/24 (log scale); y-axis: CDF of number of ASes (≥ 5 replicas). Annotated ASes: CloudFlare (328), Google (102), EdgeCast (37), Prolexic (21), Apple (6), Twitter (3), Level 3 (2), LinkedIn (1).

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS), and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet, about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse (e.g., Google 8.8.8.8 is the only address alive in 8.8.8.0/24) and very dense deployments (e.g., well over 99% of IPs are alive in most CloudFlare subnets).

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, are good indicators that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Given the historical use of anycast for DNS services, we believe it important to provide an up-to-date longitudinal view across the services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing in the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan at low rates all 2^16 TCP ports. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an underestimation of the number of open TCP ports can also be the result of probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816 IPs of 81 ASes have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.
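The target selection described above (one representative IP per /24, all 2^16 TCP ports) can be sketched with the Python standard library. The selection rule used here (lowest-numbered address known to be alive), the function name and the sample addresses are assumptions for illustration; the text does not specify how the representative is chosen, and the sketch only builds the target list without sending any probe.

```python
import ipaddress

NUM_TCP_PORTS = 2 ** 16  # ports 0..65535


def representative_targets(prefixes, alive):
    """Map each /24 to one responsive address inside it.

    Hypothetical rule: pick the lowest-numbered address known to be alive.
    Prefixes without any alive address are left out of the result.
    """
    targets = {}
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen != 24:
            raise ValueError(f"{prefix} is not a /24")
        candidates = sorted(ipaddress.ip_address(a) for a in alive
                            if ipaddress.ip_address(a) in net)
        if candidates:
            targets[prefix] = str(candidates[0])
    return targets


# Example with documentation prefixes and the sparse 8.8.8.0/24 case.
prefixes = ["8.8.8.0/24", "192.0.2.0/24"]
alive = ["8.8.8.8", "192.0.2.77", "192.0.2.5", "198.51.100.1"]
print(representative_targets(prefixes, alive))
print(f"{NUM_TCP_PORTS} ports to probe per target")
```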

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both top and bottom plots: in particular, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.
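The bias can be reproduced on a toy example. In the sketch below, the record layout `(asn, ip24, open_ports)`, the function name and the data are hypothetical; the point is only that counting one vote per /24 lets an AS with many prefixes dominate, whereas counting one vote per AS does not.

```python
from collections import Counter


def port_frequencies(scan_results):
    """scan_results: iterable of (asn, ip24, open_ports) tuples.

    Returns (per_ip24, per_as): how many /24s, respectively how many ASes,
    have each port open.
    """
    per_ip24 = Counter()
    ports_by_as = {}
    for asn, ip24, open_ports in scan_results:
        ports = set(open_ports)
        per_ip24.update(ports)  # one vote per /24
        ports_by_as.setdefault(asn, set()).update(ports)
    per_as = Counter()
    for ports in ports_by_as.values():
        per_as.update(ports)  # one vote per AS
    return per_ip24, per_as


# Hypothetical scan: AS-A owns three /24s that all expose port 2052.
scan = [
    ("AS-A", "198.51.100.0/24", [80, 2052]),
    ("AS-A", "203.0.113.0/24", [80, 2052]),
    ("AS-A", "192.0.2.0/24", [80, 2052]),
    ("AS-B", "10.0.0.0/24", [53]),
    ("AS-C", "10.0.1.0/24", [53, 80]),
]
per_ip24, per_as = port_frequencies(scan)
print("per /24:", per_ip24.most_common())
print("per AS: ", per_as.most_common())
```

In this toy input, port 2052 ranks above port 53 when counting per /24, although a single AS uses it.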

Figure 14: Overall nmap portscan statistics and top-10 open TCP ports (per AS and per /24).

nmap portscan statistics:
IPs/32   ASes   Ports (SSL)    Well-known   Software
812      81     10499 (185)    457          30

Top-10 open TCP ports by IP/24 frequency: 80 (http), 443 (http ssl), 8080 (http), 53 (domain), 2052 (http), 2053 (http), 2082 (http), 2083 (http), 8443 (http), 2087 (http).
Top-10 open TCP ports by AS frequency: 53 (domain), 80 (http), 443 (http-ssl), 179 (tcpwrapped), 22 (ssh), 8080 (http), 8083 (tcpwrapped), 3306 (tcpwrapped), 1935 (sip), 5252 (movaz-ssc).

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS. We make the following observations: (i) roughly half of the ASes have at least one open TCP port; (ii) about 10% of the ASes have at least 5 open TCP ports; and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include fairly popular HTTP and HTTPS, used by over 20 of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs including a tier-1 ISP (Tinet SpA), and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).
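Statements of the form "x% of the ASes have at least k open ports" are read directly off a complementary CDF. A minimal sketch of the empirical computation follows; the function name and the per-AS port counts are made up for illustration, and the convention P[X ≥ x] is an assumption chosen to match the "at least" phrasing.

```python
def ccdf(values):
    """Empirical complementary CDF.

    Returns a list of (x, P[X >= x]) for each distinct x in values,
    in increasing order of x.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    points = []
    for i, x in enumerate(ordered):
        if i == 0 or x != ordered[i - 1]:
            points.append((x, (n - i) / n))  # n - i samples are >= x
    return points


# Hypothetical number of open TCP ports for eight ASes.
open_ports_per_as = [0, 0, 1, 1, 2, 3, 5, 20]
for x, p in ccdf(open_ports_per_as):
    print(f"P[ports >= {x}] = {p:.3f}")
```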

Figure 15: Complementary CDF of the number of open TCP ports per AS. x-axis: number of open TCP ports (log scale); y-axis: CCDF over ASes (log scale). Annotated ASes: OVH (10148), Incapsula (330), CloudFlare (20), Google (9), EdgeCast (5).

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software, such as popular web and DNS daemons (e.g., nginx, ISC BIND), and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67) nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, the difference may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).
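A rank comparison of this kind can be sketched as a Spearman correlation, i.e., a Pearson correlation computed on ranks. The function names and the two popularity vectors below are hypothetical and are not the data behind the 0.38 figure; ties are handled with average ranks, which is one common convention.

```python
def average_ranks(values):
    """1-based ranks in increasing order of value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean = (len(xs) + 1) / 2  # the mean of average ranks is always (n + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var_x = sum((a - mean) ** 2 for a in rx)
    var_y = sum((b - mean) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical popularity of five web servers (made-up values):
# number of anycast ASes running each, and share of unicast websites.
anycast_as_count = [7, 4, 4, 2, 1]
unicast_share = [20.0, 55.0, 1.0, 12.0, 3.0]
print(f"Spearman rho = {spearman(anycast_as_count, unicast_share):.2f}")
```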

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted; yet, among the anycasters, we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, and especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from the software ranking in the unicast IP world.

Figure 16: Breakdown of software running on anycast replicas. x-axis: software implementations, grouped into DNS, Web, Mail and Other; y-axis: AS frequency.

Yet, this work only scratches the surface, and opens more questions than it is able to answer, as for instance:

IP-level CDN. Refining the active methodology by mapping content objects (and not only the frontpage) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with the results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall), as well as possibly assist in confirming or discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and to cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.
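As an illustration of the traffic-volume item in the list above, the binary flag per anycast IP/24 could be attached to passively collected flow records as sketched below. The flow record layout `(dst_ip, byte_count)`, the function names and the sample data are hypothetical; this is a sketch of one possible annotation step, not an existing tool.

```python
import ipaddress


def ip24_of(address):
    """Return the /24 prefix (as a string) that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{address}/24", strict=False))


def annotate_flows(flows, anycast_ip24):
    """Tag each (dst_ip, byte_count) flow with a binary anycast flag.

    Returns the tagged flows and the fraction of bytes sent to anycast /24s.
    """
    tagged = []
    anycast_bytes = 0
    total_bytes = 0
    for dst_ip, byte_count in flows:
        is_anycast = ip24_of(dst_ip) in anycast_ip24
        tagged.append((dst_ip, byte_count, is_anycast))
        total_bytes += byte_count
        if is_anycast:
            anycast_bytes += byte_count
    share = anycast_bytes / total_bytes if total_bytes else 0.0
    return tagged, share


# Hypothetical census output and hypothetical flow records.
census = {"8.8.8.0/24", "192.0.2.0/24"}
flows = [("8.8.8.8", 600), ("198.51.100.7", 300), ("192.0.2.200", 100)]
tagged, share = annotate_flows(flows, census)
print(tagged)
print(f"anycast byte share: {share:.2f}")
```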

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES

[1] http://internetcensus2012.bitbucket.org/paper.html

[2] http://w3techs.com/technologies/overview/web_server/all

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. YouTube traffic dynamics and its interplay with a tier-1 ISP: An ISP perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark

[7] http://www.root-servers.org

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. FastRoute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of Internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible Internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829/rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An Internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (Travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.

  • Introduction
  • Methodology Overview
    • Workflow
    • State of the art
      • System design
        • Census targets
        • Measurement dataset vs platform
        • Measurement software
        • Network protocol
        • Scalability
          • Anycast 0 censuses
            • At a glance
            • Top-100 Anycast ASes
            • Anycast Services
              • Discussion
              • References
Page 9: Characterizing IPv4 Anycast Adoption and Deployment€¦ · Characterizing IPv4 Anycast Adoption and Deployment ... CoNEXT ’15 December 01-04, 2015, Heidelberg, Germany c 2015 ACM

0

10

20

1 C

LO

UD

FL

AR

EN

ET

2 IS

C-A

SU

S3 H

UR

RIC

AN

EU

S4 C

DN

ET

WO

RK

SU

S-

5 F

AC

EB

OO

KU

S6 C

OM

MU

NIT

YD

NS

7 X

GT

LD

US

8 L

-RO

OT

US

9 M

ICR

OS

OF

TU

S10 I-R

OO

TS

E11 V

ER

ISIG

N-I

NC

12 L

LN

WU

S13 A

RY

AK

A-A

RIN

14 A

PP

LE

-EN

GIN

E15 C

ED

EX

ISU

S16 H

IGH

WIN

DS

3U

17 N

ET

NO

D-I

XS

E18 O

PE

ND

NS

US

19 W

OO

DY

NE

T-1

U20 L

GT

LD

US

21 L

IEC

HT

EN

ST

EI

22 F

AS

TL

YU

S23 C

AC

HE

NE

TW

OR

K24 IN

ST

AR

TU

S25 D

NS

CA

ST

-AS

U26 G

OO

GL

EU

S27 E

DG

EC

AS

T-I

R

28 U

MD

NE

TU

S29 D

YN

DN

SU

S30 N

SO

NE

US

31 E

AS

YL

INK

4U

S32 Y

AH

OO

-AN

2U

S33 U

LT

RA

DN

SU

S34 O

VH

FR

35 L

IEC

HT

EN

ST

EI

36 A

S-A

FIL

IAS

1

37 A

UT

OM

AT

TIC

U38 T

INE

T-B

AC

KB

O39 A

BO

VE

NE

T-C

US

40 A

MA

ZO

N-0

2U

S41 C

WG

B42 L

EV

EL

3U

S43 E

DG

EC

AS

TU

S44 T

WIT

TE

R-N

ET

W45 IN

CA

PS

UL

AU

S46 A

GT

LD

US

47 A

US

RE

GIS

TR

Y-

48 C

EN

TR

AL

NIC

-A49 C

OG

EN

T-2

149

50 H

GT

LD

US

51 H

IGH

WIN

DS

4U

52 K

-RO

OT

-SE

RV

E53 N

ET

RIP

LE

X01

54 O

MN

ITU

RE

US

55 S

OF

TL

AY

ER

US

56 W

AN

GS

U-U

SU

S57 Y

AH

OO

-FC

US

58 B

ITG

RA

VIT

YU

59 A

BIL

EN

EU

S60 A

DV

AN

-CA

ST

U61 A

SA

TTL

DSE

62 A

S-Q

UA

DR

AN

ET

63 A

S6453U

S64 A

TT

EU

65 C

EN

TR

AL

NIC

-A66 C

EN

TU

RY

LIN

K-

67 C

ON

EX

IM-A

S-A

68 E

GT

LD

US

69 K

GT

LD

US

70 M

NS

-AS

NO

71 N

ICA

TA

T72 V

ITA

L-D

NS

US

73 W

HS

-AN

YC

AS

T-

74 Z

GT

LD

US

75 IN

TE

RN

AP

-BL

K76 N

ET

AP

P-A

NY

CA

77 S

PR

INT

LIN

KU

78 A

US

RE

GIS

TR

Y-

79 C

EN

TU

RY

LIN

K-

80 D

NS

IMP

LE

US

81 D

YN

-HC

US

82 E

AS

YL

INK

2U

S83 E

DN

SC

A84 E

SG

OB

-AN

YC

AS

85 H

OM

EP

L-A

SP

L86 L

INK

ED

INU

S87 M

AS

ER

GY

US

88 M

ED

IAM

AT

H-I

N89 M

II-2

GB

90 M

II-X

PC

US

91 P

EE

R1U

S92 P

HH

-AS

DE

93 P

RE

TE

CS

CA

94 P

RO

LE

XIC

US

95 Q

UA

NT

CA

ST

US

96 R

IMB

LA

CK

BE

RR

97 S

UP

ER

NE

TW

OR

K98 U

NO

VA

-1C

A99 V

OX

ILIT

YR

O100 Z

VO

NK

OV

A-A

S

R

ep

lica

s

1

10

100

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00

IP

s2

4

5380

443

8080

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00Op

en

po

rts

100

1k

10k

100k

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00A

lexa

ra

nk

110

1001k

10k100k

1 2 3 4 5 6 7 8 9 1

0 1

1 1

2 1

3 1

4 1

5 1

6 1

7 1

8 1

9 2

0 2

1 2

2 2

3 2

4 2

5 2

6 2

7 2

8 2

9 3

0 3

1 3

2 3

3 3

4 3

5 3

6 3

7 3

8 3

9 4

0 4

1 4

2 4

3 4

4 4

5 4

6 4

7 4

8 4

9 5

0 5

1 5

2 5

3 5

4 5

5 5

6 5

7 5

8 5

9 6

0 6

1 6

2 6

3 6

4 6

5 6

6 6

7 6

8 6

9 7

0 7

1 7

2 7

3 7

4 7

5 7

6 7

7 7

8 7

9 8

0 8

1 8

2 8

3 8

4 8

5 8

6 8

7 8

8 8

9 9

0 9

1 9

2 9

3 9

4 9

5 9

6 9

7 9

8 9

9 1

00

CD

ND

NS

ISP

CD

NS

ocia

l N

etw

ork

DN

SD

NS

DN

SC

lou

dD

NS

DN

SC

DN

Clo

ud

CD

NS

ecu

rity

CD

ND

NS

Secu

rity

DN

SD

NS

DN

Su

nkn

ow

nC

DN

CD

NC

DN

DN

SC

lou

dD

NS

CD

Nu

nkn

ow

nD

NS

DN

SC

lou

d m

essag

ing

Web

Po

rtal

DN

SC

lou

du

nkn

ow

nD

NS

Blo

gg

ing

ISP

-tie

r1IS

PC

lou

dIS

PIS

P-t

ier1

CD

NS

ocia

l N

etw

ork

CD

ND

NS

DN

SD

NS

ISP

DN

SC

DN

DN

SD

NS

On

lin

e M

ark

eti

ng

Clo

ud

CD

NW

eb

Po

rtal

CD

NB

ackb

on

e N

etw

ork

un

kn

ow

nD

NS

Clo

ud

ISP

-tie

r1IS

PD

NS

ISP

-tie

r1C

lou

dD

NS

DN

SV

ideo

Co

nfe

ren

cin

gD

NS

DN

SS

ecu

rity

DN

SC

lou

dW

eb

An

aly

tics

ISP

-tie

r1D

NS

ISP

DN

SD

NS

Clo

ud

messag

ing

DN

SD

NS

Clo

ud

So

cia

l N

etw

ork

Clo

ud

AD

tech

no

log

yC

DN

CD

NC

lou

dC

DN

CD

NS

ecu

rity

Web

An

aly

tics

Tele

co

m V

en

do

rC

lou

dD

NS

Clo

ud

un

kn

ow

n

Ca

ida

ra

nk

Figure 9 Birdrsquos eye view of Top-100 anycast ASes (ranked according to geographical footprint)

Sprint, TATA Communications, Qwest, Level 3, Hurricane Electric), but also a rather large spectrum of OTTs, such as CDNs (e.g., CloudFlare, EdgeCast), hosting (e.g., OVH) and cloud providers (e.g., Microsoft, Amazon Web Services), social networks (e.g., Twitter, Facebook, LinkedIn), and security companies that provide mitigation services against DDoS attacks (e.g., Prolexic, OpenDNS). The list also includes manufacturers (e.g., Apple, RIM), Web registrars (e.g., Verisign, nic.at), virtual roaming and virtual meeting services (Media Network Services), blogging platforms (Automattic, a publishing company hosting WordPress.com), cloud messaging (EASYLINK2, owned by AT&T Services), and web analytics (OMNITURE, owned by Adobe Systems). Of course, DNS-related service providers, such as root and top-level domain servers (e.g., ISC/F-root, CommunityDNS), DNS service management (e.g., UltraDNS, DynDNS), and public DNS resolvers (e.g., Google DNS, OpenDNS), also emerge in the census.

Diversity. We report a breakdown of AS classes in Fig. 11, which shows that DNS now represents about one third of IP anycast activities. The plots in Fig. 9 illustrate the large diversity of anycast usage under all metrics. Indeed, no correlation appears between any two metrics, which illustrates the degrees of freedom in anycast deployments: for instance, the geographical footprint and the IP/24 footprint are largely unrelated (Pearson correlation coefficient of 0.35). Additionally, the number of open ports and the specific port values vary not only across deployments with heterogeneous business models (e.g., we observe from a minimum of 1 open port for DNS to O(10^4) open ports for OVH), but also between deployments of the same kind (e.g., the CloudFlare and EdgeCast CDNs have in common only ports 53, 80 and 443 over the set of 22 open ports they are using, with CloudFlare using 4× more ports than EdgeCast).
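To make the correlation statement concrete, the following is a minimal sketch of how a Pearson coefficient between two per-AS metrics can be computed. The function name and the sample values are hypothetical and are not taken from the census; the sketch only illustrates the statistic named above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-AS metrics (NOT census data):
# number of geographic replicas and number of anycast /24 per AS.
replicas = [20, 12, 10, 7, 5, 3, 2]
ip24s = [3, 40, 1, 2, 15, 1, 6]
print(round(pearson(replicas, ip24s), 2))
```

A value close to 0 indicates that knowing one footprint says little about the other, whereas values close to +1 or -1 indicate a strong linear relation.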

Geographical footprint. We specifically study the mean number of geographical replicas per AS (bottom plot in Fig. 9), a metric led by the CDN CloudFlare in our measurement. Overall, we observe that 25 ASes have at least 10 replicas distributed around the globe. Notice that these orders of magnitude are significantly smaller than those of L7-anycast deployments, which can exceed O(10^3) for the large providers; the gap is in part due to the low number of vantage points in PlanetLab (see Sec. 3.2). Among those ASes, we observe 10 DNS service providers (including ISC, DNScast and DynDNS) and 7 major CDNs (CloudFlare, Limelight, Highwinds, Fastly, CacheNetworks, Instart Logic, CDNetworks). We also discover two cloud providers (Microsoft and Aryaka Networks), one tier-1 ISP (Hurricane Electric, which has 15% of ASes in its customer cone according to CAIDA), a security company (OpenDNS, also popular for its public DNS service), a social network (Facebook) and a manufacturer (Apple).

Figure 11: Breakdown of AS category (only the first category is considered). [Bar plot not reproduced; x-axis categories: DNS, CDN, Cloud, ISP, Unknown, Security, Social, Other; y-axis: AS breakdown [%], ranging from 0 to 35.]

Fig. 12 further reports the cumulative distribution of the number of replicas per IP/24, depicting both the results from the combination of censuses and the individual results from each census alone. Specifically, the MIS solver orders circles by increasing radius: intuitively, the smaller the latency, the lower the number of overlaps, and the better the recall of our method. This is confirmed in Fig. 12, where censuses are combined by computing the minimum among multiple latency measurements between the same VP and target pair, to get an estimate of the RTT that is as close as possible to the propagation delay. Additionally, combining measurements increases recall: about 200 more IP/24 are found to be anycast in the combination with respect to the average individual census.
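The two steps described above (keeping the minimum RTT per VP and target pair, then visiting latency disks by increasing radius) can be sketched as follows. This is an illustrative approximation and not the implementation of [17]: the function names, the conversion constant of 100 km per millisecond of RTT (signal at roughly 2/3 of the speed of light, halved for the round trip), and the sample coordinates are all assumptions.

```python
import math

# Assumption: ~100 km of distance per ms of RTT
# (propagation at ~200 km/ms, halved because the RTT covers both directions).
KM_PER_RTT_MS = 100.0

def combine_min_rtt(censuses):
    """Keep, for each (vp, target) pair, the minimum RTT seen in any census."""
    best = {}
    for census in censuses:
        for pair, rtt_ms in census.items():
            if pair not in best or rtt_ms < best[pair]:
                best[pair] = rtt_ms
    return best

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def greedy_disjoint_disks(vp_coords, rtts_ms):
    """Greedy independent set of disks, visiting disks by increasing radius.

    vp_coords: {vp: (lat, lon)}; rtts_ms: {vp: RTT in ms towards one target}.
    Returns the VPs whose disks are pairwise disjoint. Two or more disjoint
    disks cannot contain the same host, which hints at distinct replicas.
    """
    disks = sorted((rtt * KM_PER_RTT_MS, vp) for vp, rtt in rtts_ms.items())
    chosen = []
    for radius, vp in disks:
        if all(haversine_km(vp_coords[vp], vp_coords[other]) > radius + r
               for r, other in chosen):
            chosen.append((radius, vp))
    return [vp for _, vp in chosen]

# Hypothetical measurements towards a single target "t" in two censuses.
census_a = {("paris", "t"): 12.0, ("tokyo", "t"): 9.0}
census_b = {("paris", "t"): 8.0, ("tokyo", "t"): 15.0}
combined = combine_min_rtt([census_a, census_b])
rtts = {vp: rtt for (vp, tgt), rtt in combined.items() if tgt == "t"}
coords = {"paris": (48.86, 2.35), "tokyo": (35.68, 139.69)}
print(greedy_disjoint_disks(coords, rtts))
```

Taking the minimum shrinks every disk, so fewer disks overlap and more of them survive the greedy pass, which is the intuition behind the recall gain of the combined census.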

In this paper, we only consider results from the combination, but remark that results are quite consistent across censuses (notice that the curves overlap in Fig. 12). A last comment is worth making about deployments where we observe only 2 geographically distributed replicas, which is possibly due to the low density of our VPs, but could also be tied to the wrong geolocation of some VP, raising false-positive replicas. While we have anecdotal evidence of some of these exiguous deployments being anycast, we prefer to defer a more detailed analysis to future work (see Sec. 5).

Figure 12: CDF of geographically distinct replicas per IP/24 (individual censuses and overall). [Plot not reproduced; x-axis: number of replicas (2 to 25); y-axis: CDF of number of IPs/24 (0 to 1). Curves: Combination of censuses (308 VPs); (13/03/15) Census 4 (240 VPs); (04/03/15) Census 3 (269 VPs); (02/03/15) Census 2 (255 VPs); (27/02/15) Census 1 (261 VPs).]

Figure 13: Number of IPs/24 per AS. [Plot not reproduced; x-axis: number of IPs/24 (log scale, 1 to 1000); y-axis: CDF of number of ASes (0 to 1); series: >= 5 replicas. Annotated ASes: CLOUDFLARENET,US (328); GOOGLE,US (102); EDGECAST,US (37); PROLEXIC,US (21); APPLE,US (6); TWITTER,US (3); LEVEL3,US (2); LINKEDIN,US (1).]

IP/24 footprint. In terms of the number of anycast IPs/24 per AS (middle plot in Fig. 9), we find that the CDN CloudFlare is by far the largest in terms of IP address ranges. Overall, we find 10 ASes that have at least 10 anycast IPs/24: 3 are CDNs (CloudFlare, EdgeCast, BitGravity), 3 are DNS providers (DNScast, WoodyNet, UltraDNS), and the remaining ASes represent multiple services (Automattic, Google, Amazon Web Services and Prolexic). The distribution of the number of IPs/24 per AS, depicted in Fig. 13, shows that about half have exactly one IP/24 (e.g., LinkedIn and AT&T Services). Yet, about 10% of the ASes employ at least 10 subnets: for instance, Prolexic, EdgeCast, Google and CloudFlare employ 21, 37, 102 and 328 anycast IP/24, respectively.

While in this work we do not provide a systematic investigation of the deployment density (i.e., how many IP/32 are alive in each IP/24), from the above discussion about diversity it is not surprising that we were able to identify both very sparse deployments (e.g., Google's 8.8.8.8 is the only address alive in 8.8.8.0/24) and very dense deployments (e.g., well over 99% of IPs are alive in most CloudFlare subnets).
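The notion of deployment density used above (alive IP/32 per IP/24) can be illustrated with a short sketch based on Python's standard `ipaddress` module. The function name and the list of responsive addresses are hypothetical; the second prefix is a documentation range chosen only to mimic a dense subnet.

```python
import ipaddress
from collections import defaultdict

def density_per_slash24(alive_ips):
    """Map each /24 to the fraction of its 256 addresses seen alive."""
    alive = defaultdict(set)
    for ip in alive_ips:
        # strict=False masks the host bits, e.g. 8.8.8.8/24 -> 8.8.8.0/24
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        alive[net].add(ipaddress.ip_address(ip))
    return {str(net): len(ips) / net.num_addresses
            for net, ips in alive.items()}

# Hypothetical probe results: one sparse /24 and one dense /24.
responsive = ["8.8.8.8"] + [f"192.0.2.{i}" for i in range(1, 255)]
for prefix, frac in density_per_slash24(responsive).items():
    print(prefix, f"{frac:.1%}")
```

Duplicated addresses are counted once because each /24 keeps a set of alive hosts.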

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, as well as of CDNs serving content in the top-100k Alexa list, is a good indicator that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Given the historical use of anycast for DNS services, we believe it is important to provide an up-to-date longitudinal view across the services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing at the top of Fig. 14. We test all anycast /24 of the top-100 ASes: picking a single representative IP per /24, we scan all 2^16 TCP ports at a low rate. Our results are conservative, in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and since an under-estimation of the number of open TCP ports can also result from probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816 IPs, belonging to 81 ASes, have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased by class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IPs/24 (bottom). Notice that only ports 80, 443 and 53 appear to be common to both the top and bottom plots: in particular, all ports in the hatched area are due to the large predominance of IP/24 owned by the CloudFlare AS, which also affects the order of the common ports in the top-10. We thus focus on per-AS statistics in the following.
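The bias can be reproduced on a toy example. The sketch below counts open ports once per /24 and once per AS; the function name, AS labels, prefixes and port lists are hypothetical and only meant to show how a single AS owning many /24 dominates the per-/24 ranking.

```python
from collections import Counter

def port_frequency(scans):
    """scans: iterable of (asn, prefix24, open_ports).

    Returns (per_prefix, per_as). In per_prefix every /24 votes once per
    open port; in per_as every AS votes once per port, however many /24
    it owns.
    """
    per_prefix = Counter()
    ports_by_as = {}
    for asn, _prefix, ports in scans:
        per_prefix.update(set(ports))
        ports_by_as.setdefault(asn, set()).update(ports)
    per_as = Counter()
    for ports in ports_by_as.values():
        per_as.update(ports)
    return per_prefix, per_as

# Hypothetical scan results: one large AS with three /24, two small ASes.
scans = [
    ("AS-big", "10.0.0.0/24", [80, 443, 2052]),
    ("AS-big", "10.0.1.0/24", [80, 443, 2052]),
    ("AS-big", "10.0.2.0/24", [80, 443, 2052]),
    ("AS-dns1", "10.1.0.0/24", [53]),
    ("AS-dns2", "10.2.0.0/24", [53, 80]),
]
per_prefix, per_as = port_frequency(scans)
print(per_prefix.most_common(3))
print(per_as.most_common(3))
```

In this toy input, port 2052 is as frequent as port 443 when counting /24, yet it is opened by a single AS, which is the effect that per-AS statistics avoid.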

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS.

Figure 14: Overall nmap portscan statistics and top-10 open TCP ports (per AS and per /24). [Plots not reproduced; the recoverable content is listed below.]

nmap portscan statistics: IPs/32: 812; ASes: 81; Ports (SSL): 10499 (185); Well-known: 457; Software: 30.

Top-10 open TCP ports by IP/24 frequency (axis from 0 to 400), with the nmap service label: 80 (http), 443 (http ssl), 8080 (http), 53 (domain), 2052 (http), 2053 (http), 2082 (http), 2083 (http), 8443 (http), 2087 (http).

Top-10 open TCP ports by AS frequency (axis from 0 to 60), with the nmap service label: 53 (domain), 80 (http), 443 (http-ssl), 179 (tcpwrapped), 22 (ssh), 8080 (http), 8083 (tcpwrapped), 3306 (tcpwrapped), 1935 (sip), 5252 (movaz-ssc).

We make the following observations: (i) roughly half of the ASes have at least one open TCP port; (ii) about 10% of the ASes have at least 5 open TCP ports; and (iii) the largest service footprints are those of Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20 of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs, including a tier-1 ISP (Tinet SpA), and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).
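Statements of the form "x% of the ASes have at least k open ports" are read directly off a complementary CDF. A minimal sketch of that computation follows; it uses the convention P[X >= x] to match the "at least" phrasing, and the per-AS port counts are hypothetical, not the census data.

```python
def ccdf(values):
    """Return [(x, P[X >= x])] for each distinct x, in increasing order."""
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    ordered = sorted(values)
    points = []
    for i, x in enumerate(ordered):
        if i == 0 or x != ordered[i - 1]:
            # ordered[i:] are exactly the samples that are >= x
            points.append((x, (n - i) / n))
    return points

# Hypothetical per-AS open-port counts (NOT the census data).
open_ports = [0, 0, 0, 1, 1, 2, 5, 9, 20, 300]
for x, p in ccdf(open_ports):
    print(f"P[ports >= {x}] = {p:.1f}")
```

With heavy-tailed counts such as these, plotting both axes on a logarithmic scale (as in Fig. 15) keeps the few very large deployments visible.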

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software, such as popular web and DNS daemons (e.g., nginx, ISC BIND), and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

Figure 15: Complementary CDF of the number of open TCP ports per AS. [Plot not reproduced; x-axis: number of open TCP ports (log scale, 1 to 100000); y-axis: CCDF over ASes (log scale, 0.01 to 1). Annotated ASes: OVH,FR (10148); INCAPSULA,US (330); CLOUDFLARENET,US (20); GOOGLE,US (9); EDGECAST,US (5).]

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex aequo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with the web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, the difference may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).
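For readers unfamiliar with rank correlation, the sketch below computes Spearman's coefficient between two popularity rankings using the classical formula 1 - 6 * sum(d^2) / (n (n^2 - 1)), which assumes no ties. The function name and the two rankings are hypothetical; they are not the rankings used to obtain the 0.38 value above.

```python
def spearman_from_rankings(ranking_a, ranking_b):
    """Spearman's rho between two rankings (lists of unique names, best first).

    Only items present in both lists are compared; they are re-ranked
    1..n within each list. Assumes no ties inside a ranking.
    """
    in_b = set(ranking_b)
    common = [name for name in ranking_a if name in in_b]
    n = len(common)
    if n < 2:
        raise ValueError("need at least two common items")
    rank_a = {name: i for i, name in enumerate(common, start=1)}
    kept_b = [name for name in ranking_b if name in rank_a]
    rank_b = {name: i for i, name in enumerate(kept_b, start=1)}
    d2 = sum((rank_a[name] - rank_b[name]) ** 2 for name in common)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical popularity rankings (NOT the paper's data).
anycast_rank = ["nginx", "apache", "lighttpd", "iis", "varnish"]
unicast_rank = ["apache", "nginx", "iis", "lighttpd", "varnish"]
print(round(spearman_from_rankings(anycast_rank, unicast_rank), 2))
```

Restricting the comparison to servers present in both rankings is a design choice of this sketch; software seen only in one world has no rank to compare against.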

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movement, the results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted; yet, among the anycasters, we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, and especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10,000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from the software ranking in the unicast IP world.

Yet, this work only scratches the surface, and opens more questions than it is able to answer, as for instance:

Figure 16: Breakdown of software running on anycast replicas. [Bar plot not reproduced; y-axis: AS frequency (0 to 20); x-axis labels, grouped as DNS, Web, Mail and Other, in extraction order: ISC BIND, NLnet Labs NSD, Microsoft DNS, OpenDNS, nginx, lighttpd, Apache httpd, ECD, Microsoft HTTP, Microsoft IIS, Varnish, Apache Tomcat, bitasicv2, CFS 0213, cloudflare-nginx, cPanel httpd, thttpd, ECAcc, ECS, Google httpd, instart160, Gmail imapd, Gmail pop3d, Google gsmtp, OpenSSH, MySQL, sslstrip, Microsoft SQL, Microsoft RPC.]

IP-level CDN. Refining the active methodology by mapping content objects (and not only the front page) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing the time evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with the results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming or discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES

[1] http://internetcensus2012.bitbucket.org/paper.html

[2] http://w3techs.com/technologies/overview/web_server/all

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. YouTube traffic dynamics and its interplay with a tier-1 ISP: An ISP perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark/

[7] http://www.root-servers.org/

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. FastRoute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of Internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible Internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829/rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An Internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org/

[39] https://www.opendns.com/data-center-locations/

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.


[18] D Cicalese D Joumblatt D Rossi M-O BuobJ Auge and T Friedman Latency-based anycastgeolocalization Algorithms software and datasets InTech Rep 2015

[19] L Colitti Measuring anycast server performance Thecase of K-root Nanog 2006

[20] C Contavalli W van der Gaast S Leach andE Lewis Client Subnet in DNS Queries httpstools

ietforghtmldraft-ietf-dnsop-edns-client-subnet-04

[21] httpwwwenstfr˜drossianycast

[22] Z Durumeric E Wustrow and J A HaldermanZmap Fast internet-wide scanning and its securityapplications In USENIX Security Symposium 2013

[23] B Eriksson P Barford J Sommers and R NowakA learning-based approach for IP geolocation InPAM 2010

[24] B Eriksson and M Crovella Understandinggeolocation accuracy using network geometry In ProcIEEE INFOCOM 2013

[25] X Fan J S Heidemann and R GovindanEvaluating anycast in the domain name system InProc IEEE INFOCOM 2013

[26] httpwwwict-mplaneeupublicfastping

[27] A Flavel P Mani D A Maltz N Holt J LiuY Chen and O Surmachev Fastroute A scalableload-aware anycast routing architecture for moderncdns In Proc USENIX NSDI 2015

[28] B Gueye A Ziviani M Crovella and S FdidaConstraint-based geolocation of internet hosts InACM IMC 2004

[29] T Hardie Distributing Authoritative Name Serversvia Shared Unicast Addresses IETF RFC 3258 2002

[30] J Heidemann Y Pradkin R GovindanC Papadopoulos G Bartlett and J BannisterCensus and survey of the visible internet In ACMIMC 2008

[31] Internet addresses hitlist dataset (20140829rev4338)Provided by the USCLANDER project(httpwwwisieduantlander)

[32] D Karrenberg Anycast and BGP stability A closerlook at DNSMON data Nanog 2005

[33] T Krenc O Hohlfeld and A Feldmann An internetcensus taken by an illegal botnet A qualitativeassessment of published measurements ACM CCR44(3) 2014

[34] Z Liu B Huffaker M Fomenkov N Brownlee andK C Claffy Two days in the life of the DNS anycastroot servers In PAM 2007

[35] D Madory C Cook and K Miao Who are theanycasters Nanog 2013

[36] B M Maggs and R K Sitaraman Algorithmicnuggets in content delivery SIGCOMM ComputCommun Rev 45(3) Jul 2015

[37] K Miller Deploying IP anycast Nanog 2003

[38] httpsnmaporg

[39] httpswwwopendnscomdata-center-locations

[40] C Pelsser L Cittadini S Vissicchio and R BushFrom paris to tokyo On the suitability of ping tomeasure latency In ACM IMC 2013

[41] I Poese S Uhlig M A Kaafar B Donnet andB Gueye IP geolocation databases Unreliable ACMCCR 41(2) 2011

[42] D Rossi G Pujol X Wang and F Mathieu Peekingthrough the BitTorrent seedbox hosting ecosystem InProc Traffic Monitoring and Analysis (TMA) 2014

[43] S Sarat V Pappas and A Terzis On the use ofanycast in DNS In Proc ACM SIGMETRICS 2005

[44] Y Shavitt and N Zilberman A geolocation databasesstudy IEEE J-SAC 29(10) 2011

[45] F Streibelt J Bottger N Chatzis G Smaragdakisand A Feldmann Exploring edns-client-subnetadopters in your free time In ACM IMC 2013

[46] A Su D R Choffnes A Kuzmanovic andF Bustamante Drafting behind akamai(travelocity-based detouring) In Proc ACMSIGCOMM 2006

[47] R Torres A Finamore J R Kim M Mellia M MMunafo and S Rao Dissecting video server selectionstrategies in the YouTube CDN In Proc of IEEEICDCS 2011

[48] S Zander L L Andrew and G Armitage Capturingghosts Predicting the used ipv4 space by inferringunobserved addresses In ACM IMC 2014

  • Introduction
  • Methodology Overview
    • Workflow
    • State of the art
      • System design
        • Census targets
        • Measurement dataset vs platform
        • Measurement software
        • Network protocol
        • Scalability
          • Anycast 0 censuses
            • At a glance
            • Top-100 Anycast ASes
            • Anycast Services
              • Discussion
              • References
Page 11: Characterizing IPv4 Anycast Adoption and Deployment€¦ · Characterizing IPv4 Anycast Adoption and Deployment ... CoNEXT ’15 December 01-04, 2015, Heidelberg, Germany c 2015 ACM

Importance. The presence of ASes ranking among the top-100 in the CAIDA list, and of CDNs serving content in the top-100k Alexa list, is a good indicator that anycast is used for popular and important services. Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast and Fastly, with 188, 10 and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives traffic on behalf of its client networks, redirecting only legitimate traffic to them.

4.3 Anycast Services

Portscan campaign. Given the historical use of anycast for DNS services, we believe it is important to provide an up-to-date view across the services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing in the top of Fig. 14. We test all anycast /24s of the top-100 ASes: picking a single representative IP per /24, we scan all 2^16 TCP ports at low rates. Our results are conservative, both because different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast) and because an underestimation of the number of open TCP ports can also result from probe filtering by firewalls and routers along the path to the targets. Out of the 897 IPs of the top-100 ASes, we find that 816, belonging to 81 ASes, have at least one open TCP port. The total number of distinct open TCP ports is 10485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we detail next.
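The target selection described above (one representative IP per anycast /24) can be sketched as follows. This is a minimal illustration, not the tooling used for the census: the function name, the "first address seen" policy and the sample addresses (taken from documentation ranges) are all assumptions.

```python
import ipaddress

# Every TCP port number, matching the 2^16 ports mentioned in the text.
ALL_TCP_PORTS = range(2 ** 16)


def one_target_per_slash24(addresses):
    """Return a dict mapping each /24 to one representative IP.

    Assumption: the first address seen in a /24 is kept as its representative.
    """
    representatives = {}
    for addr in addresses:
        # strict=False masks the host bits, so any address maps to its /24.
        prefix = ipaddress.ip_network(f"{addr}/24", strict=False)
        representatives.setdefault(prefix, addr)
    return representatives


# Hypothetical anycast addresses (documentation ranges, not census data).
anycast_ips = ["192.0.2.10", "192.0.2.200", "198.51.100.7", "203.0.113.1"]
targets = one_target_per_slash24(anycast_ips)
```

Scanning a single address per /24 keeps the probing load low, at the cost of missing ports that are open only on other addresses of the same /24, which is why the resulting counts are conservative.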

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue that it is necessary to consider only per-AS statistics, to avoid presenting results that are biased by class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IP/24s (bottom). Notice that only ports 80, 443 and 53 appear in both the top and bottom plots: in particular, all ports in the hatched area are due to the large predominance of IP/24s owned by the CloudFlare AS, which also affects the order of the common ports in the top-10. We thus focus on per-AS statistics in the following.
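The effect of class imbalance can be reproduced on a toy example. The sketch below counts, for each port, either the distinct ASes or the distinct /24s on which it is open; the record layout, the AS names and the prefix labels are hypothetical and chosen only so that one AS owns most of the /24s.

```python
from collections import Counter

# Hypothetical scan records: (AS name, /24 label, open TCP port).
# "AS-A" owns three of the five /24s, mimicking a dominant operator.
records = [
    ("AS-A", "A-prefix-1", 80),
    ("AS-A", "A-prefix-1", 2052),
    ("AS-A", "A-prefix-2", 80),
    ("AS-A", "A-prefix-2", 2052),
    ("AS-A", "A-prefix-3", 2052),
    ("AS-B", "B-prefix-1", 53),
    ("AS-C", "C-prefix-1", 53),
    ("AS-C", "C-prefix-1", 80),
]


def port_frequency(records, key_index):
    """Count, per port, the distinct ASes (key_index=0) or /24s (key_index=1)."""
    seen = {(rec[key_index], rec[2]) for rec in records}
    return Counter(port for _, port in seen)


per_as = port_frequency(records, 0)      # each AS counts once per port
per_prefix = port_frequency(records, 1)  # each /24 counts once per port
```

In this toy dataset port 2052 ties for first place when counting /24s, but is the least common port when counting ASes, because all of its occurrences belong to a single operator.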

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS.

nmap portscan statistics:

IPs/32   ASes   Ports (SSL)   Well-known   Software
812      81     10499 (185)   457          30

[Figure 14 plots: top-10 open TCP ports. Per-/24 panel (y-axis: IP/24 frequency, 0 to 400): ports 80, 443, 8080, 53, 2052, 2053, 2082, 2083, 8443, 2087, with service labels, in the same order, http, http ssl, http, domain, http, http, http, http, http, http. Per-AS panel (y-axis: AS frequency, 0 to 60): ports 53, 80, 443, 179, 22, 8080, 8083, 3306, 1935, 5252, with service labels, in the same order, domain, http, http-ssl, tcpwrapped, ssh, http, tcpwrapped, tcpwrapped, sip, movaz-ssc.]

Figure 14: Overall nmap portscan statistics and Top-10 open TCP ports (per AS and per /24).

We make the following observations: (i) roughly half of the ASes have at least one open TCP port; (ii) about 10% of the ASes have at least 5 open TCP ports; and (iii) the largest service footprint is represented by Incapsula and especially OVH, with 313 and 10148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports to be due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20% of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs including a tier-1 ISP (Tinet SpA), and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).
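Statements of the form "a given fraction of ASes have at least k open ports" are read directly off a complementary CDF. A minimal sketch of that computation is given below; the function name and the per-AS counts are hypothetical and do not reproduce the census data.

```python
def ccdf(values):
    """Return sorted (x, P[X >= x]) pairs, one per distinct value."""
    n = len(values)
    points = []
    for x in sorted(set(values)):
        # Fraction of samples that are at least as large as x.
        at_least = sum(1 for v in values if v >= x)
        points.append((x, at_least / n))
    return points


# Hypothetical number of open TCP ports for ten ASes.
open_ports_per_as = [1, 1, 2, 2, 3, 4, 6, 8, 12, 40]
curve = ccdf(open_ports_per_as)
```

Plotting such a curve on log-log axes, as in Fig. 15, keeps both the bulk of ASes with a few open ports and the rare ASes with thousands of open ports visible in the same figure.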

Software diversity. Fig. 16 lists 30 different software implementations, which we group into three main categories: Web, Mail and DNS. Interestingly, the list includes open source software, such as popular web and DNS daemons (e.g., nginx, ISC BIND), and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most adopted software to handle DNS requests over anycast. Yet we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation, which is specifically designed to add resilience against software failures of DNS root servers.

[Figure 15 plot: x-axis, number of open TCP ports (log scale, 1 to 100000); y-axis, CCDF over ASes (log scale, 0.01 to 1). Annotated ASes: OVH,FR (10148); INCAPSULA,US (330); CLOUDFLARENET,US (20); GOOGLE,US (9); EDGECAST,US (5).]

Figure 15: Complementary CDF of the number of open TCP ports per AS.

Among web servers, the most popular are nginx (7 ASes), Apache httpd and lighttpd (ex æquo with 4 ASes). We observe the use of proprietary web servers by some CDNs (e.g., cloudflare-nginx and cPanel httpd). Though our dataset has a limited size, we attempt a comparison with the relative popularity of web servers in the unicast world: the Spearman correlation of the popularity rank in our dataset with the web server ranks [2] in the Alexa-10M is low (0.38). As for the DNS case, the difference may arise from some peculiar features that are especially valuable in the anycast context. Finally, we detect the presence of running daemons that serve mail on anycast IPs from Google (Gmail imapd, Gmail pop3d, gsmtp), as well as of RPC (ssh, Microsoft RPC) and databases (MySQL, Microsoft SQL).
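For readers unfamiliar with the rank comparison used above, the following sketch computes the Spearman correlation between two rankings of the same items. It uses the closed-form expression that is valid when there are no ties; the function name and the two example rankings are hypothetical and unrelated to the measured web server ranks.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation of two rankings of the same items.

    Assumes each list is a permutation of 1..n (no ties), so that
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items")
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical popularity ranks of five web servers in two datasets.
anycast_rank = [1, 2, 3, 4, 5]
unicast_rank = [2, 1, 5, 3, 4]
rho = spearman_rho(anycast_rank, unicast_rank)
```

A value close to 1 means the two popularity orders largely agree, a value close to 0 means they are essentially unrelated, and a value of -1 means they are exactly reversed.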

5. DISCUSSION

We present the first census(es) of IPv4 anycast deployment, gathered through an original and robust technique, implemented with an efficient and scalable system design. In the spirit of the open source and open data movements, the results of our census are available at [21].

Our characterization shows that a tiny fraction of the IPv4 space is anycasted, yet among the anycasters we recognize major players of the Internet ecosystem, including top-ranking ISPs, popular Cloud, OTT and especially CDN operators. We additionally show great heterogeneity along multiple directions, especially in terms of the offered services. In particular, our portscan campaign of anycasted subnets reveals over 450 well-known services from over 10 000 unique open ports. Additionally, we uncover 30 software implementations, with a relative breakdown that differs from the software ranking in the unicast IP world.

Yet this work only scratches the surface, and opens more questions than it is able to answer, as for instance:

[Figure 16 plot: y-axis, AS frequency (0 to 20); x-axis, software grouped as DNS, Web, Mail and Other. Labels in plot order: ISC BIND, NLnet Labs NSD, Microsoft DNS, OpenDNS, nginx, lighttpd, Apache httpd, ECD, Microsoft HTTP, Microsoft IIS, Varnish, Apache Tomcat, bitasicv2, CFS 0213, cloudflare-nginx, cPanel httpd, thttpd, ECAcc, ECS, Google httpd, instart160, Gmail imapd, Gmail pop3d, Google gsmtp, OpenSSH, MySQL, sslstrip, Microsoft SQL, Microsoft RPC.]

Figure 16: Breakdown of software running on anycast replicas.

IP-level CDN. Refining the active methodology by mapping content objects (and not only the front page) from the Alexa-100k would be needed to gather a better understanding of IP-level CDNs.

Longitudinal view. Taking periodic censuses and analyzing their evolution over longer timescales would allow tracking the evolution of IP anycast deployments.

Traffic volume. A missing piece of information concerns the traffic volume served by IP anycast, which can be gathered via passive measurement and annotated with the results of our census (i.e., a binary flag per anycast IP/24).

Combine measurement platforms. As we have seen, it would be interesting to exploit multiple platforms in addition to PlanetLab, such as RIPE Atlas: this would both lead to a better characterization of large deployments (e.g., increase the recall) and possibly assist in confirming/discarding suspicious deployments (i.e., those for which we detected 2 replicas from PL).

BGP hijacking. Detecting geo-inconsistencies for knowingly unicast prefixes is symptomatic of BGP hijacking attacks: being able to periodically and quickly scan the network to raise alarms, and cross-check them with other types of data (e.g., BGP feeds, traceroute measurements), is a relevant extension of this work.
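One way to make the notion of geo-inconsistency concrete is a speed-of-light test on latency measurements: each round-trip time bounds the distance between a vantage point and the responding host, and if the bounds obtained from two vantage points cannot be satisfied by any single location, the address is answered from more than one place. The sketch below illustrates this test under stated assumptions; the propagation speed constant, the function names and the vantage point data are hypothetical, and the code is not a restatement of the authors' implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumed propagation speed: roughly 2/3 of the speed of light, in km per ms.
KM_PER_MS = 200.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_inconsistent(vp_a, vp_b):
    """True if no single location can explain both latency measurements.

    Each vantage point is a (latitude, longitude, rtt_ms) tuple. Half the
    RTT times the propagation speed bounds the distance to the target, so
    two bounding disks that do not overlap imply distinct responders.
    """
    radius_a = vp_a[2] / 2 * KM_PER_MS
    radius_b = vp_b[2] / 2 * KM_PER_MS
    gap = great_circle_km(vp_a[0], vp_a[1], vp_b[0], vp_b[1])
    return gap > radius_a + radius_b


# Hypothetical measurements toward the same target address.
paris = (48.86, 2.35, 10.0)    # 10 ms RTT seen from Paris
tokyo = (35.68, 139.69, 10.0)  # 10 ms RTT seen from Tokyo
suspicious = geo_inconsistent(paris, tokyo)
```

For an anycast prefix such a violation is expected; for a prefix known to be unicast it is an anomaly worth cross-checking against BGP feeds and traceroute measurements, as suggested above.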

Acknowledgements

We thank our shepherd Matteo Varvello and the anonymous CoNEXT reviewers for their valuable feedback. This work has been carried out at LINCS (http://www.lincs.fr). The research leading to these results was supported by the European Union under the FP7 Grant Agreement n. 318627 (Integrated Project "mPlane") and by a Google Faculty Research Award.

6. REFERENCES

[1] http://internetcensus2012.bitbucket.org/paper.html

[2] http://w3techs.com/technologies/overview/web_server/all

[3] J. Abley. A software approach to distributing requests for DNS service using GNU Zebra, ISC BIND 9, FreeBSD. In Proc. USENIX ATEC, 2004.

[4] J. Abley and K. Lindqvist. Operation of Anycast Services. RFC 4786 (Best Current Practice), 2006.

[5] V. K. Adhikari, S. Jain, and Z.-L. Zhang. Youtube traffic dynamics and its interplay with a tier-1 isp: An isp perspective. In ACM IMC, 2010.

[6] http://www.caida.org/projects/ark

[7] http://www.root-servers.org

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast cdn. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. Zmap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. Fastroute: A scalable load-aware anycast routing architecture for modern cdns. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829/rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From paris to tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring edns-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind akamai (travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used ipv4 space by inferring unobserved addresses. In ACM IMC, 2014.

  • Introduction
  • Methodology Overview
    • Workflow
    • State of the art
      • System design
        • Census targets
        • Measurement dataset vs platform
        • Measurement software
        • Network protocol
        • Scalability
          • Anycast 0 censuses
            • At a glance
            • Top-100 Anycast ASes
            • Anycast Services
              • Discussion
              • References
Page 12: Characterizing IPv4 Anycast Adoption and Deployment€¦ · Characterizing IPv4 Anycast Adoption and Deployment ... CoNEXT ’15 December 01-04, 2015, Heidelberg, Germany c 2015 ACM

001

01

1

1 10 100 1000 10000 100000

OVHFR (10148)

INCAPSULAUS (330)

CLOUDFLARENETUS (20)

GOOGLEUS (9)

EDGECASTUS (5)C

CD

F

Number of open TCP ports

ASes

Figure 15 Complementary CDF of the numberof open TCP ports per AS

tation which is specifically designed to add resilienceagainst software failures of DNS root servers

Among web servers the most popular are nginx (7ASes) Apache httpd and lighttpd (ex aeligquo with 4ASes) We observe the use of proprietary web serversby some CDNs (eg cloudflare-nginx and Panel httpd)Though our dataset has a limited size we attempt acomparison with the relative popularity of webserversin the unicast world the Spearman correlation of popu-larity rank in our dataset with webserver ranks [2] in theAlexa-10M is low (038) As for the DNS case differencemay arise in some peculiar features that are especiallyvaluable in the anycast context Finally we detect thepresence of running daemons that serve mail on anycastIPs from Google (Gmail imapd Gmail pop3d gsmtp)as well as of RPC (ssh MicrosoftRPC) and databases(MySQLMicrosoft SQL)

5 DISCUSSIONWe present the first census(es) of IPv4 anycast de-

ployment gathered through an original and robust tech-nique implemented with an efficient and scalable systemdesign In spirit with the open source and data move-ment results of our census are available at [21]

Our characterization show that a tiny fraction of theIPv4 space is anycasted yet among the anycasters werecognize major players of the Internet ecosystem in-cluding top-ranking ISPs popular Cloud OTT and es-pecially CDN operators We additionally show greatheterogeneity along multiple directions and especiallyin terms of the offered services Particularly our portscancampaign of anycasted subnets reveals over 450 well-known services from over 10 000 unique open portsAdditionally we uncover 30 software implementationswith a relative breakdown that differs from softwareranking in the unicast IP world

Yet this work only scratches the surface and opensmore questions than it is able to answer as for instance

0

10

20

ISC

BIN

DN

Ln

et

Lab

s N

SD

Mic

roso

ft D

NS

Op

en

DN

S

ng

inx

lig

htt

pd

Ap

ach

e h

ttp

dE

CD

Mic

roso

ft H

TT

P

Mic

roso

ft IIS

Varn

ish

Ap

ach

e T

om

cat

bit

asic

v2

CF

S 0

213

clo

ud

flare

-ng

inx

cP

an

el h

ttp

dth

ttp

dE

CA

ccE

CS

Go

og

le h

ttp

din

sta

rt1

60

Gm

ail im

ap

dG

mail p

op

3d

Go

og

le g

sm

tp

Op

en

SS

HM

yS

QL

ssls

trip

Mic

roso

ft S

QL

M

icro

so

ft R

PC

DNS Web Mail Other

AS

fre

qu

en

cy

Figure 16 Breakdown of software running onanycast replicas

IP-level CDN Refining the active methodology bymapping content object (and not only frontpage) fromthe Alexa-100k would be needed to gather a better un-derstanding of IP-level CDNLongitudinal view Taking periodic censuses and an-alyzing the time evolution over longer timescales wouldallow to track evolution of IP anycast deploymentsTraffic volume A missing information concerns thetraffic volume served by IP anycast that can be gath-ered via passive measurement and annotated with re-sults of our census (ie binary flag per anycast IP24)Combine measurement platforms As we have seenit would be interesting to exploit multiple platforms inaddition to PlanetLab such as RIPE Atlas this wouldboth lead to a better characterization of large deploy-ments (eg increase the recall) as well as possibly assistin confirmingdiscarding suspicious deployments (iethose for which we detected 2 replicas from PL)BGP hijacking Detecting geo-inconsistencies for know-ingly unicast prefixes is symptomatic of BGP hijack-ing attacks being able to periodically and quickly scanthe network to raise alarms and cross-check them withother types of data (eg BGP feeds traceroute mea-surements) is a relevant extension of this work

AcknowledgementsWe thank our shepherd Matteo Varvello and the anony-mous CoNEXT reviewers for their valuable feedbackThis work has been carried out at LINCS (httpwwwlincsfr) The research leading to these results was sup-ported by the European Union under the FP7 GrantAgreement n 318627 (Integrated Project rdquomPlanerdquo)and by a Google Faculty Research Award

6 REFERENCES[1] httpinternetcensus2012bitbucketorgpaperhtml

[2] httpw3techscomtechnologiesoverviewweb serverall

[3] J Abley A software approach to distributing requestsfor DNS service using GNU Zebra ISC BIND 9FreeBSD In Proc USENIX ATEC 2004

[4] J Abley and K Lindqvist Operation of AnycastServices RFC 4786 (Best Current Practice) 2006

[5] V K Adhikari S Jain and Z li Zhang Youtubetraffic dynamics and its interplay with a tier-1 isp Anisp perspective In ACM IMC 2010

[6] httpwwwcaidaorgprojectsark

[7] httpwwwroot-serversorg

[8] F Baker Requirements for IP Version 4 RoutersIETF RFC 1812 1995

[9] H Ballani and P Francis Towards a global IP anycastservice In Proc ACM SIGCOMM 2005

[10] H Ballani P Francis and S Ratnasamy Ameasurement-based deployment proposal for IPanycast In ACM IMC 2006

[11] B Barber M Larson and M Kosters Traffic sourceanalysis of the J root anycast instances Nanog 2006

[12] I Bermudez S Traverso M Mellia and M MunafoExploring the cloud from passive measurements TheAmazon AWS case In Proc IEEE INFOCOM 2013

[13] P Boothe and R Bush DNS Anycast Stability SomeEarly Results CAIDA 2005

[14] R Braden Requirements for Internet Hosts -Communication Layers IETF RFC 1122 1989

[15] M Calder X Fan Z Hu E Katz-BassettJ Heidemann and R Govindan Mapping theexpansion of googlersquos serving infrastructure In ACMIMC 2013

[16] M Calder A Flavel E Katz-Bassett R Mahajanand J Padhye Analyzing the performance of ananycast cdn In ACM IMC 2015

[17] D Cicalese D Joumblatt D Rossi M-O BuobJ Auge and T Friedman A fistful of pings Accurateand lightweight anycast enumeration and geolocationIn Proc IEEE INFOCOM 2015

[18] D Cicalese D Joumblatt D Rossi M-O BuobJ Auge and T Friedman Latency-based anycastgeolocalization Algorithms software and datasets InTech Rep 2015

[19] L Colitti Measuring anycast server performance Thecase of K-root Nanog 2006

[20] C Contavalli W van der Gaast S Leach andE Lewis Client Subnet in DNS Queries httpstools

ietforghtmldraft-ietf-dnsop-edns-client-subnet-04

[21] httpwwwenstfr˜drossianycast

[22] Z Durumeric E Wustrow and J A HaldermanZmap Fast internet-wide scanning and its securityapplications In USENIX Security Symposium 2013

[23] B Eriksson P Barford J Sommers and R NowakA learning-based approach for IP geolocation InPAM 2010

[24] B Eriksson and M Crovella Understandinggeolocation accuracy using network geometry In ProcIEEE INFOCOM 2013

[25] X Fan J S Heidemann and R GovindanEvaluating anycast in the domain name system InProc IEEE INFOCOM 2013

[26] httpwwwict-mplaneeupublicfastping

[27] A Flavel P Mani D A Maltz N Holt J LiuY Chen and O Surmachev Fastroute A scalableload-aware anycast routing architecture for moderncdns In Proc USENIX NSDI 2015

[28] B Gueye A Ziviani M Crovella and S FdidaConstraint-based geolocation of internet hosts InACM IMC 2004

[29] T Hardie Distributing Authoritative Name Serversvia Shared Unicast Addresses IETF RFC 3258 2002

[30] J Heidemann Y Pradkin R GovindanC Papadopoulos G Bartlett and J BannisterCensus and survey of the visible internet In ACMIMC 2008

[31] Internet addresses hitlist dataset (20140829rev4338)Provided by the USCLANDER project(httpwwwisieduantlander)

[32] D Karrenberg Anycast and BGP stability A closerlook at DNSMON data Nanog 2005

[33] T Krenc O Hohlfeld and A Feldmann An internetcensus taken by an illegal botnet A qualitativeassessment of published measurements ACM CCR44(3) 2014

[34] Z Liu B Huffaker M Fomenkov N Brownlee andK C Claffy Two days in the life of the DNS anycastroot servers In PAM 2007

[35] D Madory C Cook and K Miao Who are theanycasters Nanog 2013

[36] B M Maggs and R K Sitaraman Algorithmicnuggets in content delivery SIGCOMM ComputCommun Rev 45(3) Jul 2015

[37] K Miller Deploying IP anycast Nanog 2003

[38] httpsnmaporg

[39] httpswwwopendnscomdata-center-locations

[40] C Pelsser L Cittadini S Vissicchio and R BushFrom paris to tokyo On the suitability of ping tomeasure latency In ACM IMC 2013

[41] I Poese S Uhlig M A Kaafar B Donnet andB Gueye IP geolocation databases Unreliable ACMCCR 41(2) 2011

[42] D Rossi G Pujol X Wang and F Mathieu Peekingthrough the BitTorrent seedbox hosting ecosystem InProc Traffic Monitoring and Analysis (TMA) 2014

[43] S Sarat V Pappas and A Terzis On the use ofanycast in DNS In Proc ACM SIGMETRICS 2005

[44] Y Shavitt and N Zilberman A geolocation databasesstudy IEEE J-SAC 29(10) 2011

[45] F Streibelt J Bottger N Chatzis G Smaragdakisand A Feldmann Exploring edns-client-subnetadopters in your free time In ACM IMC 2013

[46] A Su D R Choffnes A Kuzmanovic andF Bustamante Drafting behind akamai(travelocity-based detouring) In Proc ACMSIGCOMM 2006

[47] R Torres A Finamore J R Kim M Mellia M MMunafo and S Rao Dissecting video server selectionstrategies in the YouTube CDN In Proc of IEEEICDCS 2011

[48] S Zander L L Andrew and G Armitage Capturingghosts Predicting the used ipv4 space by inferringunobserved addresses In ACM IMC 2014

  • Introduction
  • Methodology Overview
    • Workflow
    • State of the art
      • System design
        • Census targets
        • Measurement dataset vs platform
        • Measurement software
        • Network protocol
        • Scalability
          • Anycast 0 censuses
            • At a glance
            • Top-100 Anycast ASes
            • Anycast Services
              • Discussion
              • References
Page 13: Characterizing IPv4 Anycast Adoption and Deployment€¦ · Characterizing IPv4 Anycast Adoption and Deployment ... CoNEXT ’15 December 01-04, 2015, Heidelberg, Germany c 2015 ACM

[4] J Abley and K Lindqvist Operation of AnycastServices RFC 4786 (Best Current Practice) 2006

[5] V K Adhikari S Jain and Z li Zhang Youtubetraffic dynamics and its interplay with a tier-1 isp Anisp perspective In ACM IMC 2010

[6] httpwwwcaidaorgprojectsark

[7] httpwwwroot-serversorg

[8] F. Baker. Requirements for IP Version 4 Routers. IETF RFC 1812, 1995.

[9] H. Ballani and P. Francis. Towards a global IP anycast service. In Proc. ACM SIGCOMM, 2005.

[10] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In ACM IMC, 2006.

[11] B. Barber, M. Larson, and M. Kosters. Traffic source analysis of the J root anycast instances. Nanog, 2006.

[12] I. Bermudez, S. Traverso, M. Mellia, and M. Munafo. Exploring the cloud from passive measurements: The Amazon AWS case. In Proc. IEEE INFOCOM, 2013.

[13] P. Boothe and R. Bush. DNS Anycast Stability: Some Early Results. CAIDA, 2005.

[14] R. Braden. Requirements for Internet Hosts - Communication Layers. IETF RFC 1122, 1989.

[15] M. Calder, X. Fan, Z. Hu, E. Katz-Bassett, J. Heidemann, and R. Govindan. Mapping the expansion of Google's serving infrastructure. In ACM IMC, 2013.

[16] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In ACM IMC, 2015.

[17] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In Proc. IEEE INFOCOM, 2015.

[18] D. Cicalese, D. Joumblatt, D. Rossi, M.-O. Buob, J. Augé, and T. Friedman. Latency-based anycast geolocalization: Algorithms, software and datasets. In Tech. Rep., 2015.

[19] L. Colitti. Measuring anycast server performance: The case of K-root. Nanog, 2006.

[20] C. Contavalli, W. van der Gaast, S. Leach, and E. Lewis. Client Subnet in DNS Queries. https://tools.ietf.org/html/draft-ietf-dnsop-edns-client-subnet-04

[21] http://www.enst.fr/~drossi/anycast

[22] Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In USENIX Security Symposium, 2013.

[23] B. Eriksson, P. Barford, J. Sommers, and R. Nowak. A learning-based approach for IP geolocation. In PAM, 2010.

[24] B. Eriksson and M. Crovella. Understanding geolocation accuracy using network geometry. In Proc. IEEE INFOCOM, 2013.

[25] X. Fan, J. S. Heidemann, and R. Govindan. Evaluating anycast in the domain name system. In Proc. IEEE INFOCOM, 2013.

[26] http://www.ict-mplane.eu/public/fastping

[27] A. Flavel, P. Mani, D. A. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. FastRoute: A scalable load-aware anycast routing architecture for modern CDNs. In Proc. USENIX NSDI, 2015.

[28] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida. Constraint-based geolocation of Internet hosts. In ACM IMC, 2004.

[29] T. Hardie. Distributing Authoritative Name Servers via Shared Unicast Addresses. IETF RFC 3258, 2002.

[30] J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible Internet. In ACM IMC, 2008.

[31] Internet addresses hitlist dataset (20140829rev4338). Provided by the USC/LANDER project (http://www.isi.edu/ant/lander).

[32] D. Karrenberg. Anycast and BGP stability: A closer look at DNSMON data. Nanog, 2005.

[33] T. Krenc, O. Hohlfeld, and A. Feldmann. An Internet census taken by an illegal botnet: A qualitative assessment of published measurements. ACM CCR, 44(3), 2014.

[34] Z. Liu, B. Huffaker, M. Fomenkov, N. Brownlee, and K. C. Claffy. Two days in the life of the DNS anycast root servers. In PAM, 2007.

[35] D. Madory, C. Cook, and K. Miao. Who are the anycasters? Nanog, 2013.

[36] B. M. Maggs and R. K. Sitaraman. Algorithmic nuggets in content delivery. SIGCOMM Comput. Commun. Rev., 45(3), Jul. 2015.

[37] K. Miller. Deploying IP anycast. Nanog, 2003.

[38] https://nmap.org

[39] https://www.opendns.com/data-center-locations

[40] C. Pelsser, L. Cittadini, S. Vissicchio, and R. Bush. From Paris to Tokyo: On the suitability of ping to measure latency. In ACM IMC, 2013.

[41] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye. IP geolocation databases: Unreliable? ACM CCR, 41(2), 2011.

[42] D. Rossi, G. Pujol, X. Wang, and F. Mathieu. Peeking through the BitTorrent seedbox hosting ecosystem. In Proc. Traffic Monitoring and Analysis (TMA), 2014.

[43] S. Sarat, V. Pappas, and A. Terzis. On the use of anycast in DNS. In Proc. ACM SIGMETRICS, 2005.

[44] Y. Shavitt and N. Zilberman. A geolocation databases study. IEEE J-SAC, 29(10), 2011.

[45] F. Streibelt, J. Bottger, N. Chatzis, G. Smaragdakis, and A. Feldmann. Exploring EDNS-client-subnet adopters in your free time. In ACM IMC, 2013.

[46] A. Su, D. R. Choffnes, A. Kuzmanovic, and F. Bustamante. Drafting behind Akamai (Travelocity-based detouring). In Proc. ACM SIGCOMM, 2006.

[47] R. Torres, A. Finamore, J. R. Kim, M. Mellia, M. M. Munafo, and S. Rao. Dissecting video server selection strategies in the YouTube CDN. In Proc. of IEEE ICDCS, 2011.

[48] S. Zander, L. L. Andrew, and G. Armitage. Capturing ghosts: Predicting the used IPv4 space by inferring unobserved addresses. In ACM IMC, 2014.
