
Phillipa Gill - Job Application Materials

December 5, 2011

This document contains the following application materials and three representative papers:

• Curriculum vitae (4 pages)

• Research statement (4 pages)

• Teaching statement (2 pages)

• Let the Market Drive Deployment: A Strategy for Transitioning to BGP Security. Phillipa Gill, Michael Schapira and Sharon Goldberg. In SIGCOMM 2011. Toronto, Canada. Aug. 2011. (12 pages)

• Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications. Phillipa Gill, Navendu Jain and Nachi Nagappan. In SIGCOMM 2011. Toronto, Canada. Aug. 2011. (12 pages)

• Dude, Where's that IP? Circumventing Measurement-based IP Geolocation. Phillipa Gill, Yashar Ganjali, Bernard Wong and David Lie. In the 19th Usenix Security Symposium. Washington, USA. Aug. 2010. (16 pages)



Phillipa Gill

Bahen Centre, 40 St. George Street
Toronto, Ontario, Canada M5S 2E4
http://www.cs.toronto.edu/~phillipa · [email protected]

Research Interests
Computer networks, network measurement and characterization, network security, online privacy, online social networks and multimedia networking.

Education
Ph.D., University of Toronto, Department of Computer Science, 2008 - 2012 (expected)
Cumulative GPA: 4.0/4.0
Advisors: Yashar Ganjali, David Lie
Research areas: Economic incentives for security and privacy, large-scale systems reliability

M.Sc., University of Calgary, Department of Computer Science, 2006 - 2008
Cumulative GPA: 4.0/4.0
Advisors: Anirban Mahanti, Zongpeng Li
Thesis: YouTube Workload Characterization

B.Sc., University of Calgary, Department of Computer Science, 2003 - 2006
Cumulative GPA: 3.92/4.0
GPA in Major Field: 4.0/4.0

Relevant Experience
AT&T Labs – Research, Florham Park, NJ
Visiting Researcher, Sept. 2011 - Dec. 2011
Supervisor: Dr. Balachander Krishnamurthy
Visiting researcher position in the area of Internet privacy. Studied economic incentives for online advertisers, publishers and users to collaborate in a marketplace for private information.

Boston University, Boston, MA
Visiting Scholar, Oct. 2010 - Feb. 2011
Supervisor: Prof. Sharon Goldberg
Visiting researcher position in the area of secure inter-domain routing. Developed a strategy for secure BGP deployment that creates economic incentives for protocol adoption on the Internet.

Microsoft Research, Redmond, WA
Research Intern, July 2010 - Oct. 2010
Supervisor: Dr. Navendu Jain
Research internship characterizing the reliability of data center networks. Performed a statistical analysis of a year's worth of error logs to understand the properties and impact of network failures.

MessageLabs/Symantec, Toronto, ON
Software engineering course project (unpaid), Jan. 2009 - Apr. 2009
Supervisors: Matt Sergeant and Dan Bleaken
Characterized the behavior of botnets using spam e-mail records from a collection of honeypot domains. Received the best graduate student project award in this course.

HP Labs, Palo Alto, CA (located in Calgary, AB)
Research Intern, Apr. 2008 - July 2008
Supervisor: Martin Arlitt
Summer research internship in the area of Web workload characterization. Characterized and classified organizational Web service usage.

Dept. of Computer Science, University of Calgary, Calgary, AB
Research Assistant, Jan. 2005 - Aug. 2006
Supervisor: Prof. Anirban Mahanti
Research in multimedia networks, in particular periodic broadcast and quality adaptation. Studied Internet characteristics with an emphasis on round-trip times. Also examined the effectiveness of streaming media in cellular networks.



TRI-Faculty Lab, University of Calgary, Calgary, AB
Computer Lab Supervisor, May 2004 - Dec. 2004
Supervisor: Paul Kubicek
Troubleshooting and maintenance of a Windows 98 computer lab. Helped students use Microsoft applications. Helped professors set up projectors for lecture presentations.

iTiva Development Corporation, Kelowna, BC
Junior Engineer, June 2004 - Aug. 2004
Supervisor: Tom Taylor
Used C# and VisualStudio.net to implement a Domain Name System (DNS) server. Worked in a pair programming environment to develop a file tracker for a distributed media application.

Publications

Journal

• Phillipa Gill, Michael Schapira, and Sharon Goldberg. Modeling on Quicksand: Dealing with the Scarcity of Ground Truth in Interdomain Routing Data. ACM SIGCOMM Computer Communication Review (CCR). Jan. 2012.

• Phillipa Gill, Martin Arlitt, Niklas Carlsson, Anirban Mahanti, and Carey Williamson. Characterizing Organizational Use of Web-based Services: Methodology, Challenges, Observations, and Insights. ACM Transactions on the Web (TWeb). Volume 5, Issue 4. Oct. 2011.

• Martin Arlitt, Niklas Carlsson, Phillipa Gill, Aniket Mahanti and Carey Williamson. Characterizing Intelligence Gathering and Control on an Edge Network. ACM Transactions on Internet Technology (TOIT). Volume 11, Issue 1. July 2011.

• Balachander Krishnamurthy, Walter Willinger, Phillipa Gill and Martin Arlitt. A Socratic Method for Validation of Measurement-based Networking Research. Computer Communications. Volume 34, Issue 1. Jan. 2011.

• Bianca Schroeder, Sotirios Damouras and Phillipa Gill. Understanding Latent Sector Errors and How to Protect Against Them. ACM Transactions on Storage (ToS). Volume 6, Issue 3. Sept. 2010.

• Phillipa Gill, Liqi Shi, Anirban Mahanti, Zongpeng Li, and Derek Eager. Scalable On-Demand Media Streaming for Heterogeneous Clients. ACM Transactions on Multimedia Computing, Communications, and Applications. Volume 5, Issue 1. Oct. 2008.

Conference

• Phillipa Gill, Michael Schapira and Sharon Goldberg. Let the Market Drive Deployment: A Strategy for Transitioning to BGP Security. In proceedings of ACM SIGCOMM 2011. Toronto, Canada. Aug. 2011. (Accept rate: 14%)

• Phillipa Gill, Navendu Jain and Nachi Nagappan. Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications. In proceedings of ACM SIGCOMM 2011. Toronto, Canada. Aug. 2011. (Accept rate: 14%)

• Phillipa Gill, Yashar Ganjali, Bernard Wong and David Lie. Dude, Where's that IP? Circumventing Measurement-based IP Geolocation. In proceedings of the 19th Usenix Security Symposium. Washington D.C., USA. Aug. 2010. (Accept rate: 15%)

• Lee Humphreys, Phillipa Gill and Balachander Krishnamurthy. How Much is Too Much? Privacy Issues on Twitter. In proceedings of the Conference of International Communication Association. Singapore. June 2010.

• Bianca Schroeder, Sotirios Damouras and Phillipa Gill. Understanding Latent Sector Errors and How to Protect Against Them. In proceedings of the 8th Usenix Conference on File and Storage Technologies (FAST 2010). San Jose, USA. Feb. 2010. (Accept rate: 24%)

• Phillipa Gill, Zongpeng Li, Anirban Mahanti, Jingxiang Luo and Carey Williamson. Network Information Flow in Network of Queues. In proceedings of the 16th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). Baltimore, USA. Sept. 2008.

• Phillipa Gill, Martin Arlitt, Zongpeng Li, and Anirban Mahanti. The Flattening Internet Topology: Natural Evolution, Unsightly Barnacles or Contrived Collapse? In proceedings of the Passive and Active Measurement (PAM) Conference 2008. Cleveland, USA. Apr. 2008. (Best Paper Award; Accept rate: 32%)

• Phillipa Gill, Martin Arlitt, Zongpeng Li, and Anirban Mahanti. YouTube Traffic Characterization: A View From the Edge. In proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC). San Diego, USA. Oct. 2007. (Accept rate: 24% for full papers)



• Phillipa Sessini, Matei Leventer, and Anirban Mahanti. Video to Go: The Effects of Mobility on Streaming Media in a CDMA2000 1xEV-DO Network. In proceedings of the ACM/SPIE Multimedia Computing and Networking Conference (MMCN). San Jose, USA. Jan. 2007. (Accept rate: 30%)

• Liqi Shi, Phillipa Sessini, Anirban Mahanti, Zongpeng Li, and Derek Eager. Scalable Streaming for Heterogeneous Clients. In proceedings of ACM Multimedia 2006. Santa Barbara, USA. Oct. 2006. (Accept rate: 16%)

• Phillipa Sessini and Anirban Mahanti. Observations on the Round-Trip Times of TCP Connections. In proceedings of the Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS) 2006. Calgary, Canada. July/Aug. 2006.

Workshop & Short Papers

• Kathy Au, Billy Zhou, Zhen Huang, Phillipa Gill and David Lie. Short Paper: A Look at SmartPhone Permission Models. In proceedings of the ACM CCS Workshop on Security and Privacy in Smartphones and Mobile Devices. Chicago, USA. Oct. 2011.

• Balachander Krishnamurthy, Phillipa Gill, and Martin Arlitt. A Few Chirps About Twitter. In proceedings of the ACM SIGCOMM Workshop on Online Social Networks. Seattle, USA. Aug. 2008. (Accept rate: 35%)

• Phillipa Gill, Martin Arlitt, Zongpeng Li, and Anirban Mahanti. Characterizing User Sessions on YouTube. In proceedings of ACM/SPIE MMCN. San Jose, USA. Jan. 2008.

Technical Report

• Phillipa Sessini. Modeling the Gaia Hypothesis: Daisyworld. Dept. of Computer Science, University of Calgary. Tech Report #2007-857-09. Apr. 2007.

Dissertation

• Phillipa Gill. YouTube Workload Characterization. M.Sc. Thesis, Dept. of Computer Science, University of Calgary. March 2008. (Defended: March 7, 2008.)

Presentations

Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications
Industry: IBM Student Workshop for Frontiers of Cloud Computing, Hawthorne, NY, Dec. 2011
Conference: SIGCOMM 2011, Toronto, ON, Aug. 2011

Let the Market Drive Deployment: A Strategy for Transitioning to BGP Security
Academia: Columbia University, New York, NY, Nov. 2011
  University of Calgary, Calgary, AB, July 2011
  Princeton University, Princeton, NJ, Mar. 2011
  Georgia Tech, Atlanta, GA, Mar. 2011
Industry: AT&T Security Research Center, New York, NY, Nov. 2011
  AT&T Labs – Research, Florham Park, NJ, Oct. 2011
Conference: SIGCOMM 2011, Toronto, ON, Aug. 2011

Modeling Adoption of Secure BGP
Industry: Boston Azure Users Group, Boston, MA, Dec. 2010

Dude, Where's that IP? Circumventing measurement-based IP geolocation
Industry: Google Scholars' Retreat, San Francisco, CA, June 2010
  AT&T Labs – Research, Florham Park, NJ, Dec. 2010
Workshop: NSERC ISSNet Workshop, Ottawa, ON, Apr. 2010
Conference: Usenix Security 2010, Washington, DC, Aug. 2010

A Few Chirps About Twitter
Workshop: Workshop on Online Social Networks, Seattle, WA, Aug. 2008

Characterizing User Sessions on YouTube
Conference: Multimedia Computing and Networking, San Jose, CA, Jan. 2008

YouTube Workload Characterization: A View From the Edge
Conference: Internet Measurement Conference, San Diego, CA, Oct. 2007

Video to Go: The Effects of Mobility on Streaming Media in a CDMA2000 1xEV-DO Network
Conference: Multimedia Computing and Networking, San Jose, CA, Jan. 2007

Scalable Streaming for Heterogeneous Clients
Conference: ACM Multimedia, Santa Barbara, CA, Oct. 2006

Observations on the Round-Trip Times of TCP Connections
Conference: SPECTS, Calgary, AB, July 2006



Teaching Experience

Department of Computer Science, University of Toronto
Teaching Assistant
SCI199Y: Great Ideas in Computing, Fall/Winter 2009-2010
First year seminar course about various aspects of computer science for non-science majors.

Teaching and Learning Center, University of Calgary
Instructor Skills Workshop, Aug. 2007
Participated in a four-day interdisciplinary workshop on improving teaching skills.

Department of Computer Science, University of Calgary
Teaching Assistant
CPSC265: Computer Architecture and Low-Level Programming, Fall 2006
Introductory assembly language programming class (SPARC Assembler).
CPSC313: Introduction to Computability, Winter 2007
Second year theory course on models of computation (e.g., finite automata, grammars and Turing machines).
CPSC325: Hardware/Software Interface, Fall 2007
Second year hardware course covering more advanced assembly language, device drivers and interrupts. Received a Teaching Assistant (TA) award for this course.

Awards
Best Presentation Award, IBM Workshop for Frontiers of Cloud Computing, 2011
Canada Anita Borg Finalist, Google, 2009, 2010
Alexander Graham Bell Canada Graduate Scholarship (CGS), NSERC, 2008 - 2010
Helen Sawyer Hogg Award, Dept. of Computer Science, University of Toronto, 2008
Graduate Student Award, Alberta Learning, 2008
Teaching Assistant Award, Dept. of Computer Science, University of Calgary, 2007
Department Research Award, Dept. of Computer Science, University of Calgary, 2007
Canada Graduate Scholarship (CGS), NSERC, 2006 - 2008
Graduate Student Scholarship, iCORE, 2006 - 2008
Dean's Research Excellence Award, University of Calgary, 2006
Undergraduate Student Research Award (USRA), NSERC, 2005, 2006
Louise McKinney Scholarship, Alberta Learning, 2004
Dean's List, University of Calgary, 2003 - 2006
Merit Award, University of Calgary, 2003 - 2006
Alexander Rutherford Scholarship, Alberta Learning, 2003

Service and other activities
External Reviewer: ACM SIGMETRICS 2011, ACM SIGCOMM Computer Communication Review (CCR) 2011
External Reviewer: Usenix Security 2010
External Reviewer: Eurosys '09, COMSWARE '09, MASCOTS '09
Computer Science Graduate Society: GSU representative, 2009 - 2010
Graduate Students Association 40th Anniversary Scholarship: Reviewer, 2008
Computer Science Graduate Society: VP Communications, 2007 - 2008
Computer Science Graduate Society: Lab Representative, 2006 - 2007
International Genetically Engineered Machines (iGEM): Team Member, 2006
Computer Science Undergraduate Society: Volunteer, 2005
SCIberMENTOR: Online mentor, 2004 - 2008
Child Find Alberta: Volunteer IT consultant, 2004



Research Statement
Phillipa Gill ([email protected])

Drawing on seven years of research experience, I apply techniques from network measurement, data analysis and modeling to a wide range of networking problems. Recently, I have focused on network security. This led me to explore the implications of IP geolocation being used in security-sensitive contexts [10] and incentives for deploying secure routing protocols on the Internet [12]. Additionally, I have studied reliability in large-scale storage [23] and network systems [11].

Research overview

My broad research goal is to improve the systems and protocols required to support popular online services. This goal is complicated by the nature of the Internet, with many organizations placing constraints on the design of new systems. On one hand, commercial organizations (e.g., Internet service providers (ISPs) and publishers) are reluctant to deploy new systems that increase management overheads or require the purchase of additional hardware. On the other hand, standardization bodies (e.g., the Internet Engineering Task Force (IETF)) facilitate the specification of new protocols, and regulatory bodies (e.g., the Federal Communications Commission (FCC)) must balance public and commercial interests while regulating the deployment of new systems and protocols.

Achieving my goal of improving systems and protocols on the Internet requires taking a practical approach. Thus, I use network measurement and characterization to understand patterns and uncover limitations in existing services. This understanding of existing systems is critical to inform the design of new systems and protocols. Additionally, I engage in dialogue with various organizations to understand how my research interacts with the complex Internet ecosystem.

Throughout my graduate studies, I have collaborated with researchers in industry labs to understand the concerns of commercial organizations when they consider new solutions. My study of network reliability in [11] used challenges faced by network operators to guide the analysis of network failures in data centers. Results of [7] were presented to the Standard Performance Evaluation Corporation (SPEC) committee to inform the design of SPECWeb2009, an industry-standard Web server benchmarking tool.

Our study of incentives for deploying secure routing protocols in [12] was grounded in the realities of today's standardization process through dialogue with groups within the IETF secure inter-domain routing working group [17]. Additionally, we presented our work to the North American Network Operators Group (NANOG) to solicit feedback from the operators who will be responsible for deploying a secure routing protocol in practice. The results of our study, combined with its practical underpinnings, spurred discussions within the FCC [15] about how to develop policies to encourage deployment of secure routing in practice.

The goal of supporting extremely popular services on the Internet has guided my research directions over the past seven years. Towards this goal I have considered three main questions:

Q1 Security. How do we ensure services are not impacted by malicious agents in the network?

Q2 Reliability. How do we design systems that are resilient to unexpected failures?

Q3 Performance. How do we build online services that can support millions of users?



Securing the Internet’s routing system

When we interact with online services, we want to ensure our network traffic is delivered to the service without interference from potentially malicious third parties. However, we are currently vulnerable to interference as a result of insecurity in the Border Gateway Protocol (BGP), the Internet's de facto inter-domain routing protocol. The potential for serious attacks has been illustrated by high-profile incidents such as Pakistan Telecom's hijack of YouTube traffic [22] and China Telecom's potential interception of traffic to tens of thousands of networks [6]. To address these vulnerabilities, secure BGP (S-BGP) [18] and secure origin BGP (soBGP) [24] have been proposed to validate network paths (collectively, we refer to these protocols as S*BGP). However, even with the recent development of infrastructure to support S*BGP [2, 20], economic incentives for deployment remain unclear. In [12], we present a three-step strategy designed to create economic incentives for S*BGP deployment. A cornerstone of this proposal is BGP's role in route selection on the Internet, which we harness to drive revenue-generating traffic towards ISPs that have deployed S*BGP. This revenue-generating traffic creates economic incentives for ISPs to deploy S*BGP.

We demonstrate the potential for these economic incentives to drive S*BGP deployment by running simulations on empirically-derived inter-domain topologies. I used knowledge of potential pitfalls when simulating on the Internet's inter-domain topology to design robustness tests to mitigate the impact of these pitfalls on our results. I also designed algorithms and data structures that enabled additional simulations for robustness testing by speeding up existing algorithms by 1,000X. A more detailed exposition of these techniques can be found in [13].

Dialogue with standards bodies and operators. Our work in [12] benefited from discussion with the standards bodies working on BGP security. We further engaged the network operator community by presenting our work at a meeting of the North American Network Operators Group (NANOG). By engaging these groups, we positioned our study of BGP security within the realities of today's standardization process. Our candidate deployment strategy, and the economic incentives it presents, have spurred the creation of a working group within the FCC to potentially implement a similar strategy in practice [15].

Understanding reliability in large-scale systems.

Scaling popular online services has caused organizations to invest in networks of data centers that can harness economies of scale and provide agile scaling. At these scales, not only do systems become more failure-prone, but failure can result in severe impact when multiple services run in common data centers (e.g., [3]). To understand failures at scale, I have collaborated with industry to access data and leveraged a strong foundation of data analysis skills to characterize failures. I have successfully applied this approach to understand failures in both storage and network systems.

Storage systems. In [23], we consider the occurrence of Latent Sector Errors (LSEs) in hard disks, where even a single LSE can cause data loss when recovering from a disk failure. We focus our analysis on understanding properties of LSEs that are of interest to operators and provide parameterized models of these properties to enable more realistic simulations of LSEs in future academic studies. The StorageMojo blog recognized this work as the best paper at FAST'10 [14].

Network systems. In summer 2010, I interned at Microsoft Research, where I studied the role of network components in data center reliability [11]. Taking an operator-centric approach, we characterized properties of failures and the effectiveness of existing redundancy in the network to mitigate failure impact. Understanding network failures is an important first step towards our eventual goal of developing systems that can handle failures with minimal impact to services.



Characterizing User Generated Content.

In the mid-2000s there was a shift from Web sites hosting content authored by a single publisher to sites hosting User Generated Content (UGC). During my Master's studies, I performed some of the first studies analyzing UGC on YouTube [7, 8] and Twitter [16, 19]. These studies laid a foundation for future work characterizing user generated content and its impact on network traffic, with two of these studies being cited more than 200 times [7, 19].

My Master's thesis characterized YouTube, a popular UGC Web site [7, 8]. Unlike contemporaneous work, we characterized usage of YouTube by analyzing network traffic. We focused on properties that impact capacity planning at servers and edge networks. We presented the results of [7] to the SPEC committee to inform the design of SPECWeb2009, a tool used to benchmark Web servers.

I extended my study of UGC to consider usage of Twitter, which centers on short messages. In [19], we performed one of the first studies of Twitter usage by crawling the Twitter social graph. In collaboration with a sociologist at Cornell University, we extended our study of Twitter to understand the privacy implications of content posted on Twitter [16]. This latter study highlights the importance of reaching out to other disciplines to access relevant expertise, especially when studying user behavior online.

Future directions.

Going forward, I am interested in pursuing a research agenda that combines network measurement with modeling of economic incentives. I plan to apply these techniques to further improve the security and reliability of networks. This includes ongoing work characterizing the costs of S*BGP deployment and the following two future directions.

Protecting privacy of users. Many of today's popular online services operate free of charge and are supported by online advertising, which generates billions of dollars each year [4]. Behavioral ad targeting that leverages user data has been used in recent years to improve ads and further increase profits. However, this targeting raises concerns about user privacy. In collaboration with researchers at AT&T Labs–Research, Telefonica Research and Columbia University, I am working to ameliorate these concerns. The approach we are exploring is to place a trusted third party between Web users and online aggregators. This third party will mediate an "information market" where aggregators bid to gain access to user data and users receive compensation for their data. More generally, I am interested in developing privacy protection measures that are compatible with the economic interests of online advertisers and publishers. This will require studying incentives for deployment of privacy protection measures and designing systems that are able to leverage them.

Revealing hidden portions of the Internet's topology. Despite its engineered nature, we know surprisingly little about the inter-domain topology of the Internet [21]. This lack of understanding severely limits the predictive power of inter-domain simulations. A major blind spot in Internet topology mapping has emerged as a result of large content providers peering with many small ISPs at public Internet eXchange Points (IXPs) [1]. This blind spot stems from how ISPs announce paths through the Internet as well as from where current topology measurements are made. In [9], we uncovered a small fraction of these peering links, and recent studies have worked towards finding more of them [1, 5]. However, we are still a long way from knowing the true topology of the Internet. I am interested in developing new methods to uncover larger fractions of this hidden portion of the Internet and in understanding how these links impact the routing decisions of ISPs. In addition to shedding light on these hidden links, I will also pursue work on techniques to quantify and mitigate the impact of these blind spots on studies requiring inter-domain topologies.



References

[1] B. Augustin, B. Krishnamurthy, and W. Willinger. IXPs: Mapped? In IMC, 2009.

[2] R. Austein, G. Huston, S. Kent, and M. Lepinski. Secure inter-domain routing: Manifests for the resource public key infrastructure. draft-ietf-sidr-rpki-manifests-09.txt, 2010.

[3] J. Brodkin. Amazon EC2 outage calls 'availability zones' into question, 2011. http://www.networkworld.com/news/2011/042111-amazon-ec2-zones.html.

[4] I. A. Bureau. Internet advertising revenues hit $7.3 billion in Q1 '11, 2011. http://www.iab.net/about_the_iab/recent_press_releases/press_release_archive/press_release/pr-052611.

[5] K. Chen, D. Choffnes, R. Potharaju, Y. Chen, F. Bustamante, D. Pei, and Y. Zhao. Where the sidewalk ends: Extending the Internet AS graph using traceroutes from P2P users. In ACM CoNEXT, 2009.

[6] J. Cowie. Renesys blog: China's 18-minute mystery. http://www.renesys.com/blog/2010/11/chinas-18-minute-mystery.shtml.

[7] P. Gill, M. Arlitt, Z. Li, and A. Mahanti. YouTube traffic characterization: A view from the edge. In ACM Internet Measurement Conference (IMC), 2007.

[8] P. Gill, M. Arlitt, Z. Li, and A. Mahanti. Characterizing user sessions on YouTube. In Multimedia Computing and Networking (MMCN), 2008.

[9] P. Gill, M. Arlitt, Z. Li, and A. Mahanti. The flattening Internet topology: Natural evolution, unsightly barnacles or contrived collapse? In Passive and Active Measurement, 2008.

[10] P. Gill, Y. Ganjali, and D. Lie. Dude, where's that IP? Circumventing measurement-based IP geolocation. In Usenix Security, 2010.

[11] P. Gill, N. Jain, and N. Nagappan. Understanding network failures in data centers: Measurement, analysis, and implications. In SIGCOMM, 2011.

[12] P. Gill, M. Schapira, and S. Goldberg. Let the market drive deployment: A strategy for transitioning to BGP security. In SIGCOMM, 2011.

[13] P. Gill, M. Schapira, and S. Goldberg. Modeling on quicksand: Dealing with the scarcity of ground truth in interdomain routing data. ACM Computer Communications Review (CCR), 2012.

[14] R. Harris. StorageMojo's best paper of FAST'10, 2010. http://storagemojo.com/2010/03/05/storagemojos-best-paper-of-fast-10/.

[15] S. Hartman. CSRIC III working group descriptions and proposed co-chairs and leadership, 2011. http://transition.fcc.gov/pshs/advisory/csric3/wg-descriptions_v1.pdf.

[16] L. Humphreys, P. Gill, and B. Krishnamurthy. How much is too much? Privacy issues on Twitter. In Conference of International Communication Association, 2010.

[17] IETF. Secure inter-domain routing (SIDR). http://datatracker.ietf.org/wg/sidr/charter/.

[18] S. Kent, C. Lynn, and K. Seo. Secure border gateway protocol (S-BGP). JSAC, 2000.

[19] B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps about Twitter. In Workshop on Online Social Networks (WOSN), 2008.

[20] C. D. Marsan. U.S. plots major upgrade to Internet router security. Network World, 2009.

[21] R. Oliveira, D. Pei, W. Willinger, B. Zhang, and L. Zhang. In search of the elusive ground truth: The Internet's AS-level connectivity structure. In SIGMETRICS, 2008.

[22] Renesys Blog. Pakistan hijacks YouTube. http://www.renesys.com/blog/2008/02/pakistan_hijacks_youtube_1.shtml.

[23] B. Schroeder, S. Damouras, and P. Gill. Understanding latent sector errors and how to protect against them. In FAST, 2010.

[24] R. White. Deployment considerations for secure origin BGP (soBGP). draft-white-sobgp-bgp-deployment-01.txt, June 2003, expired.



Teaching Statement
Phillipa Gill ([email protected])

As a graduate student, I have worked with undergraduates both as a teaching assistant and by advising summer students. As a teaching assistant, I instructed students on topics ranging from assembly language to introductory complexity analysis. It was incredibly satisfying to observe students go from initial frustration, when code does not behave as expected or proofs seem unsolvable, to confidence in their mastery of course material. Indeed, my efforts were acknowledged when I received a teaching assistant award for a course on assembly language. I took the initiative to further develop my teaching skills by participating in a four-day course at the University of Calgary.

In the summer of 2011, I advised two undergraduate students. These students were at different points in their academic careers, with one near the beginning of his undergraduate studies and one preparing to embark on a Master's degree, which required two different advising styles. For the early-career student, I developed a well-defined project (developing a Facebook application) with room for the student to innovate. For the student about to enter graduate studies, I took a different approach. I gave the student a broad problem (securing IP geolocation) and had him think independently about how to solve the problem. The student quickly gained traction and was able to demonstrate graduate-level research ability by tackling the problem using a combination of cryptography and data analysis techniques.

Teaching Philosophy

Computer science is a discipline that demands a lot from its students. This is particularly true during the undergraduate program, where students are confronted with the dual challenges of understanding abstract concepts and learning the technical details of programming. Thus, computer science educators have an important role to play in student success. There are three key techniques I would use to ensure students are successful in my courses:

1. Communicate concepts clearly. I would think of creative ways to present material to ensure all students can understand the topic at hand. This may involve using a combination of visual and hands-on techniques (e.g., illustrating the concept of reductions).

2. Create a positive classroom environment. I would also foster a positive classroom environment through informal interactive exercises to ensure students feel comfortable raising questions when material is unclear.

3. Present engaging applications of course material. Finally, I would motivate students by providing compelling applications for the task at hand. I would present students with a variety of applications, especially those they may not be aware of (e.g., medical imaging).

I would apply these techniques to teach both introductory and more advanced courses. As a networking researcher, I am qualified to teach courses on networking, distributed systems, assembly language, introductory theory, and introductory computer science.

Providing a solid foundation.

The first year of a computer science degree program brings together students with diverse technical backgrounds, ranging from students who completed technical high school programs to those with no programming experience. A professor of introductory computer science must address a key challenge: how to keep more experienced students engaged while ensuring students new to the field can learn at their own pace?



Peer mentorship. I began my computer science career in the latter group, and the support I received in introductory computer science was a key factor in determining my future success. One piece of support that I found instrumental was knowing students who were a few years ahead in the program. They provided not only technical help but also enabled me to see the interesting applications of introductory material in the more advanced courses. I would work to pair students in need of additional support with interested "mentors" in third and fourth year.

Giving students opportunity to excel. To keep a multitude of students engaged in learning, I would design assignments with a core component to drive home a key concept (e.g., writing a mouse driver) and optional extensions for students with more interest in the subject matter. A favorite memory from my time as an undergraduate was an assignment with optional extensions where a partner and I developed our basic device driver "Pong" game into an elaborate game of "Asteroids" with many additional features. In addition to technical mastery, the optional components allowed us to show our creativity and take pride in our work, which made the course very engaging.

Developing the next generation of computer scientists.

As students master technical skills, professors play an important role in developing skills in students beyond rote mastery of curriculum material. The ability to understand concepts beyond the technical details enables students to adapt to changing technology and is a key differentiator between undergraduate computer science and more technically oriented diploma programs.

Improving communication skills. I would develop communication skills in students by emphasizing in-class presentations. The ability to communicate technical material can differentiate students in competitive job markets and is a key requirement for students interested in pursuing graduate studies. Having to present to their peers would require students to think carefully about the concepts so that they can present them in a coherent manner. For course projects, a final presentation gives students a chance to learn about projects done by their peers.

Course development

Additionally, I would be interested in developing two courses.

Technology and society. I would develop an undergraduate course where students think critically about how computing impacts society. As technology permeates more areas of our lives, the next generation of technologists needs to reason about the social impacts of the technology they develop. The course would cover topics such as social networks, online privacy and the "filter bubble" that arises when algorithms attempt to show users relevant content. As a final project, students would read a book about society and technology (e.g., Jaron Lanier's "You Are Not A Gadget") and present a critical assessment of the book. While this course would be targeted at computer science students, outreach to social scientists would help to enrich the course.

Network measurement. Network measurement is a critical skill for networking students, who often end up using empirical data in their studies. Measurement of the Internet is complicated by the limited vantage points we have to observe it. I would develop a course to teach students techniques for measuring the Internet at different levels of the protocol stack, from network-level study of Internet topology to study of online social networks at the application layer. The course would consist of reading relevant papers in the area of network measurement, with hands-on exercises to familiarize students with the tools (e.g., Routeviews, social network APIs). Students would also undertake a final project to apply their knowledge to solve a research problem.



Let the Market Drive Deployment: A Strategy for Transitioning to BGP Security

Phillipa Gill
University of Toronto

Michael Schapira
Princeton University

Sharon Goldberg
Boston University

Abstract

With a cryptographic root-of-trust for Internet routing (RPKI [17]) on the horizon, we can finally start planning the deployment of one of the secure interdomain routing protocols proposed over a decade ago (Secure BGP [22], secure origin BGP [37]). However, if experience with IPv6 is any indicator, this will be no easy task. Security concerns alone seem unlikely to provide sufficient local incentive to drive the deployment process forward. Worse yet, the security benefits provided by the S*BGP protocols do not even kick in until a large number of ASes have deployed them.

Instead, we appeal to ISPs' interest in increasing revenue-generating traffic. We propose a strategy that governments and industry groups can use to harness ISPs' local business objectives and drive global S*BGP deployment. We evaluate our deployment strategy using theoretical analysis and large-scale simulations on empirical data. Our results give evidence that the market dynamics created by our proposal can transition the majority of the Internet to S*BGP.

Categories and Subject Descriptors: C.2.2 [Computer-Communication Networks]: Network Protocols

General Terms: Economics, Security

1. INTRODUCTION

The Border Gateway Protocol (BGP), which sets up routes from autonomous systems (ASes) to destinations on the Internet, is amazingly vulnerable to attack [7]. Every few years, a new failure makes the news, ranging from misconfigurations that cause an AS to become unreachable [34, 29], to possible attempts at traffic interception [11]. To remedy this, a number of widely-used stop-gap measures have been developed to detect attacks [20, 25]. The next step is to harden the system to a point where attacks can be prevented. After many years of effort, we are finally seeing the initial deployment of the Resource Public Key Infrastructure (RPKI) [4, 27], a cryptographic root-of-trust for Internet routing that authoritatively maps ASes to their IP prefixes and public keys. With RPKI on the horizon, we can now realistically consider deploying the S*BGP protocols, proposed a decade ago, to prevent routing failures by validating AS-level paths: Secure BGP (S-BGP) [22] and Secure Origin BGP (soBGP) [37].

1.1 Economic benefits for S*BGP adoption.

While governments and industry groups may have an interest in S*BGP deployment, ultimately, the Internet lacks a centralized authority that can mandate the deployment of a new secure routing protocol. Thus, a key hurdle for the transition to S*BGP stems from the fact that each AS will make deployment decisions according to its own local business objectives.

Lessons from IPv6? Indeed, we have seen this problem before. While IPv6 has been ready for deployment since around 1998, the lack of tangible local incentive for IPv6 deployment means that we are only now starting to see the seeds of large-scale adoption. Conventional wisdom suggests that S*BGP will suffer from a similar lack of local incentives for deployment. The problem is exacerbated by the fact that an AS cannot validate the correctness of an AS-level path unless all the ASes on the path have deployed S*BGP. Thus, the security benefits of S*BGP only apply after a large fraction of ASes have already deployed the protocol.

Economic incentives for adoption. We observe that, unlike IPv6, S*BGP can impact the routing of Internet traffic, and that this may be used to drive S*BGP deployment. These crucial observations enable us to avoid the above issues and show that global S*BGP deployment is possible even if local ASes' deployment decisions are not motivated by security concerns! To this end, we present a prescriptive strategy for S*BGP deployment that relies solely on Internet Service Providers' (ISPs) local economic incentives to drive global deployment; namely, ISPs' interest in attracting revenue-generating traffic to their networks.

Our strategy is prescriptive (Section 2). We propose guidelines for how (a) ASes should deploy S*BGP in their networks, and (b) governments, industry groups, and other interested parties should invest their resources in order to drive S*BGP deployment forward.

1. Break ties in favor of secure paths. First, we require ASes that deploy S*BGP to actually use it to inform route selection. However, rather than requiring security be the first criterion ASes use to select routes, we only require secure ASes to break ties between equally-good routes in favor of secure routes. This way, we create incentives for ISPs to deploy S*BGP so they can transit more revenue-generating customer traffic than their insecure competitors.


2. Make it easy for stubs to adopt S*BGP. 85% of ASes in the Internet are stubs (i.e., ASes with no customers) [9]. Because stubs earn no revenue from providing Internet service, we argue for driving down their deployment costs by having ISPs sign BGP announcements on their behalf or deploy a simplex (unidirectional) S*BGP [26] on their stub customers. In practice, such a simplex S*BGP must either be extremely lightweight or heavily subsidized.

3. Create market pressure via early adopters. We propose that governments and industry groups concentrate their regulatory efforts, or financial incentives, on convincing a small set of early adopters to deploy S*BGP. We show that this set of early adopters can create sufficient market pressure to convince a large fraction of ASes to follow suit.

1.2 Evaluation: Model and simulations.

To evaluate our proposal, we needed a model of the S*BGP deployment process.

Inspiration from social networks? At first glance, it seems that the literature on technology adoption in social networks would be applicable here (e.g., [30, 21] and references therein). However, in social network models, an entity's decision to adopt a technology depends only on its immediate neighbors in the graph; in our setting, this depends on the number of secure paths. This complication means that many elegant results from this literature have no analogues in our setting (Section 9).

Our model. In contrast to earlier work that assumes that ASes deploy S*BGP because they are concerned about security [8, 5], our model assumes that ISPs' local deployment decisions are based solely on their interest in increasing customer traffic (Section 3).

We carefully designed our model to capture a few crucial issues, including the fact that (a) traffic transited by an ISP can include flows from any pair of source and destination ASes, (b) a large fraction of Internet traffic originates in a few large content provider ASes [24], and (c) the cost of S*BGP deployment can depend on the size of the ISP's network. The vast array of parameters and empirical data relevant to such a model (Section 8) means that our analysis is not meant to predict exactly how the S*BGP deployment process will proceed in practice; instead, our goal was to evaluate the efficacy of our S*BGP deployment strategy.

Theorems, simulations and examples. We explore S*BGP deployment in our model using a combination of theoretical analysis and simulations on empirical AS-level graphs [9, 3] (Sections 5-7). Every example we present comes directly from these simulations. Instead of artificially reducing algorithmic complexity by subsampling [23], we ran our simulations over the full AS graph (Section 4). Thus, our simulations ran in time O(N^3) with N = 36K, and we devoted significant effort to developing parallel algorithms that we ran on a 200-node DryadLINQ cluster [38].

1.3 Key insights and recommendations.

Our evaluation indicates that our strategy for S*BGP deployment can drive a transition to S*BGP (Section 5). While we cannot predict exactly how S*BGP deployment will progress, a number of important themes emerge:

1. Market pressure can drive deployment. We found that when S*BGP deployment costs are low, the vast majority of ISPs have incentives to deploy S*BGP in order to differentiate themselves from, or keep up with, their competitors (Section 5). Moreover, our results show this holds even if 96% of routing decisions (across all source-destination AS pairs) are not influenced by security concerns (Section 6.6).

2. Simplex S*BGP is crucial. When deployment costs are high, deployment is primarily driven by simplex S*BGP (Section 6).

3. Choose a few well-connected early adopters. The set of early adopters cannot be random; it should include well-connected ASes like the Tier 1s and content providers (Section 6). While we prove that it is NP-hard to even approximate the optimal set of early adopters (Section 6.1), our results show that even 5-10 early adopters suffice when deployment costs are low.

4. Prepare for incentives to disable S*BGP. We show that ISPs can have incentives to disable S*BGP (Section 7). Moreover, we prove that there could be deployment oscillations (where ASes endlessly turn S*BGP on and off), and that it is computationally hard to even determine whether such oscillations exist.

5. Minimize attacks during partial deployment. Even when S*BGP deployment progressed, there were always some ASes that did not deploy (Sections 5, 6). As such, we expect that S*BGP and BGP will coexist in the long term, suggesting that careful engineering is required to ensure that this does not introduce new vulnerabilities into the interdomain routing system.

Paper organization. Section 2 presents our proposed strategy for S*BGP deployment. To evaluate the proposal, we present a model of the deployment process in Section 3. In Sections 5-7 we explore this model using theoretical analysis and simulations, and we present an in-depth discussion of our modeling assumptions in Section 8. Section 9 presents related work. The full version of this paper [2] contains implementation details for our simulations, proofs of all our theorems, and supplementary data analysis.

2. S*BGP DEPLOYMENT STRATEGY

2.1 S*BGP: Two possible solutions.

With RPKI providing an authoritative mapping from ASes to their cryptographic public keys, two main protocols have been proposed that prevent the propagation of bogus AS path information:

Secure BGP (S-BGP) [22]. S-BGP provides path validation, allowing an AS a_1 that receives a BGP announcement a_1 a_2 ... a_k d to validate that every AS a_j actually sent the announcement in the path. With S-BGP, a router must cryptographically sign each routing message it sends, and cryptographically verify each routing message it receives.

Secure Origin BGP (soBGP) [37]. soBGP provides a slightly weaker security guarantee called topology validation, which allows an AS to validate that a path it learns physically exists in the network. To do this, soBGP requires neighboring ASes to mutually authenticate a certificate for the existence of a link between them, and validate every path it learns from a BGP announcement against these cryptographic certificates.

Because our study is indifferent to attacks and adversaries, it applies equally to each of these protocols. We refer to them collectively as S*BGP, and an AS that deploys them as secure.

2.2 How to standardize S*BGP deployment.

To create local economic incentives for ISPs to deploy S*BGP, we propose that Internet standards should require ASes to deploy S*BGP as follows:

2.2.1 Simplex S*BGP for stubs.

For stubs, Internet access is a cost, rather than a revenue source, and it seems unlikely that security concerns alone will suffice to motivate stubs to undertake a costly S*BGP deployment. However, because stubs propagate only outgoing BGP announcements for their own IP prefixes, we suggest two possible solutions to this problem: (1) allow ISPs to sign on behalf of their stub customers or (2) allow stubs to deploy simplex (unidirectional) S*BGP. Indeed, the latter approach has been proposed by the Internet standards community [26].

Simplex S-BGP. For S-BGP, this means that stubs need only sign outgoing BGP announcements for their own IP prefixes, but not validate incoming BGP announcements for other IP prefixes.¹ Thus, a stub need only store its own public key (rather than obtaining the public keys of each AS on the Internet from the RPKI) and cryptographically sign only a tiny fraction of the BGP announcements it sees. Simplex S-BGP can significantly decrease the computational load on the stub, and can potentially be deployed as a software, rather than hardware, upgrade to its routers.

Simplex soBGP. For soBGP, this means that a stub need only create certificates for its links, but need not validate the routing announcements it sees. Simplex soBGP is done offline; once a stub certifies its information in the soBGP database, its task is complete and no router upgrade is required.

The objective of simplex S*BGP is to make it easy for stubs to become secure by lowering deployment costs and computational overhead. While we certainly allow for stubs (e.g., banks, universities) with an interest in security to move from simplex S*BGP to full S*BGP, our proposal does not require them to do so.

Impact on security. With simplex S*BGP, a stub lacks the ability to validate paths for prefixes other than its own. Since stubs constitute about 85% of ASes [9], at first glance this suggests that simplex S*BGP leads to significantly worse security in the global Internet.

We argue that this is not so. Observe that if a stub s has an immediate provider p that has deployed S*BGP and is correctly validating paths, then no false announcements of fully secure paths can reach s from that provider, unless p itself maliciously (or mistakenly) announces false secure paths to s. Thus, in the event that stubs upgrade to simplex S*BGP and all other ASes upgrade to full S*BGP, the only open attack vector is for ISPs to announce false paths to their own stub customers. However, we observe that the impact of a single misbehaving ISP is small, since 80% of ISPs have fewer than 7 stub customers, and only about 1% of ISPs have more than 100 stub customers [9]. Compare this to the insecure status quo, where an arbitrary misbehaving AS can impact about half of the ASes in the Internet (around 15K ASes) on average [14].

¹ A stub may even choose to delegate its cryptographic keys to its ISPs, and have them sign for it; while this might be a good first step on the path to deployment, ceding control of cryptographic keys comes at the cost of reduced security.

2.2.2 Break ties in favor of fully secure paths.

In BGP, an AS chooses the path to a given destination AS d based on a ranking on the outgoing paths it learns from its neighbors (e.g., Appendix A). Paths are first ranked according to interdomain considerations (local preference, AS path length) and then according to intradomain considerations (e.g., MEDs, hot-potato routing).²

Secure paths. We say that a path is secure iff every AS on that path is secure. We do this because an AS cannot validate a path unless every AS on the path signed the routing announcement (S-BGP) or issued certificates for the links on the path (soBGP).

Security as part of route selection. The next part of our proposal suggests that once an AS has the ability to validate paths, it should actually use this information to inform its routing decisions. In principle, an AS might even modify its ranking on outgoing paths so that security is its highest priority. Fortunately, we need not go to such lengths. Instead, we only require secure ASes to break ties between equally good interdomain paths in favor of secure paths. This empowers secure ISPs to attract customer traffic away from their insecure competitors. To ensure that a newly-secure AS can regain lost customer traffic, we require that the original tie-break criteria (e.g., intradomain considerations) be employed in the case of equally good, secure interdomain paths. Thus, the size of the set of equally-good interdomain paths for a given source-destination pair (which we call the tiebreak set) gives a measure of competition in the AS graph.

Route selection at stubs. For stubs running simplex S*BGP, we consider both the case where they break ties in favor of secure paths (i.e., because they trust their providers to verify paths for them) and the case where they ignore security altogether (i.e., because they do not verify paths) (Section 6.7).

Partially secure paths. We do not allow ASes to prefer partially-secure paths over insecure paths, to avoid introducing new attack vectors that do not exist without S*BGP (e.g., the attack in Appendix B).

We shall show that S*BGP deployment progresses quite effectively even if stubs ignore security and tiebreak sets are very small (Sections 6.6-6.7).
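To make the placement of this tie-break concrete, the sketch below ranks a secure AS's candidate routes in the spirit of the standard model: local preference by business relationship first, then AS-path length, with full-path security used only to break ties among otherwise equally good interdomain routes. This is an illustration rather than the paper's simulator; the Route fields, the relationship ordering, and the use of the next-hop AS number as a stand-in for the original (intradomain) tie-break are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Set

# Lower value = more preferred; customer routes carry revenue-generating traffic.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

@dataclass
class Route:
    as_path: List[int]   # next hop first, origin AS last
    relationship: str    # relationship of the neighbor that announced the route

def is_secure(route: Route, secure_ases: Set[int]) -> bool:
    # A path is secure iff every AS on it has deployed S*BGP.
    return all(asn in secure_ases for asn in route.as_path)

def rank_key(route: Route, secure_ases: Set[int]):
    return (
        PREFERENCE[route.relationship],              # 1. local preference
        len(route.as_path),                          # 2. shorter AS path
        0 if is_secure(route, secure_ases) else 1,   # 3. NEW: prefer fully secure paths
        route.as_path[0],                            # 4. stand-in for the original tie-break
    )

def select_route(candidates: List[Route], secure_ases: Set[int]) -> Route:
    return min(candidates, key=lambda r: rank_key(r, secure_ases))
```

Step 3 only matters for routes that already tie on the interdomain criteria, which is exactly the competition captured by the tiebreak set described above.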

2.3 How third parties should drive deployment.

Early adopters. To kick off the process, we suggest that interested third parties (e.g., governments, regulators, industry groups) focus regulation, subsidies, or external financial incentives on convincing a set of early adopter ASes to deploy S*BGP. One regulatory mechanism may be for a government to require its network providers to deploy S*BGP first. In the AS graph ([9, 3]), providers to the government include many Tier 1 ISPs who may be difficult or expensive to persuade via other means.

ISPs upgrade their stubs. Next, we suggest that a secure ISP should be responsible for upgrading all its insecure stub customers to simplex S*BGP. To achieve this, interested third parties should ensure that simplex S*BGP is engineered to be as lightweight as possible, and potentially provide additional subsidies for ISPs that secure their stubs. (ISPs also have local incentives to secure stubs, i.e., to transit more revenue-generating traffic for multi-homed stubs (Section 5.1).)

² For simplicity, we do not model intradomain routing considerations. However, it should be explored in future work.

3. MODELING S*BGP DEPLOYMENT

We evaluate our proposal using a model of the S*BGP deployment process. For brevity, we present only the details of our model here. Justification for our modeling decisions and possible extensions are in Section 8.

3.1 The Internetwork and entities.

The AS graph. The interdomain-routing system is modeled with a labeled AS graph G(V,E). Each node n ∈ V represents an AS, and each edge represents a physical link between ASes. Per Figure 1, edges are annotated with the standard model for business relationships in the Internet [13]: customer-provider (where the customer pays the provider), and peer-to-peer (where two ASes agree to transit each other's traffic at no cost). Each AS n is also assigned a weight w_n, to model the volume of traffic that originates at each AS. For simplicity, we assume ASes divide their traffic evenly across all destination ASes. However, our results are robust even when this assumption is relaxed (Section 6.8).

We distinguish three types of ASes:

Content providers. Content providers (CPs) are ASes whose revenue (e.g., advertising) depends on reliably delivering their content to as many users as possible, rather than on providing Internet transit. While a disproportionately large volume of Internet traffic is known to originate at a few CPs, empirical data about Internet traffic volumes remains notoriously elusive. Thus, based on recent research [24, 35], we picked five content providers: Google (AS 15169), Facebook (AS 32934), Microsoft (AS 8075), Akamai (AS 20940), and Limelight (AS 22822). Then, we assigned each CP weight w_CP, so that the five CPs originate an x fraction of Internet traffic (equally split between them), with the remaining 1 − x split between the remaining ASes.

Stubs. Stubs are ASes that have no customers of their own and are not CPs. Every stub s has unit weight w_s = 1. In Figure 1, ASes 34376 and 31420 are stubs.

ISPs. The remaining ASes in the graph (that are not stubs or CPs) are ISPs. ISPs earn revenue by providing Internet service; because ISPs typically provide transit service, rather than originating traffic (content), we assume they have unit weight w_n = 1. In Figure 1, ASes 25076, 8866 and 8928 are ISPs.

3.2 The deployment process.

We model S*BGP deployment as an infinite round process. Each round is represented with a state S, capturing the set of ASes that have deployed S*BGP.

Initial state. Initially, the only ASes that are secure are (1) the ASes in the set of early adopters and (2) the direct customers of the early adopter ISPs that are stubs. (The stubs run simplex S*BGP.) All other ASes are insecure. For example, in Figure 1, early adopters ISP 8866 and CP 22822 are secure, and stub 31420 runs simplex S*BGP because its provider is secure.

Figure 1: Destinations (left) 31420, (right) 22822. (The figure shows ASes 8928, 15169, 8866, 22822, 31420, 25076, and 34376, with peer and customer-provider edges and the flow of traffic toward each destination.)

Each round. In each round, every ISP chooses an action (deploy S*BGP or not) that improves its utility relative to the current state. We discuss the myopic best-response strategy that ISPs use to choose their actions in Section 3.3. Once an ISP becomes secure, it deploys simplex S*BGP at all its stub customers (Section 2.3). Because CPs do not earn revenues by providing Internet service, some external incentive (e.g., concern for security, subsidies) must motivate them to deploy S*BGP. Thus, in our model, a CP may only deploy S*BGP if it is in the set of early adopters.

Once ASes choose their actions, paths are established from every source AS i to every destination AS d, based on the local BGP routing policies of each AS and the state S of the AS graph. We use a standard model of BGP routing policies, based on business relationships and path length (see Appendix A). Per Section 2.3, we also assume that routing policies of secure ASes require them to break ties by preferring fully secure paths over insecure ones, so that the path to a given destination d depends on the state S. Paths to a destination d form a tree rooted at d, and we use the notation T_n(d, S) to represent the subtree of ASes routing through AS n to a destination d when the deployment process is in state S. Figure 1 (right) shows part of the routing tree for destination 22822; notice that T_8866(22822, S) contains ASes 31420, 25076, 34376.

Termination. We proceed until we reach a stable state, where no ISP wants to deploy (or disable) S*BGP.

3.3 ISP utility and best response.

We model an ISP's utility as related to the volume of traffic it transits for its customers; this captures the fact that many ISPs either bill their customers directly by volume, or indirectly through flat rates for fixed traffic capacities. Utility is a function of the paths chosen by each AS. Because path selection is a function of routing policies (Appendix A) and the state S, it follows that the utility of each ISP is completely determined by the AS weights, the AS graph topology, and the state S.

We have two models of ISP utility that capture the ways in which an ISP can transit customer traffic:

Outgoing utility. ISP n can increase its utility by forwarding traffic to its customers. Thus, we define outgoing utility as the amount of traffic that ISP n routes to each destination d via a customer edge. Letting D̂(n) be the set of such destinations, we have:

$$u_n(S) = \sum_{d \in \hat{D}(n)} \; \sum_{i \in T_n(d,S)} w_i \qquad (1)$$

Let's use Figure 1 to find the outgoing utility of ISP n = 8866 due to destinations 31420 and 22822. Destination 31420 is in D̂(n) but destination 22822 is not. Thus, two CPs (Google AS 15169 and Limelight AS 22822) and three other ASes (AS 8928, 25076, 34376) transit traffic through n = 8866 to destination d = 31420, contributing 2w_CP + 3 outgoing utility to n = 8866.

Incoming utility. An ISP n can increase its utility by forwarding traffic from its customers. Thus, we define incoming utility as the amount of traffic that ISP n receives via customer edges for each destination d. Restricting the subtree T_n(d, S) to branches that are incident on n via customer edges, we obtain the customer subtree T̂_n(d, S) ⊂ T_n(d, S), and we have:

$$u_n(S) = \sum_{d} \; \sum_{i \in \hat{T}_n(d,S)} w_i \qquad (2)$$

Let's compute the incoming utility of n = 8866 due to destinations 31420 and 22822 in Figure 1. For destination 31420, ASes 25076 and 34376 are part of the customer subtree T̂_n(d, S), but 15169, 8928 and 22822 are not. For destination d = 22822, ASes 31420, 25076, 34376 are part of the customer subtree. Thus, these ASes contribute 2 + 3 incoming utility to ISP n = 8866.

Realistically, ISP utility is some function of both of these models; to avoid introducing extra parameters into our model, we consider each separately.
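To make Equations (1) and (2) concrete, the following is a minimal sketch, under our own naming assumptions (it is not the authors' implementation), of how the two utilities could be computed once the routing trees T_n(d, S) of the current state are known; subtree, customer_subtree, reaches_via_customer, and weight are assumed helpers.

```python
# Illustrative sketch only: outgoing utility (Eq. 1) and incoming utility
# (Eq. 2) of ISP n, given the routing trees of the current state S.
# All names here are assumptions, not the paper's code.

def outgoing_utility(n, destinations, reaches_via_customer, subtree, weight):
    """Eq. (1): for each destination d in D_hat(n) (reached by n via a customer
    edge), sum the weights of the source ASes i in T_n(d, S)."""
    total = 0.0
    for d in destinations:
        if d != n and reaches_via_customer(n, d):           # d in D_hat(n)
            total += sum(weight[i] for i in subtree(n, d))   # i in T_n(d, S), excluding n itself
    return total

def incoming_utility(n, destinations, customer_subtree, weight):
    """Eq. (2): for every destination d, sum the weights of source ASes whose
    traffic enters n on a customer edge, i.e. i in T_hat_n(d, S)."""
    total = 0.0
    for d in destinations:
        if d != n:
            total += sum(weight[i] for i in customer_subtree(n, d))
    return total
```

For the worked example above, subtree(8866, 31420) would contain {15169, 22822, 8928, 25076, 34376}, reproducing the 2w_CP + 3 contribution to AS 8866's outgoing utility.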

Myopic best response. We use a standard game-theoretic update rule known as myopic best response, which produces the most favorable outcome for a node in the next round, taking other nodes' strategies as given [16]. Let (¬S_n, S_−n) denote the state when n 'flips' to the opposite of the action (either deploying or undeploying S*BGP) that it used in state S, while all other ASes maintain the same action they use in state S. ISP n changes its action in state S iff its projected utility u_n(¬S_n, S_−n) is sufficiently high, i.e.,

$$u_n(\neg S_n, S_{-n}) > (1 + \theta) \cdot u_n(S) \qquad (3)$$

where θ is a threshold denoting the increase in utility an ISP needs to see before it is willing to change its actions. Threshold θ captures the cost of deploying BGP security; e.g., an ISP might deploy S*BGP in a given round if S*BGP deployment costs do not exceed θ = 5% of the profit it earns from transiting customer traffic. Since θ is multiplicative, it captures the idea that deployment costs are likely to be higher at ISPs that transit more traffic. The update rule is myopic, because it focuses on increasing ISP n's utility in the next round only. It is best-response because it does not require ISP n to speculate on other ASes' actions in future rounds; instead, n takes these actions as given by the current state S.

Discussion. Our update rule requires ASes to predict their future utility. In our model, ASes have full information of S and G, a common approach in game theory, which enables them to project their utility accurately. We discuss the consequences of our update rule, and the impact of partial information, in Sections 8.1-8.2.
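The round structure of Section 3.2 together with update rule (3) can be summarized in a short sketch. The helper names (utility, stub_customers) and the simultaneous application of all flips in a round are our own assumptions about one reasonable reading of the model, not the authors' simulator.

```python
# Illustrative sketch of the deployment process: in each round every ISP
# myopically best-responds to the current state S (Eq. 3); flips are applied
# together, and each newly secure ISP upgrades its stubs to simplex S*BGP.

def run_deployment(isps, early_adopters, stub_customers, utility, theta=0.05):
    state = set(early_adopters)                     # ASes that have deployed S*BGP
    for n in list(state):
        state |= set(stub_customers(n))             # initial state: adopter ISPs' stubs (Section 3.2)
    while True:
        flips = [n for n in isps
                 if utility(n, state ^ {n}) > (1 + theta) * utility(n, state)]
        if not flips:
            return state                            # stable state: no ISP wants to flip
        for n in flips:
            state ^= {n}                            # deploy or undeploy S*BGP
            if n in state:
                state |= set(stub_customers(n))     # a newly secure ISP secures its stubs
```

In the incoming utility model this loop need not terminate (Section 7.2); in the outgoing utility model, Theorem 6.2 guarantees that no secure ISP ever flips back, so it does.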

4. SIMULATION FRAMEWORK

Computing utility u_n(S) and projected utility u_n(¬S_n, S_−n) requires us to determine the path from every source AS to every destination AS, for every ISP n's unique projected state (¬S_n, S_−n). Thus, since routes between all |V|^2 source-destination pairs must be recomputed for each of the |V| ISPs' projected states, our simulations had complexity O(|V|^3) on an AS graph G(V,E). To accurately simulate our model, we chose not to 'sample down' the complexity of our simulations:

Projecting utility for each ISP. If we had computed the utility for only a few sampled ISPs, this would reduce the number of available secure paths and artificially prevent S*BGP deployment from progressing.

Simulations over the entire AS graph. Our proposal is specifically designed to leverage the extreme skew in AS connectivity (i.e., many stubs with no customers, few Tier 1s with many customers) to drive S*BGP deployment. To faithfully capture the impact of this skew, we computed utility over traffic from all sources to all destination ASes. Furthermore, we ran our simulations on the full empirical AS graph [9], rather than a subsampled version [23] or a smaller synthetic topology [28, 39], as in prior work [8, 5]. We used the Cyclops AS graph (with its inferred AS relationships) from Dec 9, 2010 [9], with an additional 16K peering edges discovered at Internet exchange points (IXPs) [3], as well as an additional peering-heavy AS graph described in Section 6.8.

The AS graph G(V,E) had |V| = 36K; to run O(|V|^3) simulations at such a scale, we parallelized our algorithms on a 200-node DryadLINQ cluster [38] that could run through a single simulation in 1-12 hours. (Details of our implementation are in the full version.)
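The per-ISP projected-utility computations are independent of one another, which is what makes this workload easy to parallelize. Purely as an illustration (and not the authors' DryadLINQ code), the same embarrassingly parallel map could be expressed with Python's multiprocessing:

```python
# Illustrative sketch: each round, project utility u_n(not S_n, S_-n) for every
# ISP n independently; this per-ISP map is the part that was parallelized
# (the paper used a 200-node DryadLINQ cluster, not this code).
from multiprocessing import Pool

def _project_one(args):
    n, state, utility = args              # utility must be a picklable, module-level function
    return n, utility(n, state ^ {n})     # n's unique projected state

def project_all(isps, state, utility, workers=8):
    jobs = [(n, frozenset(state), utility) for n in isps]
    with Pool(workers) as pool:
        return dict(pool.map(_project_one, jobs))
```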

5. CASE STUDY: S*BGP DEPLOYMENT

We start by showing that even a small set of early adopters can create enough market pressure to transition the vast majority of ASes to S*BGP.

Case study overview. We focus on a single simulation where the early adopters are the five CPs (Google, Facebook, Microsoft, Limelight, Akamai; see Section 3.1) and the top five Tier 1 ASes in terms of degree (Sprint (1239), Verizon (701), AT&T (7018), Level 3 (3356), Cogent (174)). We assume every ISP uses an update rule with a relatively low threshold θ = 5%, that the five CPs originate x = 10% of the traffic in the Internet, and that stubs do break ties in favor of secure routes. We now show how even a small set of ten early adopters (accounting for less than 0.03% of the AS graph) can convince 85% of ASes to deploy S*BGP and secure 65% of all paths in the AS graph.

5.1 Competition drives deployment.

We start by zooming in on S*BGP deployment at two competing ISPs, in a scenario we call a Diamond.

Figure 5: Two ISPs, AS 8359 and AS 13789, compete for traffic from Sprint (AS 1239) to their stub customer, AS 18608. Sprint is an early adopter of S*BGP, and initially the three other ASes are insecure. Both ISPs offer Sprint equally good two-hop customer paths to the stub, and AS 8359 is chosen to carry traffic by winning the tie break. In the first round, AS 13789 computes its projected utility, and realizes it can gain Sprint's traffic by adopting S*BGP and upgrading its stub to simplex S*BGP. (See Section 8.2 for more discussion on how ISPs compute projected utility.) By the fourth round, AS 8359 has lost so much utility (due to traffic lost to ASes like 13789) that he decides to deploy S*BGP.


Figure 2: The number of ASes that deploy S*BGP each round (x-axis: round; y-axis: number of ASes that deploy S*BGP; curves for all ASes and for ISPs only).

Figure 3: Normalized utility of ISPs in Fig. 5 and 6 (x-axis: round; y-axis: current utility / utility at start; curves for AS 8359, AS 6731, and AS 8342).

Figure 4: Projected and actual utility before deploying S*BGP, normalized by starting utility (x-axis: round; y-axis: median normalized utility; reference line at 1 + θ = 1.05).

Figure 5: A Diamond: ISPs 13789 and 8359 compete for traffic from Sprint (AS 1239) to stub 18608 (rounds 0, 1, and 4 shown).

Of course, Figure 5 is only a very small snapshot of the competition for traffic destined to a single stub, AS 18608; utility for each ISP is based on customer traffic transited to all destinations in the AS graph. Indeed, this Diamond scenario is quite common. We counted more than 6.5K instances of the Diamond, each involving two ISPs, a stub, and one of our early adopters.

5.2 Global deployment dynamics.

Figure 2: We show the number of ASes (i.e., stubs, ISPs and CPs) and the number of ISPs that deploy S*BGP at each round. In the first round, 548 ISPs become secure; because each of these ISPs deploys simplex S*BGP at its stubs, we see that over 5K ASes become secure by the end of the first round. In subsequent rounds, hundreds of ISPs deploy S*BGP in each round; however, the number of newly secure stubs drops dramatically, suggesting that many ISPs deploy S*BGP to regain traffic lost when their stubs were secured by competitors. After the 17th iteration, the process tapers off, with fewer than 50 ASes becoming secure in each round. The final surge in deployment occurs in round 25, when a large AS, 6939, suddenly becomes secure, causing a total of 436 ASes to deploy S*BGP in the remaining six rounds. When the process terminates, 85% of ASes are secure, including 80% of the 6K ISPs in the AS graph.

5.3 Longer secure paths sustain deployment.

In Figure 2 we observed rapid, sustained deployment of S*BGP in the first 17 iterations. This happens because longer secure paths are created as more ASes deploy S*BGP, thus creating incentives for S*BGP at ASes that are far away from the early adopters:

Figure 6: We once again encounter AS 8359 from Figure 5. We show how AS 8359's decision to deploy S*BGP in round 4 allows a new ISP (AS 6731) to compete for traffic. In round 5, AS 6731 sees a large increase in utility by becoming secure.

Figure 6: A newly created four-hop secure path (ASes Sprint, 8342, 8359, 30733, 6731, and 50197; rounds 4 and 5 shown).

This occurs, in part, because AS 6731 can now entice six of the early adopters to route through him on a total of 69 newly-secure paths. Indeed, when AS 6731 becomes secure, he continues the chain reaction set in motion by AS 8359; for instance, in round 7 (not shown), AS 6731's neighbor AS 41209 becomes secure in order to offer Sprint a new, secure four-hop path to one of 41209's own stubs.

5.4 Keeping up with the competition.

Two behaviors drive S*BGP deployment in a Diamond. First, an ISP becomes secure to steal traffic from a competitor, and then the competitor becomes secure in order to regain the lost traffic. We can watch this happening for the ISPs from Figures 5 and 6:

Figure 3: We show the utilities of ISPs 8359, 6731, and 8342 in each round, normalized by starting utility, i.e., the utility before the deployment process began (when all ASes, including the early adopters, were still insecure). As we saw in Figure 5, AS 8359 deploys S*BGP in round 4 in order to regain traffic he lost to his secure competitors; here we see that in round 4, AS 8359 has lost 3% of his starting utility. Once AS 8359 deploys S*BGP, his utility jumps up to more than 125% of his starting utility, but these gains in utility are only temporary, disappearing around round 15. The same is true in round 6 for AS 6731 from Figure 6. By round 15, 60% of ISPs in the AS graph are already secure (Figure 2), and our ISPs can no longer use security to differentiate themselves, causing their utility to return to within 3% of their starting utility.

This is also true more generally:

Figure 4: For each round i, we show the median utility and median projected utility for ISPs that become secure in round i+1, each normalized by starting utility. (Recall from (3) that these ISPs have projected utility at least 1+θ times their utility in round i.) In the first 9 rounds, ISPs mainly deploy S*BGP to steal traffic from competitors; that is, their projected utility in the round before they deploy S*BGP is at least 1 + θ = 105% of their starting utility. However, as deployment progresses, ASes increasingly deploy S*BGP in order to recover lost traffic and return to their starting utility; that is, in rounds 10-20 ISP utility drops to at least θ = 5% below starting utility, while projected utility approaches starting utility (y = 1).

5.5 Is S*BGP deployment a zero-sum game?

Our model of S*BGP deployment is indeed a zero-sum game; we assume that ISPs compete over a fixed set of customer traffic. Thus, when the vast majority of ASes have deployed S*BGP, ISPs can no longer use security to distinguish themselves from their competitors (Figure 3). At the termination of this case study, only 8% of ISPs have an increase in utility of more than θ = 5% over their starting utility. On the other hand, 85% of ASes now benefit from a (mostly) secure Internet. Furthermore, like ASes 8359 and 6731 in Figure 3, many of these secure ASes enjoyed a prolonged period of increased utility that could potentially help defray the costs of deploying S*BGP.

It is better to deploy S*BGP. One might argue that a cynical ISP might preempt the process by never deploying S*BGP. However, a closer look shows that it is almost always in an ISP's interest to deploy S*BGP. ISPs that deploy S*BGP usually return to their starting utility or slightly above, whereas ISPs that do not deploy S*BGP lose traffic in the long term. For instance, AS 8342 in Figure 6 never deploys S*BGP. As shown in Figure 3, when the deployment process terminates, AS 8342 has lost 4% of its starting utility. Indeed, another look at the data (not shown) shows that the ISPs that remain insecure when the process terminates lose on average 13% of their starting utility!

6. CHOOSING EARLY ADOPTERS

Next, we consider choosing the set of ASes that should be targeted to become early adopters of S*BGP.

6.1 It's hard to choose early adopters.

Ideally, we would like to choose the optimal set of early adopters that could cause the maximum number of other ASes to deploy S*BGP. We show that this is NP-hard by presenting a reduction from the 'set cover' problem (proof in the full version):

Theorem 6.1. For an AS graph G(V,E) and a parameter 1 ≤ k ≤ |V|, finding a set of early adopter ASes of size k that maximizes the number of ASes that are secure when the deployment process terminates is NP-hard. Approximating the solution within a constant factor is also NP-hard.

As such, we use simulations3 of the deployment process to investigate heuristic approaches for choosing early adopters, including AS degree (e.g., Tier 1s) and volume of traffic originated by an AS (e.g., content providers).

6.2 The parameter space.

We consider how the choice of early adopters is impacted by assumptions on (1) whether or not stubs running simplex S*BGP break ties based on security, (2) the AS graph, and (3) traffic volumes sourced by CPs.

3 Since there is no sampling involved, there is no variability between simulations run with the same set of parameters.

Outgoing utility. Also, recall that we have two models of ISP utility (Section 3.3). In this section, we dive into the details of the outgoing utility model because it has the following very nice property:

Theorem 6.2. In the outgoing utility model, a secure node will never have an incentive to turn off S*BGP.

As a consequence of this theorem (proof in the full version), it immediately follows that (a) every simulation must terminate, and (b) we can significantly reduce compute time by not computing projected utility for ISPs that are already secure. (We discuss complications that arise from the incoming utility model in Section 7.)

Deployment threshold θ. Our update rule (3) is such that ISPs change their actions if they can increase utility by more than a fraction θ. Thus, to gain insight into how 'difficult' it is to convince ISPs to deploy S*BGP, we assume that each ISP uses the same threshold θ, and sweep through different values of θ (but see also Section 8.2).

6.3 Comparing sets of early adopters.

We next explore the influence of different early adopters:

Figure 7 (top): We show the fraction of ASes that adopt S*BGP for different values of θ. We consider no early adopters, the top 5-200 ISPs in terms of degree, the five CPs, the five CPs in combination with the top five ISPs, and 200 random ISPs.

There are incentives to deploy S*BGP. For low values of θ < 5%, we observe that there is sufficient competition over customer traffic to transition 85% of ASes to S*BGP. Moreover, this holds for almost every set of early adopters we considered. (Note that in the unrealistic case where θ = 0, we see widespread S*BGP deployment even with no early adopters, because we assume the stubs break ties in favor of secure paths. But see also Section 6.7.) Furthermore, we find that the five CPs have approximately the same amount of influence as the case where there are no early adopters; we investigate this in more detail in Section 6.8.

Some ISPs always remain insecure. We find that 20% of the 6K ISPs in the AS graph [9, 3] never deploy S*BGP, because they are never subject to competition for customer traffic. This highlights two important issues: (1) some ISPs may never become secure (e.g., ASes whose customers are exclusively single-homed), and (2) S*BGP and BGP will coexist in the long term.

Choice of early adopters is critical. For higher values of θ ≥ 10%, it becomes important to choose ISPs with high customer degree as early adopters. In fact, Figure 7 shows that a set of 200 random ASes has significantly lower influence than a set containing only the five top ASes in terms of degree. For large values of θ ≥ 30%, a larger set of high-degree early adopters is required, with the top 200 ASes in terms of degree causing 53% of the ASes to deploy S*BGP for θ = 50%. However, to put this observation in some perspective, recall that θ = 30% suggests that the cost of S*BGP deployment exceeds 30% of an ISP's profit margin from transiting customer traffic.


Figure 7: Fraction of ASes (top) and ISPs (bottom) that deploy S*BGP for varying θ and early adopters (early adopter sets: top 200, top 100, top 10, top 5, 5 CPs + top 5, 200 random, 5 CPs, none).

6.4 How much security do we get?

We count the number of secure paths at the end of the deployment process as a measure of the efficacy of S*BGP deployment. (Of course, this is not a perfect measure of the AS graph's resiliency to attack; quantifying this requires approaches similar to [14, 8], an important direction for future work.) We find that the fraction of secure paths is only slightly lower than f^2, where f is the fraction of ASes that have deployed S*BGP (figure in the full version). (The f^2 follows from the fact that for a path to be secure, both its source AS and its destination AS must be secure.)
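As a quick sanity check on this relationship: in the case study of Section 5, f = 0.85 of ASes deployed S*BGP, so f^2 ≈ 0.72, and the 65% of paths secured there is indeed only slightly lower.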

6.5 Market pressure vs. simplex S*BGP

The cause of global S*BGP deployment differs for low and high values of the deployment threshold θ:

Figure 7 (bottom): We show the fraction of ISPs (not ASes) that deploy S*BGP for the early adopter sets and varying values of θ. For low values of θ, market pressure drives a large fraction of ISPs to deploy S*BGP. In contrast, for higher values of θ very few ISPs deploy S*BGP, even for large sets of well-connected early adopters. In these cases, most of the deployment is driven by ISPs upgrading their stub customers to simplex S*BGP. For example, for the top 200 ISPs, when θ = 50%, only a small fraction of secure ASes (4%) deploy S*BGP because of market pressure; the vast majority (96%) are stubs running simplex S*BGP.

6.6 The source of competition: tie break sets.

Recall that the tiebreak set is the set of paths on which an AS employs the security criterion to select paths to a destination AS (Section 2.2.2). A tiebreak set with multiple paths presents opportunities for ISPs to compete over traffic from the source AS.

We observe that tiebreak sets are typically very small in the AS graph under the routing policies of Appendix A (figure in the full version). Moreover, only 20% of tiebreak sets contain more than a single path.

This striking observation suggests that even a very limited amount of competition suffices to drive S*BGP deployment for low θ.

6.7 Stubs don't need to break ties on security.

So far, we have focused on the case where secure stubs break ties in favor of secure paths. Indeed, given that stubs typically make up the majority of secure ASes, one might expect that their routing decisions can have a major impact on the success of the S*BGP deployment process. Surprisingly, we find that this is not the case. Indeed, our results are insensitive to this assumption, for θ > 0 and regardless of the choice of early adopter (figure shown in the full version). We explain this by observing that stubs both (a) have small tiebreak sets, and (b) transit no traffic.

Security need only affect a fraction of routing decisions! Thus, only 15% of ASes (i.e., the ISPs) need to break ties in favor of secure routes, and only 23% of ISP tiebreak sets contain more than one path. Combining these observations, we find that S*BGP deployment can progress even if only 0.15 × 0.23 ≈ 3.5% of routing decisions are affected by security considerations!

6.8 Robustness to traffic and connectivity

6.8.1 Varying parameters.

To understand the sensitivity of our results, we varied the following parameters:

1. Originated traffic volumes. We swept through different values x = {10%, 20%, 33%, 50%} for the fraction of traffic originated by the five CPs (Section 3.1); recent work suggests a reasonable range is x = 10-20% [24].

2. Traffic destinations. Initially, we assume ASes uniformly spread their traffic across all potential destinations. We test the robustness of our results to this assumption by modeling traffic locality. We model locality by assuming ASes send traffic proportional to 1/k to destination ASes that are k hops away.

3. Connectivity of content providers. Published AS-level topologies are known to have poor visibility into peering links at the edge of the AS-level topology [31]. This is particularly problematic for CPs, who, in recent years, have shifted towards peering with many other ASes to cut down content delivery costs [12]. Indeed, while the CPs are known to have short path lengths [32], their average path length in our AS graph (with routing policies as in Appendix A) was 2.7 hops or more. Thus, for sensitivity analysis, we created a peering-heavy AS graph with 19.7K artificial peering edges from the five CPs to 80% of ASes found to be present at IXPs [3]. In our augmented AS graph, the average path length of the CPs dropped to about 2, and their degree increased to be as high as the largest Tier 1 ISPs.


Figure 8: AS 4755 has incentives to turn off S*BGP (ASes shown: Akamai, 9498, 2914, 4755, and twenty-four stub customers including 45210).

6.8.2 Impact of traffic volumes and connectivity

We now present an overview of our model's robustness (additional detail in the full version):

1. Originated traffic volumes vs. degree. Surprisingly, when the five CPs source x = 10% of traffic, they are much less effective as early adopters than the top five Tier 1 ASes. Even though in the augmented topology the Tier 1s and CPs have about equal degree, the dominant factor here is traffic; even though the CPs originate 10% of traffic, the Tier 1s still transit 2-9X more traffic.

2. Localized interdomain traffic. We validate that our results are robust to localized interdomain traffic using the 5 CPs and top 5 ISPs as early adopters. For both the original and augmented topology, our results are robust even when ASes direct most of their traffic to nearby destinations.

3. Impact of peering-heavy structure on simplex S*BGP. Even in the augmented topology, where the CPs peer with a large number of ASes, the Tier 1s consistently outperform the CPs by immediately upgrading their stub customers to simplex S*BGP. This suggests that having CPs upgrade their stub peers to simplex S*BGP could potentially drive S*BGP deployment further.

6.9 Summary and recommendations.

We make two key observations regarding selection of early adopters. First, only a small number of ISPs suffice as early adopters when deployment thresholds θ are small. Second, to withstand high θ, Tier 1 ASes should be targeted. This is due to the high volumes of traffic they transit and the many stubs they upgrade to simplex S*BGP. Finally, we note that our results hold even if more than 96% of routing decisions are insensitive to security considerations!

7. OTHER COMPLICATIONS

Intuition suggests that a secure ISP will observe increased utility because secure ASes transit traffic through it. While this is true in the outgoing utility model (Theorem 6.2), it turns out that this is not the case for the incoming utility model. We now discuss complications that might arise because we require S*BGP to play a role in route selection.

7.1 Buyer's Remorse: Turning off S*BGP.

We present an example of a severe obstacle to S*BGP deployment: a secure ISP that has an incentive to turn off S*BGP. The idea here is that when an ISP n becomes secure, some of n's incoming traffic might change its path and enter n's network along peer/provider edges instead of customer edges, thus reducing n's utility. The projected utility of turning S*BGP off can then satisfy Equation (3), so the ISP opts to undeploy S*BGP.

Figure 8: We show that AS 4755, a telecom provider in India, has an incentive to turn off S*BGP in its network. We assume content providers have w_CP = 821, which corresponds to 10% of Internet traffic originating at the big five CPs (including Akamai's AS 20940).

In the state S on the left, Akamai, AS 4755, and NTT (AS 2914) are secure, the stub customers of these two secure ISPs run simplex S*BGP, and all other ASes are insecure. Here, AS 4755 transits traffic sourced by Akamai from his provider NTT (AS 2914) to a collection of twenty-four of its stub customers (including AS 45210). Akamai's traffic does not increase AS 4755's utility because it arrives at AS 4755 along a provider edge.

In the state (¬S_4755, S_−4755) on the right, AS 4755 turns S*BGP off. If we assume that stubs running simplex S*BGP do not break ties based on security, then the only ASes that could potentially change their routes are the secure ASes 20940 and 2914. Notice that when AS 4755 turns S*BGP off, Akamai's AS 20940 has no secure route to AS 4755's stub customers (including AS 45210). As such, Akamai will run his usual tie break algorithms, which in our simulation came up in favor of AS 9498, a customer of AS 4755. Because Akamai's traffic now enters AS 4755 on customer edges, AS 4755's incoming utility increases by a factor of 205% for each of the 24 stub destinations.

Turning off the entire network. Our simulations confirmed that, apart from Akamai changing its chosen paths to these twenty-four stubs, all other ASes use the same routes in state S and state (¬S_4755, S_−4755). This means that AS 4755 has an incentive to turn off S*BGP in his entire network; no routes other than the ones Akamai uses to reach the twenty-four stubs are impacted by his decision. Indeed, we found that the utility of AS 4755 increases by a total of 0.5% (over all destinations) when he turns off S*BGP!

Turning off a destination. AS 4755 could just as well turn off S*BGP on a per-destination basis, i.e., by refusing to propagate S*BGP announcements for the twenty-four stubs in Figure 8, and sending insecure BGP messages for these destinations instead.

7.2 Turning off S*BGP can cause oscillations.

To underscore the seriousness of an ISP turning off S*BGP in his entire network, we now argue that a group of ISPs could oscillate, alternating between turning S*BGP on and off, and never arriving at a stable state. In the full version, we exhibit an example AS graph and state S that prove that oscillations could exist. Worse yet, we show that it is hard to even determine whether or not the deployment process will oscillate!

Theorem 7.1. Given an AS graph and state S, it is PSPACE-complete to decide if the deployment process will terminate at a stable state in the incoming utility model.

Our proof, in the full version, is by reduction from the PSPACE-complete problem of determining whether a space-bounded Turing Machine will halt for a given input string. The complexity class PSPACE consists of all decision problems that can be solved using only polynomial space, but in unbounded time. PSPACE-complete problems (intuitively, the hardest problems in PSPACE) are at least as hard as the NP-complete problems, and widely believed to be even harder.

7.3 How common are these examples?

At this point, the reader may be wondering how often an AS might have incentives to turn off S*BGP.

Turning off an entire network? Figure 8 proves that cases where an ISP has an incentive to turn off S*BGP in its entire network do exist in realistic AS-level topologies [9]. However, we speculate that such examples will occur infrequently in practice. While we cannot provide any concrete evidence of this, our speculation follows from the fact that an ISP n obtains utility from many destinations. Thus, even if n has increased its utility by turning OFF S*BGP for destinations that are part of subgraphs like Figure 8, he will usually obtain higher utility by turning ON S*BGP for the other destinations that are not part of such subgraphs. (In Figure 8, this does not happen because the state S is such that only a very small group of ASes are secure; thus, no routes other than the ones pictured are affected by AS 4755's decision to turn off S*BGP.)

Turning off a destination is likely. On the other hand, it is quite easy to find examples of specific destinations for which an ISP might want to turn off S*BGP. Indeed, a search through the AS graph found that at least 10% of the 5,992 ISPs could find themselves in a state where they have incentives to turn off S*BGP for at least one destination!

8. DISCUSSION OF OUR MODEL

The wide range of parameters involved in modeling S*BGP deployment means that our model (Section 3) cannot be predictive of S*BGP deployment in practice. Instead, our model was designed to (a) capture a few of the most crucial issues that might drive S*BGP deployment, while (b) taking the approach that simplicity is preferable to complexity.

8.1 Myopic best response.

For simplicity, we used a myopic best-response update rule that is standard in the game-theory literature [16]. In Section 5.5, we discussed the consequences of the fact that ISPs only act to improve their utility in the next round, rather than in the long run. Another potential issue is that our update rule ignores the possibility that multiple ASes could deploy S*BGP in the transition from a round i to round i+1, resulting in a gap between the projected utility and the actual utility in the subsequent round. Fortunately, our simulations show that projected utility u_n(¬S_n, S_−n) is usually an excellent estimate of actual utility in the subsequent round. For example, in the case study of Section 5, 80% of ISPs overestimate their utility by less than 2%, and 90% of ISPs overestimate by less than 6.7%. In the full version, we present additional results that show that this observation also holds more generally across simulations.

8.2 Computing utility locally.

Because we lack information about interdomain traffic flows in the Internet, our model uses weighted counts of the subtrees of ASes routing through ISP n as a stand-in for traffic volumes, and thus ISP utility. While computing these subtrees in our model requires global information that would be unavailable to the average ISP (e.g., the state S, the AS graph topology, routing policies), in practice an ISP can just compute its utility by locally observing traffic flows through its network.

Computing projected utility. Computing projected utility u_n(¬S_n, S_−n) in practice is significantly more complex. While projected utility gives an accurate estimate of actual utility when it is computed using global information, ISPs may inaccurately estimate their projected utility when using only local information. Our model can accommodate these inaccuracies by rolling them into the deployment threshold θ. (That is, if projected utility is off by a factor of ±ε, model this with threshold θ ± ε.) Thus, while our approach was to sweep through a common value of θ for every ISP (Section 6.2), extensions might capture inaccurate estimates of projected utility by randomizing θ, or even by systematically modeling an ISP's estimation process to obtain a measure of how it impacts θ.

Practical mechanisms for projecting future traffic patterns. Because S*BGP deployment can impact route selection, it is crucial to develop mechanisms that allow ISPs to predict how security will impact traffic patterns through their networks. Moreover, if ISPs could use such mechanisms to estimate projected utility, they would also be an important driver for S*BGP deployment. For example, an ISP might set up a router that listens to S*BGP messages from neighboring ASes, and then use these messages to predict how becoming secure might impact its neighbors' route selections. A more sophisticated mechanism could use extended "shadow configurations" with neighboring ASes [1] to gain visibility into how traffic flows might change.

8.3 Alternate routing policies and actions.

Routing policies. Because our model of ISP utility depends on traffic volumes (Section 3.3), we need a model for how traffic flows in the Internet. In practice, traffic flow is determined by the local routing policies used by each AS, which are arbitrary and not publicly known. Thus, we use a standard model of routing policies (Appendix A) based on business relationships and path length [14, 6].

Routing policies are likely to impact our results by determining (a) AS path lengths (longer AS paths mean it is harder to secure routes), and (b) tiebreak set size (Section 6.6). For example, we speculate that considering a shortest-path routing policy would lead to overly optimistic results; shortest-path routing certainly leads to shorter AS paths, and possibly also to larger tiebreak sets. On the other hand, if a large fraction of multihomed ASes always use one provider as primary and the other as backup (irrespective of AS path lengths, etc.), then our current analysis is likely to be overly optimistic. (Of course, modeling this is difficult given a dearth of empirical data on backup paths.)

Choosing routing policies. An AS might cleverly choose its routing policies to maximize utility. However, the following suggests that this is intractable:

Theorem 8.1. When all other ASes' routing policies are as in Appendix A, it is NP-hard for any AS n to find the routing policy that maximizes its utility (in both the incoming and outgoing utility models). Moreover, approximating the optimal routing policy within any constant factor is also NP-hard.

The proof (in the full version) shows that this is NP-hard even if n has a single route to the destination, and must only choose the set of neighbors to which it announces the route. (Thus, the problem is tractable when the node's set of neighbors is of constant size.)

Lying and cheating. While it is well known that an AS can increase the amount of traffic it transits by manipulating its BGP messages [7], we avoided this issue because our focus is on technology adoption by economically-motivated ASes, not BGP manipulations by malicious or misconfigured ASes.

9. RELATED WORK

Social networks. The diffusion of new technologies in social networks has been well studied in economics and game theory (e.g., [30, 21] and references therein). The idea that players will myopically best-respond if their utility exceeds a threshold is standard in this literature (cf., our update rule (3)). However, in a social network, a player's utility depends only on its immediate neighbors, while in our setting it depends on the set of secure paths. Thus, while [21] finds approximation algorithms for choosing an optimal set of early adopters, this is NP-hard in our setting (Theorem 6.1).

Protocol adoption in the Internet. The idea that competition over customer traffic can drive technology adoption in the Internet has appeared in many places in the literature [10, 33]. Ratnasamy et al. [33] suggest using competition for customer traffic to drive protocol deployment (e.g., IPv6) at ISPs by creating new mechanisms for directing traffic to ASes with IPv6. Leveraging competition is much simpler with S*BGP, since it directly influences routing decisions without requiring adoption of new mechanisms.

Multiple studies [19, 18, 36] consider the role of converters (e.g., IPv4-IPv6 gateways) on protocol deployment. While S*BGP must certainly be backwards compatible with BGP, the fact that security guarantees only hold for fully-secure paths (Section 2.2.2) means that there is no reason to convert BGP messages to S*BGP messages. Thus, we do not expect converters to drive S*BGP deployment.

S*BGP adoption. Perhaps most relevant is Chang et al.'s comparative study on the adoptability of secure interdomain routing protocols [8]. Like [8], we also consider how early adopters create local incentives for other ASes to deploy S*BGP. However, our study focuses on how S*BGP deployment can be driven by (a) simplex S*BGP deployment at stubs, and (b) the requirement that security plays a role in routing decisions. Furthermore, in [8] ISP utility depends on the security benefits offered by the partially-deployed protocol. Thus, the utility function in [8] depends on possible attacker strategies (i.e., path shortening attacks) and attacker location (i.e., random, or biased towards small ISPs). In contrast, our model of utility is based solely on economics (i.e., customer traffic transited). Thus, we show that global S*BGP deployment is possible even if ISPs' local deployment decisions are not driven by security concerns. Also complementary to our work is [5]'s forward-looking proposal, which argues that extra mechanisms (e.g., secure data-plane monitoring) can be added to S*BGP to get around the problem of partially-secure paths (Appendix B). Finally, we note that both our work and [5, 8] find that ensuring that Tier 1 ASes deploy S*BGP is crucial, a fact that is not surprising in light of the highly-skewed degree distribution of the AS graph.

10. CONCLUSION

Our results indicate that there is hope for S*BGP deployment. We have argued for (1) simplex S*BGP to secure stubs, (2) convincing a small, but influential, set of ASes to be early adopters of S*BGP, and (3) ensuring that S*BGP influences traffic by requiring ASes to (at minimum) break ties between equally-good paths based on security.

We have shown that, if the deployment cost θ is low, our proposal can successfully transition a majority of ASes to S*BGP. The transition is driven by market pressure created when ISPs deploy S*BGP in order to draw revenue-generating traffic into their networks. We also pointed out unexplored challenges that result from S*BGP's influence on route selection (e.g., ISPs may have incentives to disable S*BGP).

We hope that this work motivates the standardization and research communities to devote their efforts along three key lines. First, effort should be spent to engineer a lightweight simplex S*BGP. Second, with security impacting route selection, ISPs will need tools to forecast how S*BGP deployment will impact traffic patterns (e.g., using "shadow configurations", inspired by [1], with cooperative neighboring ASes) so they can provision their networks appropriately. Finally, our results suggest that S*BGP and BGP will coexist in the long term. Thus, effort should be devoted to ensure that S*BGP and BGP can coexist without introducing new vulnerabilities into the interdomain routing system.

Acknowledgments

This project was motivated by discussions with the members of the DHS S&T CSD Secure Routing project. We especially thank the group for the ideas about simplex S*BGP, and Steve Bellovin for the example in Appendix B.

We are extremely grateful to Mihai Budiu, Frank McSherry and the rest of the group at Microsoft Research SVC for helping us get our code running on DryadLINQ. We also thank Edwin Guarin and Bill Wilder for helping us get our code running on Azure, and the Microsoft Research New England lab for supporting us on this project. We thank Azer Bestavros, John Byers, Mark Crovella, Jef Guarente, Vatche Ishakian, Isaac Keslassy, Eric Keller, Leo Reyzin, Jennifer Rexford, Rick Skowyra, Renata Teixeira, Walter Willinger and Minlan Yu for comments on drafts of this work. This project was supported by NSF Grant S-1017907 and a gift from Cisco.

11. REFERENCES

[1] R. Alimi, Y. Wang, and Y. R. Yang. Shadow configuration as a network management primitive. In Sigcomm, 2008.

[2] Anonymized. Let the market drive deployment: A strategy for transitioning to BGP security. Full version. Technical report, 2011.

[3] B. Augustin, B. Krishnamurthy, and W. Willinger. IXPs: Mapped? In IMC, 2009.

[4] R. Austein, G. Huston, S. Kent, and M. Lepinski. Secure inter-domain routing: Manifests for the resource public key infrastructure. draft-ietf-sidr-rpki-manifests-09.txt, 2010.

[5] I. Avramopoulos, M. Suchara, and J. Rexford. How small groups can secure interdomain routing. Technical report, Princeton University Comp. Sci., 2007.

[6] H. Ballani, P. Francis, and X. Zhang. A study of prefix hijacking and interception in the Internet. In SIGCOMM, 2007.

[7] K. Butler, T. Farley, P. McDaniel, and J. Rexford. A survey of BGP security issues and solutions. Proceedings of the IEEE, 2010.

[8] H. Chang, D. Dash, A. Perrig, and H. Zhang. Modeling adoptability of secure BGP protocol. In SIGCOMM, 2006.

[9] Y.-J. Chi, R. Oliveira, and L. Zhang. Cyclops: The Internet AS-level observatory. ACM SIGCOMM CCR, 2008.

[10] D. D. Clark, J. Wroclawski, K. R. Sollins, and R. Braden. Tussle in cyberspace: defining tomorrow's Internet. Trans. on Networking, 2005.


[11] J. Cowie. Renesys blog: China's 18-minute mystery. http://www.renesys.com/blog/2010/11/chinas-18-minute-mystery.shtml.

[12] A. Dhamdhere and C. Dovrolis. The internet is flat: Modeling the transition from a transit hierarchy to a peering mesh. In CoNEXT, 2010.

[13] L. Gao and J. Rexford. Stable Internet routing without global coordination. Trans. on Networking, 2001.

[14] S. Goldberg, M. Schapira, P. Hummon, and J. Rexford. How secure are secure interdomain routing protocols. In Sigcomm, 2010.

[15] T. Griffin, F. B. Shepherd, and G. Wilfong. The stable paths problem and interdomain routing. Trans. on Networking, 2002.

[16] S. Hart. Adaptive heuristics. Econometrica, 2005.

[17] IETF. Secure interdomain routing (SIDR) working group. http://datatracker.ietf.org/wg/sidr/charter/.

[18] Y. Jin, S. Sen, R. Guerin, K. Hosanagar, and Z. Zhang. Dynamics of competition between incumbent and emerging network technologies. In NetEcon, 2008.

[19] D. Joseph, N. Shetty, J. Chuang, and I. Stoica. Modeling the adoption of new network architectures. In CoNEXT, 2007.

[20] J. Karlin, S. Forrest, and J. Rexford. Autonomous security for autonomous systems. Computer Networks, Oct. 2008.

[21] D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In ACM SIGKDD, 2003.

[22] S. Kent, C. Lynn, and K. Seo. Secure border gateway protocol (S-BGP). JSAC, 2000.

[23] V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, J.-H. Cui, and A. G. Percus. Sampling large internet topologies for simulation purposes. Computer Networks (Elsevier), 51(15):4284–4302, 2007.

[24] C. Labovitz, S. Iekel-Johnson, D. McPherson, J. Oberheide, and F. Jahanian. Internet inter-domain traffic. In SIGCOMM, 2010.

[25] M. Lad, D. Massey, D. Pei, Y. Wu, B. Zhang, and L. Shang. PHAS: Prefix hijack alert system. In Usenix Security, 2006.

[26] M. Lepinski and S. Turner. BGPSEC protocol specification, 2011. http://tools.ietf.org/html/draft-lepinski-bgpsec-overview-00.

[27] C. D. Marsan. U.S. plots major upgrade to Internet router security. Network World, 2009.

[28] A. Medina, A. Lakhina, I. Matta, and J. Byers. BRITE: an approach to universal topology generation. In MASCOTS, 2001.

[29] S. Misel. "Wow, AS7007!". Merit NANOG Archive, Apr. 1997. http://www.merit.edu/mail.archives/nanog/1997-04/msg00340.html.

[30] S. Morris. Contagion. Review of Economic Studies, 2003.

[31] R. Oliveira, D. Pei, W. Willinger, B. Zhang, and L. Zhang. Quantifying the completeness of the observed internet AS-level structure. UCLA Computer Science Department Technical Report TR-080026-2008, Sept. 2008.

[32] Fixed Orbit. http://www.fixedorbit.com/metrics.htm.

[33] S. Ratnasamy, S. Shenker, and S. McCanne. Towards an evolvable Internet architecture. In SIGCOMM, 2005.

[34] Renesys Blog. Pakistan hijacks YouTube. http://www.renesys.com/blog/2008/02/pakistan_hijacks_youtube_1.shtml.

[35] Sandvine. Fall 2010 global internet phenomena, 2010.

[36] S. Sen, Y. Jin, R. Guerin, and K. Hosanagar. Modeling the dynamics of network technology adoption and the role of converters. Trans. on Networking, 2010.

[37] R. White. Deployment considerations for secure origin BGP (soBGP). draft-white-sobgp-bgp-deployment-01.txt, June 2003, expired.

[38] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In Usenix OSDI, 2008.

Figure 9: A new attack vector (ASes p, q, r, s, m, and v).

[39] E. Zegura, K. Calvert, and S. Bhattacharjee. How to model an internetwork. In Infocom, 1996.

APPENDIX

A. A MODEL OF ROUTING WITH BGP

We follow [15] by assuming that each AS a computes paths to a given destination AS d based on a ranking on outgoing paths, and an export policy specifying the set of neighbors to which a given path should be announced.

Rankings. AS a selects a path to d from the set of simple paths it learns from its neighbors as follows:

LP Local Preference. Paths are ranked based on their next hop: customer is chosen over peer, which is chosen over provider.

SP Shortest Paths. Among the paths with the highest local preference, prefer the shortest ones.

SecP Secure Paths. If there are multiple such paths, and node a is secure, then prefer the secure paths.

TB Tie Break. If there are multiple such paths, node a breaks ties: if b is the next hop on the path, choose the path whose hash H(a, b) is lowest.4

This standard model of local preference [13] captures the idea that an AS has incentives to prefer routing through a customer (that pays it) over a peer (no money is exchanged) over a provider (that it must pay).

Export Policies. This standard model of export policies captures the idea that an AS will only load its network with transit traffic if its customer pays it to do so [13]:

GR2 AS b announces a path via AS c to AS a iff at least one of a and c is a customer of b.
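As an illustration of the ranking steps LP, SP, SecP, TB and the export rule GR2 above, here is a minimal sketch under our own naming conventions (it is not the authors' simulator); the path representation and helper names are assumptions.

```python
# Illustrative sketch of the Appendix A routing model. A candidate path is a
# tuple of ASes starting at the next hop and ending at destination d.
import hashlib

LOCAL_PREF = {"customer": 0, "peer": 1, "provider": 2}      # lower is preferred

def tie_break(a, b):
    # Randomized tie break on the (a, next hop) pair, per footnote 4.
    return hashlib.sha256(f"{a}-{b}".encode()).hexdigest()

def select_path(a, a_is_secure, candidates, rel_to, path_is_secure):
    """rel_to[b]: a's business relationship to neighbor b;
    path_is_secure(path): True iff every AS on the path is secure."""
    def rank(path):
        next_hop = path[0]
        return (LOCAL_PREF[rel_to[next_hop]],                         # LP
                len(path),                                            # SP
                0 if a_is_secure and path_is_secure(path) else 1,     # SecP
                tie_break(a, next_hop))                               # TB
    return min(candidates, key=rank) if candidates else None

def should_export(rel_to_receiver, rel_to_learned_from):
    """GR2: b announces a path learned via c to neighbor a iff at least one of
    a and c is a customer of b (arguments are b's relationship to each)."""
    return rel_to_receiver == "customer" or rel_to_learned_from == "customer"
```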

B. ATTACKS ON PARTIALLY SECURE PATHS

We show how preferring partially secure paths over insecure paths can introduce new attack vectors that do not exist even without S*BGP:

Figure 9: Suppose that only ASes p and q are secure, that malicious AS m falsely announces the path (m, v), and that p's tiebreak algorithm prefers paths through r over paths through q. Then, p has a choice between two paths: a partially-secure false path (p, q, m, v), and an insecure true path (p, r, s, v). If no AS used S*BGP, p would have chosen the true path (per his tiebreak algorithm); if p prefers partially secure paths, he will be fooled into routing to AS m.

4 In practice, this is done using the distance between routers and router IDs. Since we do not incorporate this information in our model, we use a randomized tie break, which prevents certain ASes from "always winning".


Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications

Phillipa Gill
University of Toronto

Navendu Jain
Microsoft Research

Nachiappan Nagappan
Microsoft Research

ABSTRACT

We present the first large-scale analysis of failures in a data center network. Through our analysis, we seek to answer several fundamental questions: which devices/links are most unreliable, what causes failures, how do failures impact network traffic, and how effective is network redundancy? We answer these questions using multiple data sources commonly collected by network operators. The key findings of our study are that (1) data center networks show high reliability, (2) commodity switches such as ToRs and AggS are highly reliable, (3) load balancers dominate in terms of failure occurrences with many short-lived software-related faults, (4) failures have the potential to cause loss of many small packets such as keep-alive messages and ACKs, and (5) network redundancy is only 40% effective in reducing the median impact of failure.

Categories and Subject Descriptors: C.2.3 [Computer-Communication Networks]: Network Operations

General Terms: Network Management, Performance, Reliability

Keywords: Data Centers, Network Reliability

1. INTRODUCTION

Demand for dynamic scaling and benefits from economies of scale are driving the creation of mega data centers to host a broad range of services such as Web search, e-commerce, storage backup, video streaming, high-performance computing, and data analytics. To host these applications, data center networks need to be scalable, efficient, fault tolerant, and easy-to-manage. Recognizing this need, the research community has proposed several architectures to improve scalability and performance of data center networks [2, 3, 12–14, 17, 21]. However, the issue of reliability has remained unaddressed, mainly due to a dearth of available empirical data on failures in these networks.

In this paper, we study data center network reliability by analyzing network error logs collected for over a year from thousands of network devices across tens of geographically distributed data centers. Our goals for this analysis are two-fold. First, we seek to characterize network failure patterns in data centers and understand overall reliability of the network. Second, we want to leverage

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.SIGCOMM’11, August 15-19, 2011, Toronto, Ontario, Canada.Copyright 2011 ACM 978-1-4503-0797-0/11/08 ...$10.00.

lessons learned from this study to guide the design of future datacenter networks.

Motivated by issues encountered by network operators, we study network reliability along three dimensions:

• Characterizing the most failure prone network elements. To achieve high availability amidst multiple failure sources such as hardware, software, and human errors, operators need to focus on fixing the most unreliable devices and links in the network. To this end, we characterize failures to identify network elements with high impact on network reliability, e.g., those that fail with high frequency or that incur high downtime.

• Estimating the impact of failures. Given limited resources at hand, operators need to prioritize severe incidents for troubleshooting based on their impact to end-users and applications. In general, however, it is difficult to accurately quantify a failure's impact from error logs, and annotations provided by operators in trouble tickets tend to be ambiguous. Thus, as a first step, we estimate failure impact by correlating event logs with recent network traffic observed on links involved in the event. Note that logged events do not necessarily result in a service outage because of failure-mitigation techniques such as network redundancy [1] and replication of compute and data [11, 27], typically deployed in data centers.

• Analyzing the effectiveness of network redundancy. Ideally, operators want to mask all failures before applications experience any disruption. Current data center networks typically provide 1:1 redundancy to allow traffic to flow along an alternate route when a device or link becomes unavailable [1]. However, this redundancy comes at a high cost (both monetary expenses and management overheads) to maintain a large number of network devices and links in the multi-rooted tree topology. To analyze its effectiveness, we compare traffic on a per-link basis during failure events to traffic across all links in the network redundancy group where the failure occurred.

For our study, we leverage multiple monitoring tools put in place by our network operators. We utilize data sources that provide both a static view (e.g., router configuration files, device procurement data) and a dynamic view (e.g., SNMP polling, syslog, trouble tickets) of the network. Analyzing these data sources, however, poses several challenges. First, since these logs track low level network events, they do not necessarily imply application performance impact or service outage. Second, we need to separate failures that potentially impact network connectivity from high volume and often noisy network logs, e.g., warnings and error messages even when the device is functional. Finally, analyzing the effectiveness of network redundancy requires correlating multiple data sources across redundant devices and links. Through our analysis, we aim to address these challenges to characterize network failures, estimate the failure impact, and analyze the effectiveness of network redundancy in data centers.

1.1 Key observations
We make several key observations from our study:

• Data center networks are reliable. We find that overall the data center network exhibits high reliability with more than four 9's of availability for about 80% of the links and for about 60% of the devices in the network (Section 4.5.3).

• Low-cost, commodity switches are highly reliable. We find that Top of Rack switches (ToRs) and aggregation switches exhibit the highest reliability in the network with failure rates of about 5% and 10%, respectively. This observation supports network design proposals that aim to build data center networks using low cost, commodity switches [3, 12, 21] (Section 4.3).

• Load balancers experience a high number of software faults. We observe that 1 in 5 load balancers exhibit a failure (Section 4.3) and that they experience many transient software faults (Section 4.7).

• Failures potentially cause loss of a large number of small packets. By correlating network traffic with link failure events, we estimate the amount of packets and data lost during failures. We find that most failures lose a large number of packets relative to the number of lost bytes (Section 5), likely due to loss of protocol-specific keep alive messages or ACKs.

• Network redundancy helps, but it is not entirely effective. Ideally, network redundancy should completely mask all failures from applications. However, we observe that network redundancy is only able to reduce the median impact of failures (in terms of lost bytes or packets) by up to 40% (Section 5.1).

Limitations. As with any large-scale empirical study, our results are subject to several limitations. First, the best-effort nature of failure reporting may lead to missed events or multiply-logged events. While we perform data cleaning (Section 3) to filter the noise, some events may still be lost due to software faults (e.g., firmware errors) or disconnections (e.g., under correlated failures). Second, human bias may arise in failure annotations (e.g., root cause). This concern is alleviated to an extent by verification with operators, and by the scale and diversity of our network logs. Third, network errors do not always impact network traffic or service availability, due to several factors such as in-built redundancy at network, data, and application layers. Thus, our failure rates should not be interpreted as impacting applications. Overall, we hope that this study contributes to a deeper understanding of network reliability in data centers.

Paper organization. The rest of this paper is organized as follows. Section 2 presents our network architecture and workload characteristics. Data sources and methodology are described in Section 3. We characterize failures over a year within our data centers in Section 4. We estimate the impact of failures on applications and the effectiveness of network redundancy in masking them in Section 5. Finally, we discuss implications of our study for future data center networks in Section 6. We present related work in Section 7 and conclude in Section 8.

Figure 1: A conventional data center network architecture adapted from figure by Cisco [12]. The device naming convention is summarized in Table 1.

Table 1: Summary of device abbreviations
Type   Devices                Description
AggS   AggS-1, AggS-2         Aggregation switches
LB     LB-1, LB-2, LB-3       Load balancers
ToR    ToR-1, ToR-2, ToR-3    Top of Rack switches
AccR   -                      Access routers
Core   -                      Core routers

2. BACKGROUND
Our study focuses on characterizing failure events within our organization's set of data centers. We next give an overview of data center networks and workload characteristics.

2.1 Data center network architecture
Figure 1 illustrates an example of a partial data center network architecture [1]. In the network, rack-mounted servers are connected (or dual-homed) to a Top of Rack (ToR) switch usually via a 1 Gbps link. The ToR is in turn connected to a primary and back up aggregation switch (AggS) for redundancy. Each redundant pair of AggS aggregates traffic from tens of ToRs which is then forwarded to the access routers (AccR). The access routers aggregate traffic from up to several thousand servers and route it to core routers that connect to the rest of the data center network and Internet.

All links in our data centers use Ethernet as the link layer protocol and physical connections are a mix of copper and fiber cables. The servers are partitioned into virtual LANs (VLANs) to limit overheads (e.g., ARP broadcasts, packet flooding) and to isolate different applications hosted in the network. At each layer of the data center network topology, with the exception of a subset of ToRs, 1:1 redundancy is built into the network topology to mitigate failures. As part of our study, we evaluate the effectiveness of redundancy in masking failures when one (or more) components fail, and analyze how the tree topology affects failure characteristics, e.g., correlated failures.

In addition to routers and switches, our network contains many middle boxes such as load balancers and firewalls. Redundant pairs of load balancers (LBs) connect to each aggregation switch and perform mapping between static IP addresses (exposed to clients through DNS) and dynamic IP addresses of the servers that process user requests. Some applications require programming the load balancers and upgrading their software and configuration to support different functionalities.

Figure 2: The daily 95th percentile utilization as computed using five-minute traffic averages (in bytes).

Table 2: Summary of link types
Type    Description
TRUNK   connect ToRs to AggS and AggS to AccR
LB      connect load balancers to AggS
MGMT    management interfaces
CORE    connect routers (AccR, Core) in the network core
ISC     connect primary and back up switches/routers
IX      connect data centers (wide area network)

Network composition. The device-level breakdown of our network is as follows. ToRs are the most prevalent device type in our network, comprising approximately three quarters of devices. LBs are the next most prevalent at one in ten devices. The remaining 15% of devices are AggS, Core and AccR. We observe the effects of prevalent ToRs in Section 4.4, where despite being highly reliable, ToRs account for a large amount of downtime. LBs on the other hand account for few devices but are extremely failure prone, making them a leading contributor of failures (Section 4.4).

2.2 Data center workload characteristics
Our network is used in the delivery of many online applications. As a result, it is subject to many well known properties of data center traffic; in particular the prevalence of a large volume of short-lived latency-sensitive "mice" flows and a few long-lived throughput-sensitive "elephant" flows that make up the majority of bytes transferred in the network. These properties have also been observed by others [4, 5, 12].

Network utilization. Figure 2 shows the daily 95th percentile utilization as computed using five-minute traffic averages (in bytes). We divide links into six categories based on their role in the network (summarized in Table 2). TRUNK and LB links which reside lower in the network topology are least utilized, with 90% of TRUNK links observing less than 24% utilization. Links higher in the topology such as CORE links observe higher utilization, with 90% of CORE links observing less than 51% utilization. Finally, links that connect data centers (IX) are the most utilized, with 35% observing utilization of more than 45%. Similar to prior studies of data center network traffic [5], we observe higher utilization at upper layers of the topology as a result of aggregation and high bandwidth oversubscription [12]. Note that since the traffic measurement is at the granularity of five minute averages, it is likely to smooth the effect of short-lived traffic spikes on link utilization.
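To make the metric concrete, the following sketch computes the daily 95th percentile utilization from five-minute averages; the DataFrame layout (columns link_id, timestamp, bytes_out, capacity_bytes) is an assumption for illustration, not the operators' actual schema.

```python
import pandas as pd

# Sketch only: assumes a DataFrame with one row per link per five-minute SNMP poll.
def daily_p95_utilization(samples: pd.DataFrame) -> pd.DataFrame:
    samples = samples.copy()
    # Utilization of each five-minute average relative to what the link could carry.
    samples["util"] = samples["bytes_out"] / samples["capacity_bytes"]
    samples["day"] = samples["timestamp"].dt.date
    # Daily 95th percentile utilization per link (the quantity plotted in Figure 2).
    return (samples.groupby(["link_id", "day"])["util"]
                   .quantile(0.95)
                   .reset_index(name="daily_p95_util"))
```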

3. METHODOLOGY AND DATA SETS
The network operators collect data from multiple sources to track the performance and health of the network. We leverage these existing data sets in our analysis of network failures. In this section, we first describe the data sets and then the steps we took to extract failures of network elements.

3.1 Existing data sets
The data sets used in our analysis are a subset of what is collected by the network operators. We describe these data sets in turn:

• Network event logs (SNMP/syslog). We consider logs derived from syslog, SNMP traps and polling, collected by our network operators. The operators filter the logs to reduce the number of transient events and produce a smaller set of actionable events. One of the filtering rules excludes link failures reported by servers connected to ToRs as these links are extremely prone to spurious port flapping (e.g., more than 100,000 events per hour across the network). Of the filtered events, 90% are assigned to NOC tickets that must be investigated for troubleshooting. These event logs contain information about what type of network element experienced the event, what type of event it was, a small amount of descriptive text (machine generated) and an ID number for any NOC tickets relating to the event. For this study we analyzed a year's worth of events from October 2009 to September 2010.

• NOC Tickets. To track the resolution of issues, the operators employ a ticketing system. Tickets contain information about when and how events were discovered as well as when they were resolved. Additional descriptive tags are applied to tickets describing the cause of the problem, if any specific device was at fault, as well as a "diary" logging steps taken by the operators as they worked to resolve the issue.

• Network traffic data. Data transferred on network interfaces is logged using SNMP polling. This data includes five minute averages of bytes and packets into and out of each network interface.

• Network topology data. Given the sensitive nature of network topology and device procurement data, we used a static snapshot of our network encompassing thousands of devices and tens of thousands of interfaces spread across tens of data centers.

3.2 Defining and identifying failures
When studying failures, it is important to understand what types of logged events constitute a "failure". Previous studies have looked at failures as defined by pre-existing measurement frameworks such as syslog messages [26], OSPF [25, 28] or IS-IS listeners [19]. These approaches benefit from a consistent definition of failure, but tend to be ambiguous when trying to determine whether a failure had impact or not. Syslog messages in particular can be spurious, with network devices sending multiple notifications even though a link is operational. For multiple devices, we observed this type of behavior after the device was initially deployed and the router software went into an erroneous state. For some devices, this effect was severe, with one device sending 250 syslog "link down" events per hour for 2.5 months (with no impact on applications) before it was noticed and mitigated.

We mine network event logs collected over a year to extract events relating to device and link failures. Initially, we extract all logged "down" events for network devices and links. This leads us to define two types of failures:

Link failures: A link failure occurs when the connection between two devices (on specific interfaces) is down. These events are detected by SNMP monitoring on interface state of devices.

Device failures: A device failure occurs when the device is not functioning for routing/forwarding traffic. These events can be caused by a variety of factors such as a device being powered down for maintenance or crashing due to hardware errors.

We refer to each logged event as a "failure" to understand the occurrence of low level failure events in our network. As a result, we may observe multiple component notifications related to a single high level failure or a correlated event, e.g., an AggS failure resulting in down events for its incident ToR links. We also correlate failure events with network traffic logs to filter failures with impact that potentially result in loss of traffic (Section 3.4); we leave analyzing application performance and availability under network failures to future work.

3.3 Cleaning the data
We observed two key inconsistencies in the network event logs stemming from redundant monitoring servers being deployed. First, a single element (link or device) may experience multiple "down" events simultaneously. Second, an element may experience another down event before the previous down event has been resolved. We perform two passes of cleaning over the data to resolve these inconsistencies. First, multiple down events on the same element that start at the same time are grouped together. If they do not end at the same time, the earlier of their end times is taken. In the case of down events that occur for an element that is already down, we group these events together, taking the earliest down time of the events in the group. For failures that are grouped in this way we take the earliest end time for the failure. We take the earliest failure end times because of occasional instances where events were not marked as resolved until long after their apparent resolution.
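The two cleaning passes amount to merging simultaneous or overlapping down events per element, keeping the earliest start time and the earliest end time. A minimal sketch, assuming events are given as (element_id, start, end) tuples with comparable timestamps:

```python
def clean_down_events(events):
    # Sort by element, then by start time, so overlapping events become adjacent.
    events = sorted(events, key=lambda e: (e[0], e[1]))
    cleaned = []
    for elem, start, end in events:
        if cleaned and cleaned[-1][0] == elem and start <= cleaned[-1][2]:
            # Element is already down: merge, keeping the earliest start and the
            # earliest end (late ticket closures can otherwise inflate durations).
            prev_elem, prev_start, prev_end = cleaned[-1]
            cleaned[-1] = (elem, prev_start, min(prev_end, end))
        else:
            cleaned.append((elem, start, end))
    return cleaned
```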

3.4 Identifying failures with impact
As previously stated, one of our goals is to identify failures that potentially impact end-users and applications. Since we did not have access to application monitoring logs, we cannot precisely quantify application impact such as throughput loss or increased response times. Therefore, we instead estimate the impact of failures on network traffic.

To estimate traffic impact, we correlate each link failure with traffic observed on the link in the recent past before the time of failure. We leverage five minute traffic averages for each link that failed and compare the median traffic on the link in the time window preceding the failure event and the median traffic during the failure event. We say a failure has impacted network traffic if the median traffic during the failure is less than the traffic before the failure. Since many of the failures we observe have short durations (less than ten minutes) and our polling interval is five minutes, we do not require that traffic on the link go down to zero during the failure. We analyze the failure impact in detail in Section 5.
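A minimal sketch of this impact test, assuming the five-minute byte counts for a link are available as a pandas Series indexed by timestamp; the one-hour lookback window is an illustrative choice, not the paper's exact parameter:

```python
import pandas as pd

def failure_had_impact(link_traffic: pd.Series, fail_start, fail_end) -> bool:
    before = link_traffic.loc[fail_start - pd.Timedelta(hours=1):fail_start]
    during = link_traffic.loc[fail_start:fail_end]
    if before.empty or during.empty:
        return False  # no traffic data to compare against
    # Impact: median traffic during the failure is below the median before it.
    return during.median() < before.median()
```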

Table 3 summarizes the impact of link failures we observe. We separate links that were transferring no data before the failure into two categories, "inactive" (no data before or during failure) and "provisioning" (no data before, some data transferred during failure). (Note that these categories are inferred based only on traffic observations.) The majority of failures we observe are on links that are inactive (e.g., a new device being deployed), followed by link failures with impact. We also observe a significant fraction of link failure notifications where no impact was observed (e.g., devices experiencing software errors at the end of the deployment process).

Table 3: Summary of logged link events
Category          Percent   Events
All               100.0     46,676
Inactive           41.2     19,253
Provisioning        1.0        477
No impact          17.9      8,339
Impact             28.6     13,330
No traffic data    11.3      5,277

For link failures, verifying that the failure caused impact to network traffic enables us to eliminate many spurious notifications from our analysis and focus on events that had a measurable impact on network traffic. However, since we do not have application level monitoring, we are unable to determine if these events impacted applications or if there were faults that impacted applications that we did not observe.

For device failures, we perform additional steps to filter spurious failure messages (e.g., down messages caused by software bugs when the device is in fact up). If a device is down, neighboring devices connected to it will observe failures on inter-connecting links. For each device down notification, we verify that at least one link failure with impact has been noted for links incident on the device within a time window of five minutes. This simple sanity check significantly reduces the number of device failures we observe. Note that if the neighbors of a device fail simultaneously, e.g., due to a correlated failure, we may not observe a link-down message for that device.
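The device-level sanity check can be sketched as follows; the event representation and helper names (device_down, link_failures, links_of) are hypothetical:

```python
from datetime import timedelta

def confirm_device_failure(device_down, link_failures, links_of,
                           window=timedelta(minutes=5)):
    """Keep a device 'down' event only if an impacting link failure was logged
    on one of its incident links within a five-minute window."""
    incident_links = links_of[device_down["device_id"]]
    return any(link_id in incident_links and
               abs(fail_time - device_down["time"]) <= window
               for link_id, fail_time in link_failures)
```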

For the remainder of our analysis, unless stated otherwise, we consider only failure events that impacted network traffic.

4. FAILURE ANALYSIS

4.1 Failure event panorama
Figure 3 illustrates how failures are distributed across our measurement period and across data centers in our network. It shows plots for links that experience at least one failure, both for all failures and those with potential impact; the y-axis is sorted by data center and the x-axis is binned by day. Each point indicates that the link (y) experienced at least one failure on a given day (x).

All failures vs. failures with impact. We first compare the view of all failures (Figure 3(a)) to failures having impact (Figure 3(b)). Links that experience failures impacting network traffic are only about one third of the population of links that experience failures. We do not observe significant widespread failures in either plot, with failures tending to cluster within data centers, or even on interfaces of a single device.

Widespread failures: Vertical bands indicate failures that were spatially widespread. Upon further investigation, we find that these tend to be related to software upgrades. For example, the vertical band highlighted in Figure 3(b) was due to an upgrade of load balancer software that spanned multiple data centers. In the case of planned upgrades, the network operators are able to take precautions so that the disruptions do not impact applications.

Long-lived failures: Horizontal bands indicate link failures on a common link or device over time. These tend to be caused by problems such as firmware bugs or device unreliability (wider bands indicate multiple interfaces failed on a single device). We observe horizontal bands with regular spacing between link failure events. In one case, these events occurred weekly and were investigated in independent NOC tickets. As a result of the time lag, the operators did not correlate these events and dismissed each notification as spurious since they occurred in isolation and did not impact performance. This underscores the importance of network health monitoring tools that track failures over time and alert operators to spatio-temporal patterns which may not be easily recognized using local views alone.

Figure 3: Overview of all link failures (a) and link failures with impact on network traffic (b) on links with at least one failure.

Figure 4: Probability of device failure in one year for device types with population size of at least 300.

Figure 5: Probability of a failure impacting network traffic in one year for interface types with population size of at least 500.

Table 4: Failures per time unit
Failures per day:   Mean   Median   95%     COV
Devices             5.2    3.0      14.7    1.3
Links               40.8   18.5     136.0   1.9

4.2 Daily volume of failures
We now consider the daily frequency of failures of devices and links. Table 4 summarizes the occurrences of link and device failures per day during our measurement period. Links experience about an order of magnitude more failures than devices. On a daily basis, device and link failures occur with high variability, having COV of 1.3 and 1.9, respectively. (COV > 1 is considered high variability.)
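For concreteness, the per-day statistics in Table 4 can be computed with a short sketch; the failures DataFrame with a `start` timestamp column is an assumed layout:

```python
import pandas as pd

def daily_failure_stats(failures: pd.DataFrame) -> pd.Series:
    per_day = failures.groupby(failures["start"].dt.date).size()
    return pd.Series({
        "mean": per_day.mean(),
        "median": per_day.median(),
        "p95": per_day.quantile(0.95),
        # Coefficient of variation; COV > 1 indicates high day-to-day variability.
        "cov": per_day.std() / per_day.mean(),
    })
```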

Link failures are variable and bursty. Link failures exhibit high variability in their rate of occurrence. We observed bursts of link failures caused by protocol issues (e.g., UDLD [9]) and device issues (e.g., power cycling load balancers).

Device failures are usually caused by maintenance. While device failures are less frequent than link failures, they also occur in bursts at the daily level. We discovered that periods with high frequency of device failures are caused by large scale maintenance (e.g., on all ToRs connected to a common AggS).

4.3 Probability of failure
We next consider the probability of failure for network elements. This value is computed by dividing the number of devices of a given type that observe failures by the total device population of the given type. This gives the probability of failure in our one year measurement period. We observe (Figure 4) that in terms of overall reliability, ToRs have the lowest failure rates whereas LBs have the highest failure rate. (Tables 1 and 2 summarize the abbreviated link and device names.)

Load balancers have the highest failure probability. Figure 4 shows the failure probability for device types with population size of at least 300. In terms of overall failure probability, load balancers (LB-1, LB-2) are the least reliable with a 1 in 5 chance of experiencing failure. Since our definition of failure can include incidents where devices are power cycled during planned maintenance, we emphasize here that not all of these failures are unexpected. Our analysis of load balancer logs revealed several causes of these transient problems such as software bugs, configuration errors, and hardware faults related to ASIC and memory.

Figure 6: Percent of failures and downtime per device type.

Figure 7: Percent of failures and downtime per link type.

Table 5: Summary of failures per device (for devices that experience at least one failure).
Device type   Mean   Median   99%     COV
LB-1          11.4   1.0      426.0   5.1
LB-2          4.0    1.0      189.0   5.1
ToR-1         1.2    1.0      4.0     0.7
LB-3          3.0    1.0      15.0    1.1
ToR-2         1.1    1.0      5.0     0.5
AggS-1        1.7    1.0      23.0    1.7
Overall       2.5    1.0      11.0    6.8

ToRs have low failure rates. ToRs have among the lowest failure rates across all devices. This observation suggests that low-cost, commodity switches are not necessarily less reliable than their expensive, higher capacity counterparts and bodes well for data center networking proposals that focus on using commodity switches to build flat data center networks [3, 12, 21].

We next turn our attention to the probability of link failures at different layers in our network topology.

Load balancer links have the highest rate of logged failures. Figure 5 shows the failure probability for interface types with a population size of at least 500. Similar to our observation with devices, links forwarding load balancer traffic are most likely to experience failures (e.g., as a result of failures on LB devices).

Links higher in the network topology (CORE) and links connecting primary and back up of the same device (ISC) are the second most likely to fail, each with an almost 1 in 10 chance of failure. However, these events are more likely to be masked by network redundancy (Section 5.2). In contrast, links lower in the topology (TRUNK) only have about a 5% failure rate.

Management and inter-data center links have the lowest failure rate. Links connecting data centers (IX) and for managing devices have high reliability, with fewer than 3% of each of these link types failing. This observation is important because these links are the most utilized and least utilized, respectively (cf. Figure 2). Links connecting data centers are critical to our network and hence back up links are maintained to ensure that failure of a subset of links does not impact the end-to-end performance.

4.4 Aggregate impact of failures
In the previous section, we considered the reliability of individual links and devices. We next turn our attention to the aggregate impact of each population in terms of total number of failure events and total downtime. Figure 6 presents the percentage of failures and downtime for the different device types.

Load balancers have the most failures but ToRs have the most downtime. LBs have the highest number of failures of any device type. Of our top six devices in terms of failures, half are load balancers. However, LBs do not experience the most downtime, which is dominated instead by ToRs. This is counterintuitive since, as we have seen, ToRs have very low failure probabilities. There are three factors at play here: (1) LBs are subject to more frequent software faults and upgrades (Section 4.7); (2) ToRs are the most prevalent device type in the network (Section 2.1), increasing their aggregate effect on failure events and downtime; (3) ToRs are not a high priority component for repair because of in-built failover techniques, such as replicating data and compute across multiple racks, that aim to maintain high service availability despite failures.

We next analyze the aggregate number of failures and downtime for network links. Figure 7 shows the normalized number of failures and downtime for the six most failure prone link types.

Load balancer links experience many failure events but relatively small downtime. Load balancer links experience the second highest number of failures, followed by ISC, MGMT and CORE links which all experience approximately 5% of failures. Note that despite LB links being second most frequent in terms of number of failures, they exhibit less downtime than CORE links (which, in contrast, experience about 5X fewer failures). This result suggests that failures for LBs are short-lived and intermittent, caused by transient software bugs, rather than more severe hardware issues. We investigate these issues in detail in Section 4.7.

We observe that the total number of failures and downtime are dominated by LBs and ToRs, respectively. We next consider how many failures each element experiences. Table 5 shows the mean, median, 99th percentile and COV for the number of failures observed per device over a year (for devices that experience at least one failure).

Load balancer failures dominated by few failure prone devices. We observe that individual LBs experience a highly variable number of failures, with a few outlier LB devices experiencing more than 400 failures. ToRs, on the other hand, experience little variability in terms of the number of failures, with most ToRs experiencing between 1 and 4 failures. We make similar observations for links, where LB links experience very high variability relative to others (omitted due to limited space).

4.5 Properties of failures
We next consider the properties of failures for network element types that experienced the highest number of events.


Figure 8: Properties of device failures: (a) time to repair for devices; (b) time between failures for devices; (c) annualized downtime for devices.

Figure 9: Properties of link failures that impacted network traffic: (a) time to repair for links; (b) time between failures for links; (c) annualized downtime for links.

4.5.1 Time to repair
This section considers the time to repair (or duration) for failures, computed as the time between a down notification for a network element and when it is reported as being back online. It is not always the case that an operator had to intervene to resolve the failure. In particular, for short duration failures, it is likely that the fault was resolved automatically (e.g., root guard in the spanning tree protocol can temporarily disable a port [10]). In the case of link failures, our SNMP polling interval of four minutes results in a grouping of durations around four minutes (Figure 9 (a)), indicating that many link failures are resolved automatically without operator intervention. Finally, for long-lived failures, the failure durations may be skewed by when the NOC tickets were closed by network operators. For example, some incident tickets may not be marked as 'resolved' until a hardware replacement arrives in stock, even if normal operation has been restored.

Load balancers experience short-lived failures. We first look at the duration of device failures. Figure 8 (a) shows the CDF of time to repair for device types with the most failures. We observe that LB-1 and LB-3 load balancers experience the shortest failures with median time to repair of 3.7 and 4.5 minutes, respectively, indicating that most of their faults are short-lived.

ToRs experience correlated failures. When considering time to repair for devices, we observe a correlated failure pattern for ToRs. Specifically, these devices tend to have several discrete "steps" in the CDF of their failure durations. These steps correspond to spikes in specific duration values. On analyzing the failure logs, we find that these spikes are due to groups of ToRs that connect to the same (or pair of) AggS going down at the same time (e.g., due to maintenance or AggS failure).

Inter-data center links take the longest to repair. Figure 9 (a) shows the distribution of time to repair for different link types. The majority of link failures are resolved within five minutes, with the exception of links between data centers which take longer to repair. This is because links between data centers require coordination between technicians in multiple locations to identify and resolve faults as well as additional time to repair cables that may be in remote locations.

4.5.2 Time between failures
We next consider the time between failure events. Since time between failures requires a network element to have observed more than a single failure event, this metric is most relevant to elements that are failure prone. Specifically, note that more than half of all elements have only a single failure (cf. Table 5), so the devices and links we consider here are in the minority.

Load balancer failures are bursty. Figure 8 (b) shows the distribution of time between failures for devices. LBs tend to have the shortest time between failures, with a median of 8.6 minutes and 16.4 minutes for LB-1 and LB-2, respectively. Recall that failure events for these two LBs are dominated by a small number of devices that experience numerous failures (cf. Table 5). This small number of failure prone devices has a high impact on time between failures, especially since more than half of the LB-1 and LB-2 devices experience only a single failure.

In contrast to LB-1 and LB-2, devices like ToR-1 and AggS-1 have median time between failures of multiple hours, and LB-3 has a median time between failures of more than a day. We note that the LB-3 device is a newer version of the LB-1 and LB-2 devices and it exhibits higher reliability in terms of time between failures.


Link flapping is absent from the actionable network logs. Figure 9 (b) presents the distribution of time between failures for the different link types. On average, link failures tend to be separated by a period of about one week. Recall that our methodology leverages actionable information, as determined by network operators. This significantly reduces our observations of spurious link down events and observations of link flapping that do not impact network connectivity.

MGMT, CORE and ISC links are the most reliable in terms of time between failures, with most link failures on CORE and ISC links occurring more than an hour apart. Links between data centers experience the shortest time between failures. However, note that links connecting data centers have a very low failure probability. Therefore, while most links do not fail, the few that do tend to fail within a short time period of prior failures. In reality, multiple inter-data center link failures in close succession are more likely to be investigated as part of the same troubleshooting window by the network operators.

4.5.3 Reliability of network elements
We conclude our analysis of failure properties by quantifying the aggregate downtime of network elements. We define annualized downtime as the sum of the duration of all failures observed by a network element over a year. For link failures, we consider failures that impacted network traffic, but highlight that a subset of these failures are due to planned maintenance. Additionally, redundancy in terms of network, application, and data in our system implies that this downtime cannot be interpreted as a measure of application-level availability. Figure 8 (c) summarizes the annual downtime for devices that experienced failures during our study.
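Annualized downtime translates directly into "nines" of availability. A small sketch, assuming a list of (start, end) failure intervals for one element over the year:

```python
def annual_availability(failures, seconds_per_year=365 * 24 * 3600):
    # Annualized downtime: sum of all failure durations observed in the year.
    downtime = sum((end - start).total_seconds() for start, end in failures)
    return 1.0 - downtime / seconds_per_year

# For reference, four 9's (99.99% availability) corresponds to roughly
# 0.0001 * 365 * 24 * 60 ≈ 53 minutes of downtime per year.
```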

Data center networks experience high availability. With the exception of ToR-1 devices, all devices have a median annual downtime of less than 30 minutes. Despite experiencing the highest number of failures, LB-1 devices have the lowest annual downtime. This is due to many of their failures being short-lived. Overall, devices experience higher than four 9's of reliability with the exception of ToRs, where long lived correlated failures cause ToRs to have higher downtime; recall, however, that only 3.9% of ToR-1s experience any failures (cf. Figure 4).

Annual downtime for the different link types is shown in Figure 9 (c). The median yearly downtime for all link types, with the exception of links connecting data centers, is less than 10 minutes. This duration is smaller than the annual downtime of 24-72 minutes reported by Turner et al. when considering an academic WAN [26]. Links between data centers are the exception because, as observed previously, failures on links connecting data centers take longer to resolve than failures for other link types. Overall, links have high availability, with the majority of links (except those connecting data centers) having higher than four 9's of reliability.

4.6 Grouping link failures
We now consider correlations between link failures. We also analyzed correlated failures for devices, but except for a few instances of ToRs failing together, grouped device failures are extremely rare (not shown).

To group correlated failures, we need to define what it means for failures to be correlated. First, we require that link failures occur in the same data center to be considered related (since it can be the case that links in multiple data centers fail close together in time but are in fact unrelated). Second, we require failures to occur within a predefined time threshold of each other to be considered correlated. When combining failures into groups, it is important to pick an appropriate threshold for grouping failures. If the threshold is too small, correlated failures may be split into many smaller events. If the threshold is too large, many unrelated failures will be combined into one larger group.

We considered the number of failures for different threshold values. Beyond grouping simultaneous events, which reduces the number of link failures by a factor of two, we did not see significant changes by increasing the threshold.

Figure 10: Number of links involved in link failure groups.

Table 6: Examples of problem types
Problem Type         Example causes or explanations
Change               Device deployment, UPS maintenance
Incident             OS reboot (watchdog timer expired)
Network Connection   OSPF convergence, UDLD errors, Cabling, Carrier signaling/timing issues
Software             IOS hotfixes, BIOS upgrade
Hardware             Power supply/fan, Replacement of line card/chassis/optical adapter
Configuration        VPN tunneling, Primary-backup failover, IP/MPLS routing

Link failures tend to be isolated. The size of failure groups produced by our grouping method is shown in Figure 10. We see that just over half of failure events are isolated, with 41% of groups containing more than one failure. Large groups of correlated link failures are rare, with only 10% of failure groups containing more than four failures. We observed two failure groups with the maximum failure group size of 180 links. These were caused by scheduled maintenance to multiple aggregation switches connected to a large number of ToRs.
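For concreteness, the grouping step described above can be sketched as follows, assuming failures are given as (data_center, link_id, start_time) tuples; the default threshold of zero groups only simultaneous events:

```python
from collections import defaultdict
from datetime import timedelta

def group_link_failures(failures, threshold=timedelta(seconds=0)):
    by_dc = defaultdict(list)
    for f in failures:
        by_dc[f[0]].append(f)  # only failures in the same data center can be related
    groups = []
    for dc_failures in by_dc.values():
        dc_failures.sort(key=lambda f: f[2])
        current = [dc_failures[0]]
        for f in dc_failures[1:]:
            # A failure within `threshold` of the previous one joins its group.
            if f[2] - current[-1][2] <= threshold:
                current.append(f)
            else:
                groups.append(current)
                current = [f]
        groups.append(current)
    return groups
```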

4.7 Root causes of failures
Finally, we analyze the types of problems associated with device and link failures. We initially tried to determine the root cause of failure events by mining diaries associated with NOC tickets. However, the diaries often considered multiple potential causes for failure before arriving at the final root cause, which made mining the text impractical. Because of this complication, we chose to leverage the "problem type" field of the NOC tickets, which allows operators to place tickets into categories based on the cause of the problem. Table 6 gives examples of the types of problems that are put into each of the categories.

Hardware problems take longer to mitigate. Figure 11 considers the top problem types in terms of number of failures and total downtime for devices. Software and hardware faults dominate in terms of number of failures for devices. However, when considering downtime, the balance shifts and hardware problems have the most downtime. This shift between the number of failures and the total downtime may be attributed to software errors being alleviated by tasks that take less time to complete, such as power cycling, patching or upgrading software. In contrast, hardware errors may require a device to be replaced, resulting in longer repair times.

Figure 11: Device problem types.

Figure 12: Link problem types.

Figure 13: Estimated packet loss during failure events.

Figure 14: Estimated traffic loss during failure events.

Load balancers affected by software problems. We examined what types of errors dominated for the most failure prone device types (not shown). The LB-1 load balancer, which tends to have short, frequent failures and accounts for most failures (but relatively low downtime), mainly experiences software problems. Hardware problems dominate for the remaining device types. We observe that LB-3, despite also being a load balancer, sees far fewer software issues than LB-1 and LB-2 devices, suggesting higher stability in the newer model of the device.

Link failures are dominated by connection and hardware problems. Figure 12 shows the total number of failures and total downtime attributed to different causes for link failures. In contrast to device failures, link failures are dominated by network connection errors, followed by hardware and software issues. In terms of downtime, software errors incur much less downtime per failure than hardware and network connection problems. This suggests software problems lead to sporadic short-lived failures (e.g., a software bug causing a spurious link down notification) as opposed to severe network connectivity and hardware related problems.

5. ESTIMATING FAILURE IMPACT
In this section, we estimate the impact of link failures. In the absence of application performance data, we aim to quantify the impact of failures in terms of lost network traffic. In particular, we estimate the amount of traffic that would have been routed on a failed link had it been available for the duration of the failure.

In general, it is difficult to precisely quantify how much data was actually lost during a failure because of two complications. First, flows may successfully be re-routed to use alternate routes after a link failure and protocols (e.g., TCP) have in-built retransmission mechanisms. Second, for long-lived failures, traffic variations (e.g., traffic bursts, diurnal workloads) mean that the link may not have carried the same amount of data even if it was active. Therefore, we propose a simple metric to approximate the magnitude of traffic lost due to failures, based on the available data sources.

To estimate the impact of link failures on network traffic (both in terms of bytes and packets), we first compute the median number of packets (or bytes) on the link in the hours preceding the failure event, med_b, and the median packets (or bytes) during the failure, med_d. We then compute the amount of data (in terms of packets or bytes) that was potentially lost during the failure event as:

loss = (med_b - med_d) x duration

where duration denotes how long the failure lasted. We use median traffic instead of average to avoid outlier effects.
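A sketch of this estimate, assuming the five-minute packet (or byte) counts of the failed link are available as a pandas Series indexed by timestamp; the two-hour lookback window and the expression of duration in five-minute polling intervals are illustrative assumptions:

```python
import pandas as pd

def estimated_loss(link_counts: pd.Series, fail_start, fail_end,
                   lookback=pd.Timedelta(hours=2)) -> float:
    med_b = link_counts.loc[fail_start - lookback:fail_start].median()  # before failure
    med_d = link_counts.loc[fail_start:fail_end].median()               # during failure
    duration = (fail_end - fail_start) / pd.Timedelta(minutes=5)        # polling intervals
    # loss = (med_b - med_d) x duration; medians rather than means avoid outlier effects.
    return (med_b - med_d) * duration
```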


As described in Section 2, the network traffic in a typical data center may be classified into short-lived, latency-sensitive "mice" flows and long-lived, throughput-sensitive "elephant" flows. Packet loss is much more likely to adversely affect "mice" flows, where the loss of an ACK may cause TCP to perform a timed-out retransmission. In contrast, loss in traffic throughput is more critical for "elephant" flows.

Link failures incur loss of many packets, but relatively few bytes. For link failures, few bytes are estimated to be lost relative to the number of packets. We observe that the estimated median number of packets lost during failures is 59K (Figure 13) but the estimated median number of bytes lost is only 25MB (Figure 14). Thus, the average size of lost packets is 423 bytes. A prior measurement study of data center network traffic observed that packet sizes tend to be bimodal with modes around 200B and 1,400B [5]. This suggests that packets lost during failures are mostly part of the lower mode, consisting of keep alive packets used by applications (e.g., MYSQL, HTTP) or ACKs [5].

5.1 Is redundancy effective in reducing impact?
In a well-designed network, we expect most failures to be masked by redundant groups of devices and links. We evaluate this expectation by considering median traffic during a link failure (in packets or bytes) normalized by median traffic before the failure: med_d/med_b; for brevity, we refer to this quantity as "normalized traffic". The effectiveness of redundancy is estimated by computing this ratio on a per-link basis, as well as across all links in the redundancy group where the failure occurred. An example of a redundancy group is shown in Figure 15. If a failure has been masked completely, this ratio will be close to one across a redundancy group, i.e., traffic during failure was equal to traffic before the failure.
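The same ratio can be sketched per link and per redundancy group; the `traffic` mapping from link ids to Series of five-minute byte counts, and the lookback window, are assumptions for illustration:

```python
import pandas as pd

def normalized_traffic(traffic, links, fail_start, fail_end,
                       lookback=pd.Timedelta(hours=2)) -> float:
    before = sum(traffic[l].loc[fail_start - lookback:fail_start].median() for l in links)
    during = sum(traffic[l].loc[fail_start:fail_end].median() for l in links)
    # A ratio near 1.0 means the failure was masked (traffic essentially unchanged).
    return during / before if before else float("nan")

# Per-link impact:  normalized_traffic(traffic, [failed_link], start, end)
# Group impact:     normalized_traffic(traffic, redundancy_group_links, start, end)
```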

Network redundancy helps, but it is not entirely effective. Figure 16 shows the distribution of normalized byte volumes for individual links and redundancy groups. Redundancy groups are effective at moving the ratio of traffic carried during failures closer to one, with 25% of events experiencing no impact on network traffic at the redundancy group level. Also, the median traffic carried at the redundancy group level is 93% as compared with 65% per link. This is an improvement of 43% in median traffic as a result of network redundancy. We make a similar observation when considering packet volumes (not shown).

There are several reasons why redundancy may not be 100% effective in eliminating the impact of failures on network traffic. First, bugs in fail-over mechanisms can arise if there is uncertainty as to which link or component is the back up (e.g., traffic may be regularly traversing the back up link [7]). Second, if the redundant components are not configured correctly, they will not be able to re-route traffic away from the failed component. For example, we observed the same configuration error made on both the primary and back up of a network connection because of a typo in the configuration script. Further, protocol issues such as TCP backoff, timeouts, and spanning tree reconfigurations may result in loss of traffic.

5.2 Redundancy at different layers of the network topology
This section analyzes the effectiveness of network redundancy across different layers in the network topology. We logically divide links based on their location in the topology. Location is determined based on the types of devices connected by the link (e.g., a CoreCore link connects two core routers). Figure 17 plots quartiles of normalized traffic (in bytes) for links at different layers of the network topology.

Figure 15: An example redundancy group between a primary (P) and backup (B) aggregation switch (AggS) and access router (AccR).

Figure 16: Normalized traffic (bytes) during failure events per link as well as within redundancy groups.

Links highest in the topology benefit most from redundancy. A reliable network core is critical to traffic flow in data centers. We observe that redundancy is effective at ensuring that failures between core devices have a minimal impact. In the core of the network, the median traffic carried during failure drops to 27% per link but remains at 100% when considered across a redundancy group. Links between aggregation switches and access routers (AggAccR) experience the next highest benefit from redundancy, where the median traffic carried during failure drops to 42% per link but remains at 86% across redundancy groups.

Links from ToRs to aggregation switches benefit the least from redundancy, but have low failure impact. Links near the edge of the data center topology benefit the least from redundancy, where the median traffic carried during failure increases from 68% on links to 94% within redundancy groups for links connecting ToRs to AggS. However, we observe that on a per link basis, these links do not experience significant impact from failures, so there is less room for redundancy to benefit them.

6. DISCUSSION
In this section, we discuss implications of our study for the design of data center networks and future directions on characterizing data center reliability.

Low-end switches exhibit high reliability. Low-cost, commodity switches in our data centers experience the lowest failure rate, with a failure probability of less than 5% annually for all types of ToR switches and AggS-2. However, due to their much larger population, the ToRs still rank third in terms of number of failures and dominate in terms of total downtime. Since ToR failures are considered the norm rather than the exception (and are typically masked by redundancy in application, data, and network layers), ToRs have a low priority for repair relative to other outage types. This suggests that proposals to leverage commodity switches to build flat data center networks [3, 12, 21] will be able to provide good reliability. However, as populations of these devices rise, the absolute number of failures observed will inevitably increase.

Figure 17: Normalized bytes (quartiles) during failure events per link and across redundancy group compared across different layers in the data center topology.

Improve reliability of middleboxes. Our analysis of network failure events highlights the role that middle boxes such as load balancers play in the overall reliability of the network. While there have been many studies on improving performance and scalability of large-scale networks [2, 3, 12-14, 21], only a few studies focus on management of middle boxes in data center networks [15]. Middle boxes such as load balancers are a critical part of data center networks that need to be taken into account when developing new routing frameworks. Further, the development of better management and debugging tools would help alleviate software and configuration faults frequently experienced by load balancers. Finally, software load balancers running on commodity servers can be explored to provide cost-effective, reliable alternatives to expensive and proprietary hardware solutions.

Improve the effectiveness of network redundancy. We observe that network redundancies in our system are 40% effective at masking the impact of network failures. One cause is configuration issues that render redundancy ineffective at masking failure. For instance, we observed an instance where the same typo was made when configuring interfaces on both the primary and back up of a load balancer connection to an aggregation switch. As a result, the back up link was subject to the same flaw as the primary. This type of error occurs when operators configure large numbers of devices, and highlights the importance of automated configuration and validation tools (e.g., [8]).

Separate control plane from data plane. Our analysis of NOC tickets reveals that in several cases, the loss of keep alive messages resulted in disconnection of portchannels, which are virtual links that bundle multiple physical interfaces to increase aggregate link speed. For some of these cases, we manually correlated loss of control packets with application-level logs that showed significant traffic bursts in the hosted application on the egress path. This interference between application and control traffic is undesirable. Software Defined Networking (SDN) proposals such as OpenFlow [20] present a solution to this problem by maintaining state in a logically centralized controller, thus eliminating keep alive messages in the data plane. In the context of proposals that leverage location independent addressing (e.g., [12, 21]), this separation between control plane (e.g., ARP and DHCP requests, directory service lookups [12]) and data plane becomes even more crucial to avoid impact to hosted applications.

7. RELATED WORK

Previous studies of network failures have considered application-level [16, 22] or network connectivity [18, 19, 25, 26, 28] failures. There have also been several studies on understanding hardware reliability in the context of cloud computing [11, 23, 24, 27].

Application failures. Padmanabhan et al. consider failures from the perspective of Web clients [22]. They observe that the majority of failures occur during the TCP handshake as a result of end-to-end connectivity issues. They also find that Web access failures are dominated by server-side issues. These findings highlight the importance of studying failures in data centers hosting Web services.

NetMedic aims to diagnose application failures in enterprise networks [16]. By taking into account the state of components that fail together (as opposed to grouping all components that fail together), it is able to limit the number of incorrect correlations between failures and components.

Network failures. There have been many studies of network failures in wide area and enterprise networks [18, 19, 25, 26, 28], but none consider network element failures in large-scale data centers.

Shaikh et al. study properties of OSPF Link State Advertisement (LSA) traffic in a data center connected to a corporate network via leased lines [25]. Watson et al. also study the stability of OSPF by analyzing LSA messages in a regional ISP network [28]. Both studies observe significant instability and flapping as a result of external routing protocols (e.g., BGP). Unlike these studies, we do not observe link flapping, owing to our data sources being geared towards actionable events.

Markopoulou et al. use IS-IS listeners to characterize failures in an ISP backbone [19]. The authors classify failures as either router-related or optical-related by correlating failure times and impacted network components. They find that 70% of their failures involve only a single link. We similarly observe that the majority of failures in our data centers are isolated.

More recently, Turner et al. consider failures in an academic WAN using syslog messages generated by IS-IS [26]. Unlike previous studies [19, 25, 28], the authors leverage existing syslog, e-mail notifications, and router configuration data to study network failures. Consistent with prior studies that focus on OSPF [25, 28], the authors observe link flapping. They also observe longer times to repair on wide area links, similar to our observations for the wide area links connecting data centers.

Failures in cloud computing. The interest in cloud computing has increased focus on understanding component failures, as even a small failure rate can manifest itself in a high number of failures in large-scale systems. Previous work has looked at failures of DRAM [24], storage [11, 23] and server nodes [27], but there has not been a large-scale study of network component failures in data centers. Ford et al. consider the availability of distributed storage and observe that the majority of failures involving more than ten storage nodes are localized within a single rack [11]. We also observe spatial correlations, but they occur higher in the network topology, where we see multiple ToRs associated with the same aggregation switch having correlated failures.

Complementary to our work, Benson et al. mine threads from customer service forums of an IaaS cloud provider [6]. They report


on problems users face when using IaaS and observe that problems related to managing virtual resources and debugging the performance of computing instances, which require the involvement of cloud administrators, increase over time.

8. CONCLUSIONS AND FUTURE WORK

In this paper, we have presented the first large-scale analysis of network failure events in data centers. We focused our analysis on characterizing failures of network links and devices, estimating their failure impact, and analyzing the effectiveness of network redundancy in masking failures. To undertake this analysis, we developed a methodology that correlates network traffic logs with logs of actionable events, to filter out the large volume of non-impacting failures due to spurious notifications and errors in logging software.
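As a rough illustration of that filtering step, the following Python sketch keeps a failure notification only if traffic on the affected link visibly drops during the event; the threshold, window choice and field layout here are illustrative assumptions, not the exact criteria used in the study.

    from statistics import median

    def is_impactful(bytes_before, bytes_during, ratio=0.9):
        """Keep a logged failure only if traffic on the link visibly drops.

        bytes_before / bytes_during: per-interval byte counts on the link in
        the windows preceding and overlapping the failure event.
        The 0.9 ratio is illustrative, not the threshold used in the study.
        """
        if not bytes_before or not bytes_during:
            return False
        return median(bytes_during) < ratio * median(bytes_before)

    # Example: a spurious notification with unchanged traffic is filtered out.
    print(is_impactful([100, 120, 110], [105, 115, 108]))  # False
    print(is_impactful([100, 120, 110], [10, 0, 5]))       # True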

Our study is part of a larger project, NetWiser, on understanding reliability in data centers to aid research efforts on improving network availability and designing new network architectures. Based on our study, we find that commodity switches exhibit high reliability, which supports current proposals to design flat networks using commodity components [3, 12, 17, 21]. We also highlight the importance of better managing middle boxes such as load balancers, as they exhibit high failure rates. Finally, more investigation is needed to analyze and improve the effectiveness of redundancy at both the network and application layers.

Future work. In this study, we consider the occurrence of interface-level failures. This is only one aspect of reliability in data center networks. An important direction for future work is correlating logs from application-level monitors with the logs collected by network operators to determine what fraction of observed errors do not impact applications (false positives) and what fraction of application errors are not observed (e.g., because of a server or storage failure that we cannot observe). This would enable us to understand what fraction of application failures can be attributed to network failures. Another extension to our study would be to understand what these low-level failures mean in terms of convergence for network protocols such as OSPF, and to analyze the impact on end-to-end network connectivity by incorporating logging data from external sources, e.g., BGP neighbors.

Acknowledgements

We thank our shepherd, Arvind Krishnamurthy, and the anonymous reviewers for their feedback. We are grateful to David St. Pierre for helping us understand the network logging systems and data sets.

9. REFERENCES

[1] Cisco: Data center: Load balancing data center services, 2004. www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns668/net_implementation_white_paper0900aecd8053495a.html.
[2] H. Abu-Libdeh, P. Costa, A. I. T. Rowstron, G. O'Shea, and A. Donnelly. Symbiotic routing in future data centers. In SIGCOMM, 2010.
[3] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, 2008.
[4] M. Alizadeh, A. Greenberg, D. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data Center TCP (DCTCP). In SIGCOMM, 2010.
[5] T. Benson, A. Akella, and D. Maltz. Network traffic characteristics of data centers in the wild. In IMC, 2010.
[6] T. Benson, S. Sahu, A. Akella, and A. Shaikh. A first look at problems in the cloud. In HotCloud, 2010.
[7] J. Brodkin. Amazon EC2 outage calls 'availability zones' into question, 2011. http://www.networkworld.com/news/2011/042111-amazon-ec2-zones.html.
[8] X. Chen, Y. Mao, Z. M. Mao, and K. van de Merwe. Declarative configuration management for complex and dynamic networks. In CoNEXT, 2010.
[9] Cisco. UniDirectional Link Detection (UDLD). http://www.cisco.com/en/US/tech/tk866/tsd_technology_support_sub-protocol_home.html.
[10] Cisco. Spanning tree protocol root guard enhancement, 2011. http://www.cisco.com/en/US/tech/tk389/tk621/technologies_tech_note09186a00800ae96b.shtml.
[11] D. Ford, F. Labelle, F. Popovici, M. Stokely, V.-A. Truong, L. Barroso, C. Grimes, and S. Quinlan. Availability in globally distributed storage systems. In OSDI, 2010.
[12] A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. In SIGCOMM, 2009.
[13] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu. DCell: A scalable and fault-tolerant network structure for data centers. In SIGCOMM, 2008.
[14] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu. BCube: A high performance, server-centric network architecture for modular data centers. In SIGCOMM, 2009.
[15] D. Joseph, A. Tavakoli, and I. Stoica. A policy-aware switching layer for data centers. In SIGCOMM, 2008.
[16] S. Kandula, R. Mahajan, P. Verkaik, S. Agarwal, J. Padhye, and P. Bahl. Detailed diagnosis in enterprise networks. In SIGCOMM, 2010.
[17] C. Kim, M. Caesar, and J. Rexford. Floodless in SEATTLE: A scalable ethernet architecture for large enterprises. In SIGCOMM, 2008.
[18] C. Labovitz and A. Ahuja. Experimental study of internet stability and wide-area backbone failures. In The Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing, 1999.
[19] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, Y. Ganjali, and C. Diot. Characterization of failures in an operational IP backbone network. IEEE/ACM Transactions on Networking, 2008.
[20] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. In SIGCOMM CCR, 2008.
[21] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. PortLand: A scalable fault-tolerant layer 2 data center network fabric. In SIGCOMM, 2009.
[22] V. Padmanabhan, S. Ramabhadran, S. Agarwal, and J. Padhye. A study of end-to-end web access failures. In CoNEXT, 2006.
[23] B. Schroeder and G. Gibson. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? In FAST, 2007.
[24] B. Schroeder, E. Pinheiro, and W.-D. Weber. DRAM errors in the wild: A large-scale field study. In SIGMETRICS, 2009.
[25] A. Shaikh, C. Isett, A. Greenberg, M. Roughan, and J. Gottlieb. A case study of OSPF behavior in a large enterprise network. In ACM IMW, 2002.
[26] D. Turner, K. Levchenko, A. C. Snoeren, and S. Savage. California fault lines: Understanding the causes and impact of network failures. In SIGCOMM, 2010.
[27] K. V. Vishwanath and N. Nagappan. Characterizing cloud computing hardware reliability. In Symposium on Cloud Computing (SOCC), 2010.
[28] D. Watson, F. Jahanian, and C. Labovitz. Experiences with monitoring OSPF on a regional service provider network. In ICDCS, 2003.


Dude, where’s that IP? Circumventing measurement-based IP geolocation

Phillipa Gill, Yashar Ganjali
Dept. of Computer Science, University of Toronto

Bernard Wong
Dept. of Computer Science, Cornell University

David Lie
Dept. of Electrical and Computer Engineering, University of Toronto

Abstract

Many applications of IP geolocation can benefit from geolocation that is robust to adversarial clients. These include applications that limit access to online content to a specific geographic region, and cloud computing, where some organizations must ensure their virtual machines stay in an appropriate geographic region. This paper studies the applicability of current IP geolocation techniques against an adversary who tries to subvert the techniques into returning a forged result. We propose and evaluate attacks on both delay-based IP geolocation techniques and more advanced topology-aware techniques. Against delay-based techniques, we find that the adversary has a clear trade-off between the accuracy and the detectability of an attack. In contrast, we observe that more sophisticated topology-aware techniques actually fare worse against an adversary, because they give the adversary more inputs to manipulate through their use of topology and delay information.

1 Introduction

Many applications benefit from using IP geolocation to determine the geographic location of hosts on the Internet. For example, online advertisers and search engines tailor their content based on the client's location. Currently, geolocation databases such as Quova [22] and MaxMind [16] are the most popular method used by applications that need geolocation services.

Geolocation is also used in many security-sensitive applications. Online content providers such as Hulu [13], BBC iPlayer [22], RealMedia [22] and Pandora [20] limit their content distribution to specific geographic regions. Before allowing a client to view the content, they determine the client's location from its IP address and allow access only if the client is in a permitted jurisdiction. In addition, Internet gambling websites must restrict access to their applications based on the client's location

or risk legal repercussions [29]. Accordingly, these businesses rely on geolocation to limit access to their online services.

Looking forward, the growth of infrastructure-as-a-service clouds, such as Amazon's EC2 service [1], may also drive organizations using cloud computing to employ geolocation. Users of cloud computing deploy VMs on a cloud provider's infrastructure without having to maintain the hardware their VM is running on. However, differences in laws governing issues such as privacy, information discovery, compliance and audit require that some cloud users restrict VM locations to certain jurisdictions or countries [6]. These location restrictions may be specified as part of a service level agreement (SLA) between the cloud user and provider. Cloud users can use IP geolocation to independently verify that the location restrictions in their cloud SLAs are met.

In these cases, the target of geolocation has an incentive to mislead the geolocation system about its true location. Clients commonly use proxies to mislead content providers so they can view content that is unauthorized in their geographic region. In response, some content providers [13] have identified and blocked access from known proxies, but this does not prevent all clients from circumventing geographic controls. Similarly, cloud providers may attempt to break location restrictions in their SLAs to move customer VMs to cheaper locations. Governments that enforce location requirements on the cloud user may require the geolocation checks to be robust no matter what a cloud provider may do to mislead them. Even if the cloud provider itself is not malicious, its employees may try to relocate VMs to locations where they can be attacked by other malicious VMs [24]. Thus, while cloud users might trust the cloud service provider, they may still be required to have independent verification of the location of their VMs to meet audit requirements or to avoid legal liability.


IP geolocation has been an active field of research for almost a decade. However, all current geolocation techniques assume a benign target that is not trying to intentionally mislead the user, and there has been limited work on geolocating malicious targets. Castelluccia et al. apply Constraint-Based Geolocation (CBG) [12] to the problem of geolocating fast-flux hidden servers that use a layer of proxies in a botnet [5] to conceal their location. Muir and Oorschot [18] describe limitations of passive geolocation techniques (e.g., whois services) and present a technique for finding the IP address of a machine using the Tor anonymization network [28]. These previous works focus on de-anonymization of hosts behind proxies, while our contribution in this paper is to answer fundamental questions about whether current geolocation algorithms are suitable for security-sensitive applications:

• Are current geolocation algorithms accurate enough to locate an IP within a certain country or jurisdiction? We answer this question by surveying previously published studies of geolocation algorithms. We find that current algorithms have accuracies of 35-194 km, making them suitable for geolocation within a country.

• How can adversaries attack a geolocation system? We propose attacks on two broad classes of measurement-based geolocation algorithms – those relying on network delay measurements and those using network topology information. To evaluate the practicality of these attacks, we categorize adversaries into two classes – a simple adversary that can manipulate network delays and a sophisticated one with control over a set of routable IP addresses.

• How effective are such attacks? Can they be detected? We evaluate our attacks by analyzing them against models of geolocation algorithms. We also perform an empirical evaluation using measurements taken from PlanetLab [21] and executing attacks on implementations of delay-based and topology-aware geolocation algorithms. We observe that the simple adversary has limited accuracy and must trade off accuracy for detectability of the attack. On the other hand, the sophisticated adversary has higher accuracy and remains difficult to detect.

The rest of this paper is structured as follows. Section 2 summarizes relevant background and previous work on geolocation techniques. The security model and assumptions we use to evaluate current geolocation proposals are described in Section 3. We develop and analyze attacks on delay-based and topology-aware geolocation methods in Sections 4 and 5, respectively. Section 6 presents related work that evaluates geolocation

when confronted by a target that leverages proxies. We present conclusions in Section 7.

2 Geolocation Background

IP geolocation aims to solve the problem of determining the geographic location of a given IP address. The solution can be expressed to varying degrees of granularity; for most applications the result should be precise enough to determine the city in which the IP is located, either returning a city name or the longitude and latitude where the target is located. The two main approaches to geolocation use either active network measurements to determine the location of the host or databases of IP-to-location mappings.

Measurement-based geolocation algorithms [9, 12, 14, 19, 30, 31] leverage a set of geographically distributed landmark hosts with known locations to locate the target IP. These landmarks measure various network properties, such as delay, and the paths taken by traffic between themselves and the target. These results are used as input to the geolocation algorithm, which determines the target's location using methods such as constraining the region where the target may be located (geolocalization) [12, 30], iterative force-directed algorithms [31], machine learning [9] and constrained optimization [14].

Geolocation algorithms mainly rely on ping [7] and traceroute [7] measurements. Ping measures the round-trip time (RTT) delay between two machines on the Internet, while traceroute discovers and measures the RTT to routers along the path to a given destination. We classify measurement-based geolocation algorithms by the type of measurements they use to determine the target's location. We refer to algorithms that use end-to-end RTTs as delay-based [9, 12, 31] and those that use both RTT and topology information as topology-aware algorithms [14, 30].

An alternative to measurement-based geolocation is geolocation using databases of IP-to-location mappings. These databases can be either proprietary or public. Public databases include those administered by regional Internet registries (e.g., ARIN [3], RIPE [23]). Proprietary databases of IP-to-geographic-location mappings are provided by companies such as Quova [22] and MaxMind [16]. While the exact method of constructing these databases is not public, they are sometimes based on a combination of whois services, DNS LOC records and autonomous system (AS) numbers [2]. Registries and databases tend to be coarse grained, usually returning the headquarters location of the organization that registered the IP address. This becomes a problem when organizations distribute their IP addresses over a wide geographic region, such as large ISPs or content providers.


Table 1: Average accuracy of measurement-based geolocation algorithms.

Class            Algorithm            Average accuracy (km)
Delay-based      GeoPing [19]         150 km (25th percentile); 109 km (median) [30]
                 CBG [12]             78-182
                 Statistical [31]     92
                 Learning-based [9]   407-449 (113 km less than CBG [12] on their data)
Topology-aware   TBG [14]             194
                 Octant [30]          35-40 (median)
Other            GeoTrack [19]        156 km (median) [30]

Misleading database geolocation is also straightforward through the use of proxies.

DNS LOC [8] is an open standard that allows DNS administrators to augment DNS servers with location information, effectively creating a publicly available database of IP location information. However, it has not gained widespread usage. In addition, since the contents of the DNS LOC database are not authenticated and are set by the owners of the IP addresses themselves, it is poorly suited for security-sensitive applications.

Much research has gone into improving the accuracy of measurement-based geolocation algorithms; consequently, they provide fairly reliable results. Table 1 shows the reported average accuracies of recently proposed geolocation algorithms. Based on the reported accuracies, we believe that current geolocation algorithms are sufficiently accurate to place a machine within a country or jurisdiction. In particular, CBG [12] and Octant [30] appear to offer accuracies well within the size of most countries and may even be able to place users within a metropolitan area. Measurement-based geolocation is particularly appealing for secure geolocation because if a measurement can reach the target (e.g., using application-layer measurements [17]), even if it is behind a proxy (e.g., a SOCKS or HTTP proxy), the effectiveness of proxying will be diminished.

3 Security Model

We model secure geolocation as a three-party problem. First, there is the geolocation user or victim. The user hopes to accurately determine the location of the target using a geolocation algorithm that relies on measurements of network properties. We assume that (1) the user has access to a number of landmark machines distributed around the globe to make measurements of RTTs and network paths, and (2) the user trusts the results of measurements reported by landmarks. Second, there is the adversary, who owns the target's IP address. The adversary would like to mislead the user into believing that the target is at a forged location of the adversary's choosing, when in reality the target is actually located at the

true location. The adversary is responsible for physically connecting the target IP address to the Internet, which allows them to insert additional machines or routers between the target and the Internet. The third party is the Internet itself. While the Internet is impartial to both adversary and user, it introduces additive noise as a result of queuing delays and circuitous routes. These properties introduce some inherent inaccuracy and unpredictability into the results of the measurements on which geolocation algorithms rely. In general, an adversary's malicious tampering with network properties (such as adding delay), if done in small amounts, is difficult to distinguish from additive noise introduced by the Internet.

This work addresses two types of adversaries with differing capabilities. We assume in both cases that the adversary is fully aware of the geolocation algorithm and knows both the IP addresses and locations of all landmarks used in the algorithm. The first, a simple adversary, can tamper only with the RTT measurements taken by the landmarks. This can be done by selectively delaying packets from landmarks to make the RTT appear larger than it actually is. The simple adversary was chosen to resemble a home user running a program to selectively delay responses to measurements. The second, a sophisticated adversary, controls several IP addresses and can use them to create fake routers and paths to the target. Further, this adversary may have a wide area network (WAN) with several gateway routers and can influence BGP routes to the target. The sophisticated adversary was chosen to model a cloud provider as the adversary. Many large online service providers already deploy WANs [11], making this attack model feasible at low additional cost to the provider.

We make two assumptions in this work. First, while aware of the geolocation algorithm being used, and the locations and IP addresses of all landmarks, the adversary cannot compromise the landmarks or run code on them. Thus, the only way the adversary can compromise the integrity of network measurements is to modify the properties of traffic traveling on network links directly connected to a machine under its control.


The second assumption is that network measurements made by landmarks actually reach the target. Otherwise, an adversary could trivially attack the geolocation system by placing a proxy at the forged location that responds to all geolocation traffic and forwards all other traffic to the true location. To avoid this attack, the user can either combine the measurements with regular traffic or protect them using cryptography. For example, if the geolocation user is a Web content provider, Muir and Oorschot [18] have shown that even an anonymization network such as Tor [28] may be defeated using a Java applet embedded in a Web page. Users who want to geolocate a VM in a compute cloud may require the cloud provider to support tamper-proof VMs [10, 25] and embed a secret key in the VM for authenticating end-to-end network measurements. In this case, the adversary would need to place a copy of the VM in the forged location to respond to measurements. Given that the adversary is trying to avoid placing a VM in the forged location, this is not a practical attack for a malicious cloud provider.

4 Delay-based geolocation

Delay-based geolocation algorithms use measurements of end-to-end network delays to geolocate the target IP. To execute delay-based geolocation, the landmarks need to calibrate the relationship between geographic distance and network delay. This is done by having each landmark, L_i, ping all other landmarks. Since the landmarks have known geographic locations, L_i can then derive a function mapping geographic distance, g_ij, to network delay, d_ij, observed to each other landmark L_j, where i ≠ j [12]. Each landmark performs this calibration and develops its own mapping of geographic distance to network delay. After calibrating its distance-to-delay function, it then pings the target IP. Using the distance-to-delay function, the landmark can then transform the observed delay to the target into a predicted distance to the target. All landmarks perform this computation to triangulate the location of the target.

Delay-based geolocation operates under the implicit assumption that network delay is well correlated with geographic distance. However, network delay is composed of queuing, processing, transmission and propagation delays [15]. Only the propagation time of network traffic is related to the distance traveled; the other components vary depending on network load, thus adding noise to the measured delay. The assumption is also violated when network traffic does not take a direct ("as the crow flies") path between hosts. These indirect paths are referred to as "circuitous" routes [30].

There are many proposed methods for delay-based geolocation, including GeoPing [19], Statistical Geolocation [31], Learning-based Geolocation [9] and CBG [12].

These algorithms differ in how they express the distance-to-delay function and how they triangulate the position of the target. GeoPing is based on the observation that hosts that are geographically close to each other will have delay properties similar to the landmark nodes [19]. Statistical Geolocation develops a joint probability density function of distance to delay that is input into a force-directed algorithm used to geolocate the target [31]. In contrast, Learning-based Geolocation utilizes a Naïve Bayes framework to geolocate a target IP given a set of measurements [9]. CBG has the highest reported accuracy of the delay-based algorithms, with a mean error of 78-182 km [12]. The remainder of this section therefore focuses on CBG to model and evaluate how an adversary can influence delay-based geolocation techniques.

CBG [12] establishes the distance-to-delay function, described above, by having the landmarks ping each other to derive a set of points (g_ij, d_ij) mapping geographic distance to network delay. To mitigate the effects of congestion on network delays, multiple measurements are made, and the 2.5th percentile of network delays is used by the landmarks to calibrate their distance-to-delay mapping. Each landmark then computes a linear ("best line") function that is closest to, but below, the set of points. The distance between each landmark and the target IP is inferred using the "best line" function. This gives an implied circle around each landmark where the target IP may be located. The target IP is then predicted to be in the region of intersection of the circles of all the landmarks. Since the result of this process is a feasible region where the target may be located, CBG determines the centroid of the region and returns this value as the geolocation result. Gueye et al. observe a mean error of 182 km in the US and 78 km in Europe. They also find that the feasible region where the target IP may be located ranges from 10⁴ km² in Europe to 10⁵ km² in North America.
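To make the procedure concrete, here is a minimal, self-contained Python sketch of a CBG-style geolocation step. It is an illustration rather than the authors' implementation: it assumes a single "best line" through the origin per landmark, great-circle distances, and a coarse grid search for the feasible region; names such as km_per_ms and cbg_feasible_region are hypothetical.

    import math

    EARTH_RADIUS_KM = 6371.0

    def great_circle_km(a, b):
        """Great-circle distance in km between two (lat, lon) points in degrees."""
        (lat1, lon1), (lat2, lon2) = a, b
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
        return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

    def km_per_ms(samples):
        """'Best line' through the origin: the largest km-per-ms ratio among
        inter-landmark (distance_km, delay_ms) samples, so that a measured delay
        yields an upper bound on geographic distance."""
        return max(g / d for g, d in samples if d > 0)

    def cbg_feasible_region(landmarks, target_delays, calib, grid_step=1.0):
        """Grid-sample the intersection of the per-landmark circles.

        landmarks:     {name: (lat, lon)}
        target_delays: {name: RTT in ms from landmark to target}
        calib:         {name: list of (distance_km, delay_ms) inter-landmark samples}
        Returns (centroid, area_km2), or (None, 0.0) if the circles do not intersect.
        """
        radii = {n: target_delays[n] * km_per_ms(calib[n]) for n in landmarks}
        cells = []
        lat = -60.0
        while lat <= 75.0:
            lon = -180.0
            while lon < 180.0:
                p = (lat, lon)
                if all(great_circle_km(p, landmarks[n]) <= radii[n] for n in landmarks):
                    cells.append(p)
                lon += grid_step
            lat += grid_step
        if not cells:
            return None, 0.0
        # Approximate area of a grid cell; east-west extent shrinks with latitude.
        area = sum((111.0 * grid_step) ** 2 * math.cos(math.radians(la)) for la, _ in cells)
        centroid = (sum(la for la, _ in cells) / len(cells),
                    sum(lo for _, lo in cells) / len(cells))
        return centroid, area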

4.1 Attack on delay-based geolocation

Since delay-based geolocation techniques do not take network topology into account, the ability of a sophisticated adversary to manipulate network paths is of no additional value. Against a delay-based geolocation algorithm, the simple and sophisticated adversaries have equal power.

To mislead delay-based geolocation, the adversary can manipulate the distance to the target computed by the landmarks by altering the delay observed by each landmark. The adversary knows the identities and locations of each landmark and can thus identify traffic from the landmarks and alter the delay as necessary.


Figure 1: Landmarks (PlanetLab nodes) used in the evaluation.

Figure 2: Forged locations (τ) used in the evaluation.

To make the target at the true location, t, appear to be at the forged location, τ, the adversary must alter the perceived delay, d_it, between each landmark, L_i, and t to become the delay, d_iτ, that each landmark should perceive between L_i and τ. To do this, two problems must be solved. The adversary must first find the appropriate delay, d_iτ, for each landmark and then change the perceived delay to this appropriate delay.

If the adversary controls a machine at or near τ, she may directly acquire the appropriate d_iτ for each landmark by pinging each of the landmarks from the forged location τ. However, pings to all the landmarks from a machine not related to the geolocation algorithm may arouse suspicion. Also, it may not be the case that the adversary controls a machine at or near τ.

Alternatively, with knowledge of the locations of the landmarks, the adversary can compute the geographic distances g_it and g_iτ between each landmark L_i and the true location t as well as the forged location τ. This enables the adversary to determine the additional distance a probe from L_i would travel (γ_i = g_iτ − g_it) had it actually been directed to the forged location τ. The next challenge is to map γ_i into the appropriate amount of delay to add. To do this, the adversary may use 2/3 the speed of light in a vacuum (c) as a lower-bound approximation for the speed of traffic on the Internet [14]. Thus, the required delay to add to each ping from L_i is:

δ_i = (2 × γ_i) / ((2/3) × c)    (1)

The additional distance the ping from L_i would travel is multiplied by 2 because the delay measured by ping is the round-trip time as opposed to the end-to-end delay. This approximation is the lower bound on the delay that would be required for the ping to traverse the distance 2 × γ_i, because speed-of-light propagation is the fastest data can travel between the two points.

Armed with this approximation of the appropriate d_iτ for each landmark, the adversary can now increase the delay of each probe from the landmarks. The perceived delay cannot be decreased, since this would require the

Figure 3: CDF of the distance the adversary tries to move the target.

adversary to either increase the speed of the network path between t and L_i, or slow down probes from L_i during its calibration phase. Since the adversary cannot compromise the landmarks and does not control network paths that are not directly connected to one of her machines, she is not able to accomplish this. As a result, the adversary may only modify landmark delays that need to be increased (i.e., d_iτ > d_it). For all other landmarks, she does not alter the delays. Thus, even with perfect knowledge of the delays d_iτ, neither a simple nor a sophisticated adversary will be able to execute an attack perfectly on delay-based geolocation techniques.
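As a sketch of the attack computation (not the authors' code), the adversary's per-landmark delay adjustment under the speed-of-light approximation can be written as follows, reusing a great-circle distance helper like the one above; added_delays is a hypothetical name.

    # Speed of light in fiber, approximated as 2/3 of c, expressed in km per ms.
    C_FIBER_KM_PER_MS = (2.0 / 3.0) * 299792.458 / 1000.0   # about 200 km/ms

    def added_delays(landmarks, true_loc, forged_loc, distance_km):
        """Extra RTT (in ms) to inject per landmark for the delay-adding attack.

        landmarks:   {name: (lat, lon)} known landmark locations
        true_loc:    the target's real coordinates
        forged_loc:  where the adversary wants to appear to be
        distance_km: a great-circle distance function, e.g. great_circle_km above
        """
        delays = {}
        for name, loc in landmarks.items():
            gamma = distance_km(loc, forged_loc) - distance_km(loc, true_loc)
            delta = 2.0 * gamma / C_FIBER_KM_PER_MS   # equation (1), round trip
            delays[name] = max(delta, 0.0)            # delay can only be added
        return delays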

4.2 Evaluation

We evaluate the effectiveness of our proposed attack against a simulator that runs the CBG algorithm proposed by Gueye et al. [12]. We collected measurement inputs for the algorithm using 50 PlanetLab nodes. Each node takes a turn being the target, with the remaining 49 PlanetLab nodes used as landmarks. Figure 1 shows the locations of the PlanetLab nodes.


Figure 4: CDF of error distance for the adversary when attacking delay-based geolocation using speed of light (SOL) or best line delay.

Each target is initially geolocated using observed network delays. The target is then moved to 50 forged locations using the delay-adding attack, shown in Figure 2. We select 40 of the forged locations based on the locations of US universities and 10 based on the locations of universities outside of North America. This results in a total of 2,500 attempted attacks on the CBG algorithm.

In the delay-adding attack, the adversary cannot move a target that is not within the same region as the landmarks into that region. For example, if the target is located in Europe, moving it to a forged location in North America would require reducing the delay to all landmarks, which is not possible. This implies that if a geolocation provider wants to prevent the adversary from moving the target into a specific region, it should place its landmarks in this desired region.

Figure 3 shows the CDF of the distances the adversary attempts to move the target. In North America, the target is moved less than 4,000 km most of the time and less than 1,379 km 50% of the time. Outside of North America, the distance moved consistently exceeds 5,000 km.

We evaluate the delay-adding attack under two circumstances: (1) when the adversary knows exactly what delay to add (by giving the adversary access to the "best line" function used by the landmarks), and (2) when the adversary uses the speed of light (SOL) approximation for the additional delay.

4.2.1 Attack effectiveness

Since the adversary is only able to increase, and not decrease, perceived delays, there are errors between the forged location, τ, and the actual location, r, returned by the geolocation algorithm. To understand why these errors exist, consider Figure 5.


Figure 5: Attacking delay-based geolocation.

The arcs labeled g1, g2, and g3 are the circles drawn by three landmarks when geolocating the target. The region enclosed by the arcs is the feasible region, and the geolocation result is the centroid of that region. To move t to τ, the adversary should increase the radii of g2 and g3 and decrease the radius of g1. However, as described earlier, delay can only be added, meaning that the adversary can only increase the radii of g2 and g3 to g2' and g3', respectively (shown by the dotted lines). Since the delay of g1 cannot be decreased, this results in a larger feasible region with a centroid r that does not quite reach τ. We call the difference between the geolocation result (r) and the forged location (τ) the error distance (ε) for the adversary. The difference between the intended and actual direction of the move is the angle θ.

We begin by evaluating the error distance, ε. Figure 4 shows the CDF of error for the adversary over the set of attempted attacks in our evaluation. Within North America, an adversary using the speed of light approximation has a median error of 1,143 km. When the adversary has access to the best line function, the error decreases to 671 km. As a reference, 671 km is approximately half the width of Texas. This indicates that when moving within North America, it is possible for an adversary with access to the best line function to be successful in trying to move the target into a specific state. We note that three of the targets used in our evaluation were located in Canada. Using the speed of light approximation, these Canadian targets are able to appear in the US 65% of the time. Using the best line function, they are able to move into the US 89% of the time.

Outside of North America, the delay-adding attack has poor accuracy, with a minimum error for the adversary of 4,947 km. As a reference, the distance from San Francisco to New York City is 4,135 km. Error of this magnitude is not practical for an adversary attempting to place the target in a specific country.


Figure 6: Error observed by the adversary depending on the distance of their attempted move for the delay-adding attack (10th percentile, median, 90th percentile, and the line 0.70*x).

Figure 7: Error observed by the adversary depending on the distance of their attempted move for the delay-adding attack when they have access to the best line function (10th percentile, median, 90th percentile, and the line 0.40*x).

For the remainder of this section, we focus on attacks where the adversary tries to move the target within North America, because the error for the adversary is more reasonable there.

We next consider how the distance the adversary tries to move the target affects the observed error. Figure 6 shows the error for the adversary depending on how far the adversary attempts to move the target when using the speed of light approximation. Figure 7 shows the same data for an adversary with access to the best line function. We note that the error observed by the adversary grows with the magnitude of the attempted move. Specifically, for each 1 km the adversary tries to move, the median error increases by 700 meters when she does not have access to the best line function. With access to the best line function, the median error per km decreases by 43%, to 400 meters. Thus, the attack we propose works best when the distance between t and τ is relatively small, and the error observed by the attacker grows linearly with the size of the move.

Given the relatively high errors observed by the adversary, we next verify whether the adversary moves the target in her chosen direction. Figure 8 shows the CDF of θ, the difference between the direction the adversary tried to move the target and the direction the target was actually moved. While lacking high accuracy when executing the delay-adding attack, the adversary is able to move the target in the general direction of her choosing. The difference in direction is less than 45 degrees 74% of the time and less than 90 degrees 89% of the time. The attack where the adversary has access to the best line function performs better, with a difference in direction of less than 45 degrees 91% of the time.

4.2.2 Attack detectability

We next look at whether a geolocation provider can detect the delay-adding attack and thus determine that the geolocation result has been tampered with.

When CBG geolocates a target, it determines a feasible region where the target can be located [12]. The size of the feasible region can be interpreted as a measure of confidence in the geolocation result. A very large region size indicates that there is a large area where the target may be located, although the algorithm returns the centroid. As we saw in Figure 5, the adversary, able only to add delay, can only increase the radii of the arcs and thus only increase the region size. As a result, the delay-adding attack always increases the feasible region size and reduces confidence in the result of the geolocation algorithm. We consider the region size computed by CBG before and after our proposed attack to determine how effective region size may be for detecting an attack.

Figure 9 shows the region size for CBG when the delay-adding attack is executed in general, when the attack only attempts to move the target less than 1,000 km, and when the adversary has access to the best line function. We observe that the region size becomes orders of magnitude larger when the delay-adding attack is executed. The region size grows even larger when the adversary uses the best line function. An adversary that moves the target less than 1,000 km is able to execute the attack without having much impact on the region size distribution.

The region size grows in proportion to the amount of delay added. This explains why the adversary creates a larger region size when using the best line function, which adds more delay than the speed of light approximation.


Figure 8: CDF of change in direction for the delay-adding attack.

Figure 9: CDF of region size for CBG before and after the delay-adding attack.

Figure 10: Region size depending on how far the adversary attempts to move the target using the best line function.

Figure 11: CDF of region size for CBG before and after delay-adding, limited to points less than 1,000,000 km².

Figure 10 illustrates this case. As the adversary attempts to move the target further from its true location, the amount of delay that must be added increases. This in turn increases the region size returned by CBG. Thus, while there may be methods for adding delay that improve the adversary's accuracy, they will only increase the ability of the geolocation provider to detect the attack.

Given the increased region sizes observed when the delay-adding attack is executed, one defense would be to use a region size threshold to exclude geolocation results with insufficient confidence. Increased region sizes may be caused by an adversary adding delays, as we have observed, or by fluctuations in the stochastic component of network delay. In either case, the geolocation algorithm observes a region that is too large for practical purposes.

Suppose we discard all geolocation results with a region size greater than 1,000,000 km² (this is approximately the size of Texas and California combined). Figure 11 shows the CDF of region size below this threshold. The adversary using the speed-of-light approximation will be undetected only 36% of the time. However, if the adversary attempts to move less than 1,000 km, she will remain undetected 74% of the time. An adversary with access to the best line for each of the landmarks is more easily detectable because of the larger region sizes that result from the larger injected delays. With a threshold of 1,000,000 km², the adversary using the best line function will have her results discarded 83% of the time. Thus, using a threshold on the region size is effective for detecting attacks on delay-based geolocation except when the attacker tries to move the target only a short distance.
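This defense amounts to a simple acceptance rule on the CBG output. A sketch (the threshold is the value from the text; the function name and the centroid/area pair, e.g. from a helper like cbg_feasible_region above, are assumptions):

    MAX_REGION_KM2 = 1_000_000   # roughly Texas plus California, as in the text

    def accept_geolocation(centroid, area_km2, threshold=MAX_REGION_KM2):
        """Discard low-confidence (possibly attacked) results by region size."""
        return centroid if area_km2 <= threshold else None

    # Example: centroid, area = cbg_feasible_region(landmarks, delays, calib)
    #          result = accept_geolocation(centroid, area)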


5 Topology-aware geolocation

Delay-based geolocation relies on correlating measured delays with distances between landmarks. As we saw previously, these correlations or mappings are applied to landmark-to-target delays to create overlapping confidence regions; the overlap is the feasible region, and the estimated location of the target is its centroid. When inter-landmark delays and landmark-to-target delays are not similarly correlated with physical distances (e.g., due to circuitous end-to-end paths), the resulting delay-to-distance relationships to the target can deviate significantly from the pre-computed correlations.

Topology-aware geolocation addresses this problem by limiting the impact of circuitous end-to-end paths; specifically, it localizes all intermediate routers in addition to the target node, which results in a better estimate of delays. Starting from the landmarks, the geolocation algorithm iteratively estimates the location of all intermediate routers on the path between the landmark and the target. This is done solely based on single-hop link delays, which are usually significantly less circuitous than multi-hop end-to-end paths, enabling topology-aware geolocation to be more resilient to circuitous network paths than delay-based geolocation.

There are two previously proposed topology-aware geolocation methods: topology-based geolocation (TBG) [14] and Octant [30]. These methods differ in how they geolocate the intermediate routers. TBG uses delays measured between intermediate routers as inputs to a constrained optimization that solves for the locations of the intermediate routers and the target IP [14]. In contrast, Octant leverages a "geolocalization" framework similar to CBG [12], where the locations of the intermediate routers and the target are constrained to specific regions based on their delays from landmarks and other intermediate routers [30]. These delays are mapped into distances using a convex hull rather than a linear function such as the best line in CBG, to improve the mapping between distance and delay.

Octant leverages several optimizations that improve its performance over other geolocation algorithms. These include taking into account both positive and negative constraints, accounting for fixed delays along network paths, and decreasing the weight of constraints based on latency measurements. Wong et al. find that their scheme outperforms CBG, with median accuracies of 35-40 km [30]. In addition, the feasible regions returned by Octant are much smaller than those returned by CBG. They also observe that their scheme is robust even given a small number of landmarks, with performance leveling off after 15 landmarks.

When analyzing and evaluating attacks on topology-aware geolocation, we consider a generic geolocation framework. Intermediate routers are localized using constraints generated from latencies to adjacent routers. The target is localized to a feasible region generated based on latencies from the last hop(s) before the target, and the centroid of the region is returned.

5.1 Delay-based attacks on topology-aware geolocation

Topology-aware geolocation systems localize all intermediate routers in addition to the target node. We begin by analyzing how a simple adversary, one without the ability to fabricate routers, could attack the geolocation system, and then move on to how a sophisticated adversary could apply additional capabilities to improve the attack. Since the simple adversary has no control over the probes outside her own network, any change she makes can only be reflected on the final links of the path towards the target.

Most networks are connected to the rest of the Internet via a small number of gateway routers. Any path connecting nodes outside the adversary's network to the target (which is inside the network) will go through one of these routers. Here, we start with a simple case where all routes towards the target converge on a single gateway router; we then consider the more general case of multiple gateway routers.

CLAIM 1: If the network paths from the landmarks to the target converge to a single common gateway router, increasing the end-to-end delays between the landmarks and the target can be detected and mitigated by topology-aware geolocation systems.

To verify this claim, we first characterize the effect of delay-based attacks on topology-aware geolocation. Delay-based attacks selectively increase the delay of the probes from landmarks. The probe from landmark L_i is delayed for an additional δ_i seconds. Given that all network paths to the target converge to a single common gateway router h, the end-to-end delay from each landmark, L_i, to the target can be written as:

d_it = d_ih + d_ht + δ_i    (2)

The observed latency from the gateway to the target is d_it − d_ih, which is the sum of the real last-hop latency and the attack delay. However, since the delay-based attack relies on selectively varying the attack delays, δ_i, based on the location of L_i, the observed last-hop latency between the gateway and the target will be inconsistent across measurements initiated from different landmarks.

The high variance in the last-hop link delay can be used to detect delay-based attacks in topology-aware geolocation systems. The attack can be mitigated by taking


the minimum observed delay for each link. The resulting observed link delay from h to the target is:

d̂_ht = d_ht + min_{L_i ∈ L} δ_i    (3)

This significantly reduces the scope of delay-based attacks, requiring attack delays to be uniform across all measurement vantage points when there is only a single common gateway to the target.
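A sketch of this detection and mitigation step for the single-gateway case (equation (3)); the spread threshold is an illustrative assumption, and the function names are hypothetical:

    def last_hop_estimates(end_to_end_ms, to_gateway_ms):
        """Per-landmark estimate of the gateway-to-target delay (d_it - d_ih)."""
        return {l: end_to_end_ms[l] - to_gateway_ms[l] for l in end_to_end_ms}

    def mitigated_last_hop(estimates, spread_threshold_ms=5.0):
        """Equation (3): keep the minimum estimate; flag a likely delay-adding
        attack when the per-landmark estimates disagree by more than a tolerance."""
        values = list(estimates.values())
        suspicious = (max(values) - min(values)) > spread_threshold_ms
        return min(values), suspicious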

In general, if there are multiple gateway routers on the border of the adversary's network, we can make the following weaker claim:

CLAIM 2: Increasing the delay between each gateway and the target can only be as effective against topology-based geolocation as increasing end-to-end delays against delay-based geolocation with a reduced set of landmarks.

An adversary could attempt to modify delays between each gateway router, h_j, and the target, t. This assumes the adversary knows the approximate geolocation results for all gateway routers. Where there is only a single gateway router with no additional attack delay, topology-based geolocation places the target within a circle centered at h with coordinates (λ_h, φ_h):

(x − λ_h)² + (y − φ_h)² = d_ht    (4)

Subjecting the latency measurement to an additional delay, δ, changes the equation to the following:

(x − λ_h)² + (y − φ_h)² = d_ht + δ    (5)

Thus, for targets with a single gateway router, an adversary can only increase the localization region by introducing an additional delay, without changing the location of the region's geometric center.

For targets with multiple gateway routers H = {h_0, h_1, ..., h_n}, targets are geolocated based on the delays between the gateways and t. An adversary can add additional delay, δ_j, between each gateway, h_j, and t based on the location of h_j. This is equivalent to the delay-adding attack, except the previously geolocated gateway routers are used in place of the real landmarks. Therefore, the previous evaluation results for the delay-adding attack on delay-based geolocation can be extended to topology-based geolocation for targets with multiple gateway routers.

5.2 Topology-based attacks

In topology-based geolocation, intermediate nodes are localized to confidence regions, and geographic constraints constructed from these intermediate nodes are expanded by their confidence regions to account for the

accumulation of error. However, this does not result in a monotonic increase in the region size of intermediate nodes with each hop. The intersection of several expanded constraints for intermediate nodes along multiple network paths to the target can still result in intermediate nodes that are localized to small regions. A sophisticated adversary with control over a large administrative domain can exploit this property by fabricating nodes, links and latencies within its network to create constraint intersections at specific locations. This assumes that the adversary can detect probe traffic issued from geolocation systems in order to present a topologically different network without affecting normal traffic.

Externally visible nodes in an adversary's network consist of gateway routers ER = {er_0, er_1, ..., er_m}, internal routers F = {f_0, f_1, ..., f_n} and end-points T = {τ_0, τ_1, ..., τ_s}. Internal routers can be fictitious, and network links between internal routers can be arbitrarily manufactured. The adversary's network can be described as the graph G = (V, E), where V = F ∪ ER ∪ T represents routers, and E = {e_0, e_1, ..., e_k} with weights w(e_i) is the set of links connecting the routers, with the weights representing network delays.

All internal link latencies, including those between gateways, can be fabricated by the adversary. However, the delay between fictitious nodes must respect the speed-of-light constraint, which dictates that a packet can only travel a distance equal to the product of the delay and the speed of light in fiber.
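A minimal sketch of this consistency constraint on the fabricated graph G = (V, E); the ~200 km/ms figure corresponds to 2/3 of the speed of light in fiber, and the function name is hypothetical:

    def respects_speed_of_light(links, coords, distance_km, km_per_ms=200.0):
        """Check that every fabricated link delay is physically achievable:
        w(e) must be at least the propagation time over the claimed distance.

        links:  iterable of (u, v, one_way_delay_ms) fabricated links
        coords: {node: (lat, lon)} claimed locations of the fabricated nodes
        """
        for u, v, delay_ms in links:
            if delay_ms < distance_km(coords[u], coords[v]) / km_per_ms:
                return False
        return True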

CLAIM 3: Topology-based attacks require the adversary to have more than one geographically distributed gateway router to its network.

This claim follows from the analysis of delay-based attacks when all network paths to the target converge to a common gateway router. With only one gateway router to the network, changes to internal network nodes can affect only the final size of the localization region, not the region's geometric center.

CLAIM 4: An adversary with control over three or more geographically distributed gateway routers to its network can move the target to an arbitrary location.

Unlike delay-based attacks, which can only increase latencies from the landmarks to the target, topology-based attacks can assign arbitrary latencies from the ingress points to the target. From geometric triangulation, this enables topology-based attacks to, theoretically, triangulate the location of the target to any point on the globe given three or more ingress points.

In practice, there are challenges that limit the adversary from achieving perfect accuracy with this attack. Specifically, the attack requires the adversary to know the


estimated locations of the gateway routers and to have an accurate model of the delay-to-distance function used by the geolocation system. Such information can be reverse-engineered by a determined adversary by analyzing the geolocation results of other targets in the adversary's network.

Although a resourceful adversary's topology-based attack can substantially affect geolocation results, it also introduces additional circuitousness to all network paths to the target, which creates a detectable signature. Circuitousness refers to the ratio of the actual distance traveled along a network path to the direct distance between the two end points of the path. Circuitousness can be observed by plotting the locations of intermediate nodes as they are located by the topology-aware geolocation system.

5.2.1 Naming attack extension

State-of-the-art topology-based geolocation systems [14, 30] leverage the structured way in which most routers are named to extract more precise information about router location. A collection of common naming patterns is available through the undns tool [27], which can extract approximate city locations from the domain names of routers.

When geolocation relies on undns, an adversary can effectively change the observed location of the target even with only a single gateway router to its network. This naming attack requires that the adversary be capable of crafting a domain name that can deceive the undns tool, poisoning the undns database with erroneous mappings, or responding to traceroutes with a spoofed IP address. The adversary only needs to use the naming attack to place any last hops before the target at its desired geographic location. The target will then be localized to the same location as this last hop in the absence of sufficient constraints.
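To illustrate the idea, the toy sketch below shows how a city hint can be parsed out of a router name and how an adversary who controls reverse DNS for its own address space can plant a false hint on the last hop before the target. The naming pattern, the city-code table, and the example-isp.net domain are hypothetical stand-ins, not undns's actual rule set.

import re

# Hypothetical stand-in for undns-style naming rules: a three-letter city code
# embedded in the router's reverse-DNS name.
CITY_HINTS = {"chi": "Chicago, IL", "sea": "Seattle, WA", "mia": "Miami, FL"}
HINT_RE = re.compile(r"\.([a-z]{3})\d*\.example-isp\.net$")

def location_hint(router_name):
    """Return the city hint encoded in a router name, if any."""
    m = HINT_RE.search(router_name)
    return CITY_HINTS.get(m.group(1)) if m else None

# The adversary names the hop just before the target to plant whatever hint it wants:
crafted_last_hop = "ge-0-1-0.mia2.example-isp.net"
print(location_hint(crafted_last_hop))   # -> "Miami, FL", regardless of the true location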

Naming attacks exhibit the same increased circuitousness as standard topology-based attacks. Extensive poisoning of the undns database could allow an attacker to change the location of other routers along the network paths to reduce path circuitousness.

5.3 Evaluation

We evaluate the topology-based (hop-adding) attack and the undns naming extension using a simulator of topology-aware geolocation. To perform the evaluation, we developed the fictitious network illustrated in Figure 12. The network includes 4 gateway routers (ER), represented by PlanetLab nodes in Victoria, BC; Riverside, CA; Ithaca, NY; and Gainesville, FL. The network also includes 11 forged locations (T) and 14 non-existent internal routers (F).

Figure 12: The adversary's network used for evaluating the topology-based attack.

Three of the non-existent routers are geographically distributed around the US, while the other 11 are placed close to the forged locations to improve the effectiveness of the attack, especially when the adversary can manipulate undns entries. Routers in the fictitious network are connected using basic heuristics. For example, each of the 11 internal routers near the forged locations is connected to the 3 routers nearest to it to aid in triangulation. We show that even using this simple network design, an adversary executing the hop-adding attack and undns extension can be successful.

To evaluate the attack, we use the same set of 50 PlanetLab nodes used in evaluating the delay-adding attack (Figure 1), with an additional 30 European PlanetLab nodes that act only as targets attempting to move into North America. We move the targets to the 11 forged locations in the fictitious network. These locations, a subset of the 40 US locations used in evaluating the delay-adding attack, were chosen to be geographically distributed around the US. Each of the 80 PlanetLab nodes takes a turn being the target, with the remaining US PlanetLab nodes used as landmarks. Each target is moved to each of the 11 forged locations in turn, for a total of 880 attacks.

When executing the attack, the traceroute from each landmark is directed to its nearest gateway router. The first part of the traceroute is dictated by the network path between the landmark and its nearest gateway router (represented by a PlanetLab node). The second part is artificially generated to be the shortest path between the gateway router and the forged location. The latency of the second part is lower bounded by the speed-of-light delay between the gateway router and the target's true location. When the speed-of-light latency between the gateway router and the target is greater than the latency on the shortest path from the gateway to the forged location, the additional delay is divided across links in the shortest path.
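A minimal sketch of this padding step: if the physical lower bound on delay to the target's true location exceeds the total delay of the fabricated shortest path, the surplus is spread across the fabricated links. We split the surplus evenly here; the paper only says the extra delay is divided across the links, not how.

def pad_fabricated_path(link_delays_ms, true_lower_bound_ms):
    """Pad fabricated per-link delays so their sum is at least the
    speed-of-light delay to the target's true location."""
    total = sum(link_delays_ms)
    surplus = true_lower_bound_ms - total
    if surplus <= 0:
        return list(link_delays_ms)            # already physically consistent
    pad = surplus / len(link_delays_ms)        # even split (our assumption)
    return [d + pad for d in link_delays_ms]

# e.g., a fabricated path of 3 links totalling 18 ms, but the probe really needs
# at least 30 ms to reach the true target, so 12 ms of padding is spread out.
print(pad_fabricated_path([5.0, 6.0, 7.0], 30.0))   # -> [9.0, 10.0, 11.0]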


Figure 14: Error observed by the adversary depending on how far they attempt to move the target using the topology-based attack. [Plot: error for attacker (km) vs. distance of attempted move (km); 10th-percentile, median, and 90th-percentile curves.]

Figure 15: Error observed by the adversary depending on how far they attempt to move the target using the undns attack. [Plot: error for attacker (km) vs. distance of attempted move (km); 10th-percentile, median, and 90th-percentile curves.]

Figure 13: CDF of error distance for the attacker when executing the topology-based and undns attacks. [Plot: P[X < x] vs. error for attacker (km); curves: undns-attack, undns-attack EU, hop-adding, hop-adding EU.]

5.3.1 Attack effectiveness

We begin by examining how accurate the adversary can be when attempting to move the target to a specific forged location. Figure 13 shows the error for the adversary when executing the topology-based attack and undns extension. Without the undns extension, the adversary is able to place a North American target within 680 km of the false location 50% of the time. This is similar to the delay-adding attack in which the adversary has access to the best line function. When moving a target from Europe to North America, the adversary's median error increases by 50% to 929 km. Despite this increase, we observe that the adversary succeeds in each attempt to move a European target into the US. In addition to the overall decrease in accuracy for the adversary, we note that there are some instances where a target in Europe misleads the algorithm with higher accuracy. This is caused by the adversary using the speed-of-light approximation for latencies within their network. Since the speed of light is the lower bound on network delay, when additional delay is added to the links to account for the time it would take a probe to reach the target in Europe, the delay approaches the larger delay expected by the landmarks' distance-to-delay mapping. The undns extension increases the adversary's accuracy by 93%, with the adversary locating herself within 50 km of the forged location 50% of the time. These results are consistent whether the true location of the target is in North America or Europe.

When analyzing the delay-adding attack, we observed a linear relationship between the distance the adversary attempts to move the target and the error she observes. Figures 14 and 15 show the 10th-percentile, median and 90th-percentile error for the attacker depending on how far the forged location is from the target, for the topology-based attack and undns extension, respectively. The observed errors were quite erratic, which is a result of the many other factors that affect the accuracy of geolocation beyond the distance of the attempted move. In general, error for the adversary increases slowly as the adversary tries to move the target longer distances. This enables an adversary executing the topology-based attack to move the target longer distances. Error for the adversary using the undns extension remains fairly constant regardless of how far they attempt to move the target. In the case of the undns attack, the median accuracy fluctuates by less than 60 km whether the adversary moves 500 km or 4,000 km. The slow growth of adversary error stems from the engineered delays in the fictitious network.


Figure 16: CDF of change in direction for the topology-based attack and undns extension. [Plot: P[X < x] vs. absolute difference in direction (degrees); curves: undns-attack, undns-attack (EU), hop-adding, hop-adding (EU).]

These delays cause nodes along the paths (including the end point) to be geolocated to a similar location regardless of where the target was originally located.

We next confirm that the adversary is able to move the target in her chosen direction. Figure 16 shows the difference between the direction in which the adversary tried to move the target and the direction in which the target was actually moved (θ in the delay-adding attack). For the general topology-based attack, the adversary is within 36 degrees of her intended direction 75% of the time and within 69 degrees 90% of the time. This improves with the undns extension, where the adversary is within 3 degrees of her intended direction 95% of the time. When the adversary attempts to move a target from Europe to North America, the target always moves very close to the chosen direction: the adversary is always within 10 degrees of her chosen direction. The smaller change in direction for European nodes stems from the longer distance between the target and the forged location, which causes a smaller change in direction to be observed for similar error values compared to a target that is closer to the forged location.

5.3.2 Attack detectability

We have observed that an adversary executing the topology-based attack and the undns extension to the attack can accurately relocate the geolocation target. We next consider whether the victim would be able to detect these attacks and reduce their impact on geolocation results.

Figure 17 shows the region sizes for topology-aware geolocation and undns geolocation before and after the attacks are executed (for both North American and European targets). Unlike the delay-adding attack, the adversary that adds hops to the traceroutes of the victim produces region sizes similar to those of the original algorithms and, in some cases, even smaller region sizes.

Figure 17: CDF of region size before and after the topology-based attack and undns extension. [Plot: P[X < x] vs. localization region size (km²), log scale; curves: undns-attack (EU+NA), no attack (undns), hop-adding (EU+NA), no attack.]

For topology-aware geolocation, we observe median region sizes of 102,273 km² before and 50,441 km² after the attack. For the undns extension, we observe median region sizes of 4,448 km² before and 790 km² after the attack. These results indicate that region size is a poor metric for ruling out attacks that add hops to the end of traceroute paths.

Another metric that may be used to rule out geolocation results that have been modified by an adversary is path circuitousness. We define the circuitousness of a traceroute path between landmark L_i and the target as follows, where r = (λ_r, φ_r) is the location returned by the geolocation algorithm, and h_j = (λ_j, φ_j) is the location of intermediate hop j as computed by the geolocation algorithm:

C = \frac{d_{i h_0} + \sum_{j=1}^{n} d_{h_{j-1} h_j} + d_{h_n r}}{d_{i r}}    (6)
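A direct transcription of Equation (6) as a sketch, taking d to be the great-circle distance between two estimated locations and reusing great_circle_km from the earlier speed-of-light sketch; the hop locations are whatever the geolocation algorithm estimates for the intermediate hops.

def circuitousness(landmark_loc, hop_locs, returned_loc):
    """C = (d(L_i, h_0) + sum_j d(h_{j-1}, h_j) + d(h_n, r)) / d(L_i, r)."""
    path = [landmark_loc] + list(hop_locs) + [returned_loc]
    # Distance traveled along the path, summed over consecutive hop estimates.
    along_path = sum(great_circle_km(*a, *b) for a, b in zip(path, path[1:]))
    # Direct distance from the landmark to the returned location.
    direct = great_circle_km(*landmark_loc, *returned_loc)
    return along_path / direct if direct > 0 else float("inf")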

Figure 18 shows the distribution of circuitousness for paths between each landmark and the target for topology-aware geolocation before and after the topology-based attack is executed³. We observe that when the topology-based attack is executed, the circuitousness per landmark increases. One criterion a geolocation algorithm could use for discarding results from the topology-based attack would be to discard results from landmarks where the circuitousness is abnormally high. If a geolocation framework that assigns weights to constraints, such as Octant, is used, constraints from landmarks with high circuitousness could be given a lower weight to limit the adversary's effectiveness. We note that a clever adversary could design her network to use more direct paths, making it more difficult to detect the attack by observing circuitousness.
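One way such a down-weighting rule might look is sketched below. The threshold, cutoff, and linear decay are illustrative choices of ours, not values taken from the paper or from Octant.

def constraint_weight(circ, typical_circ=1.5, cutoff=5.0):
    """Weight for a landmark's constraint as a function of its path circuitousness.
    Paths near typical circuitousness keep full weight; abnormally circuitous
    paths are discounted and eventually discarded."""
    if circ <= typical_circ:
        return 1.0
    if circ >= cutoff:
        return 0.0
    # Linearly decay the weight between the typical value and the cutoff.
    return (cutoff - circ) / (cutoff - typical_circ)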


Figure 18: CDF of circuitousness for each landmark before and after the topology-based attack. [Plot: P[X < x] vs. circuitousness per landmark, log scale; curves: no attack, hop-adding (EU), hop-adding.]

6 Related work

While there have been many related works on developing and evaluating geolocation algorithms (e.g., [12, 14, 26, 30]), there has been limited study of IP geolocation given a non-benign target [5, 18].

Castelluccia et al. consider the application of CBG [12] to the problem of geolocating hidden servers hosting illegal content within a botnet [5]. The technique used to hide these servers is referred to as "fast-flux", where a constantly changing set of machines infected by a botnet is used to proxy HTTP messages for a hidden server. Geolocating these servers is important to enable the appropriate authorities to take action against them. Castelluccia et al. leverage the fact that the hidden server is behind a layer of proxies to factor out the portion of the observed RTT caused by the proxy layer. They use HTTP connections to measure RTTs (because the hidden servers are unlikely to respond to ping) and factor out the additional delay caused by the layer of proxies to geolocate hidden servers with a median error of 100 km, using PlanetLab nodes as ground-truth hidden servers.

Muir and van Oorschot survey a variety of geolocation techniques and their applicability in the presence of an adversarial target [18]. Their work is similar to, but distinct from, ours. Specifically, they emphasize geolocation techniques that leverage secondary sources of information, such as whois registries based on domain, IP and AS; DNS LOC [8]; application data from HTTP headers; and data inferred from routing information. They consider delay-based geolocation but do not specify or evaluate any attacks on measurement-based geolocation. Muir and van Oorschot discuss the limitations of IP geolocation when an adversary attempts to conceal her IP address through the use of an anonymization proxy, and examine how a Web page embedding a Java applet can discover a client's true identity using Java's socket class to connect back to the server. They demonstrate this strategy for identifying clients using the Tor [28] anonymization network.

These previous works begin to consider the performance of geolocation algorithms when the target of geolocation may have an incentive to be adversarial. However, they generally focus on the issue of geolocating hosts that attempt to deceive geolocation using proxies. In contrast, we develop and evaluate attacks on two classes of measurement-based geolocation techniques by manipulating the network properties on which the techniques rely.

We observe that the problem of geolocating an adversarial target is similar to the problem of secure positioning [4] in the domain of wireless networks. Unlike wireless signals, network delay is subject to additive noise as a result of congestion and queuing along the network path, as well as circuitous routes. Multiple hops along network paths on the Internet and the existence of large organizational WANs also enable new adversarial models in the domain of IP geolocation.

7 Conclusions

Many applications of geolocation benefit from security guarantees when confronted with an adversarial target. These include popular applications, such as limiting media distribution to a specific region and fraud detection, as well as newer applications, such as ensuring regional regulatory compliance when using an infrastructure-as-a-service provider. This paper considered two models of an adversary trying to mislead measurement-based geolocation techniques that leverage end-to-end delays and topology information. To this end, we developed and evaluated two attacks against delay-based and topology-aware geolocation.

To avoid detection, adversaries can leverage the inherent variability in network delay and the circuitousness of network paths on the Internet to hide their tampering. Since these properties are measured and used by various geolocation techniques, they serve as good attack vectors by which the adversary can influence the geolocation result.

Our most surprising finding is that the more advanced and accurate topology-aware geolocation techniques are more susceptible to covert tampering than the simpler delay-based techniques. For geolocation algorithms that leverage delay, we observed how a simple adversary that only adds delay to probes could alter the results of geolocation. However, this adversary has limited precision when attempting to forge a specific location. We also observed a clear trade-off between the amount of delay an adversary added and her detectability, using the region size returned by CBG [12] as a metric for discarding anomalous results.

Compared to delay-based geolocation, topology-aware geolocation fares no better against a simple adversary and worse against a sophisticated one. Topology-aware geolocation uses more information sources, such as traceroute and undns, to achieve higher accuracy than delay-based geolocation. Unfortunately, this advantage becomes a weakness against an adversary able to corrupt these sources. A sophisticated adversary that can leverage multiple network entry points (e.g., an infrastructure-as-a-service provider) can cause the geolocation system to return a result as accurate as the best-case simple adversary without increasing the resultant region size. When undns entries are corrupted, the adversary is able to forge locations with high accuracy without increasing the region sizes – in some cases, even decreasing them.

Our work reveals limitations of current measurement-based geolocation techniques given an adversarial target. To provide secure geolocation, these algorithms must account for the presence of untrustworthy measurements. This may take the form of heuristics to discount measurements deemed untrustworthy or the use of secure measurement protocols. We intend to explore these directions in future work.

Acknowledgements

The authors would like to thank the anonymous reviewers and our shepherd, Steven Gribble, for their feedback, which has helped to improve this paper. This work was supported by the Natural Sciences and Engineering Research Council (NSERC) ISSNet and NSERC-CGS funding.

References

[1] Amazon EC2, 2010. http://aws.amazon.com/ec2/.

[2] Anderson, M., Bansal, A., Doctor, B., Hadjiyiannic, G., Herringshaw, C., Karplus, E., and Muniz, D. Method and apparatus for estimating a geographic location of a networked entity, June 2004. US Patent number: 6684250.

[3] American Registry for Internet Numbers (ARIN), 2010. http://www.arin.net.

[4] Capkun, S., and Hubaux, J. Secure positioning of wireless devices with application to sensor networks. In Proceedings of the IEEE INFOCOM Conference (March 2005).

[5] Castelluccia, C., Kaafar, M., Manils, P., and Perito, D. Geolocalization of proxied services and its application to fast-flux hidden servers. In Proceedings of the ACM SIGCOMM Internet Measurement Conference (November 2009).

[6] CBC. USA Patriot Act comes under fire in B.C. report, October 2004. http://www.cbc.ca/canada/story/2004/10/29/patriotact_bc041029.html.

[7] Crovella, M., and Krishnamurthy, B. Internet Measurement: Infrastructure, Traffic and Applications. John Wiley & Sons, 2006.

[8] Davis, C., Vixie, P., Goodwin, T., and Dickinson, I. A means for expressing location information in the domain name system. RFC 1876, IETF, Jan. 1996.

[9] Eriksson, B., Barford, P., Sommers, J., and Nowak, R. A learning-based approach for IP geolocation. In Proceedings of the Passive and Active Measurement Workshop (April 2010).

[10] Garfinkel, T., Pfaff, B., Chow, J., Rosenblum, M., and Boneh, D. Terra: A virtual machine-based platform for trusted computing. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (October 2003).

[11] Gill, P., Arlitt, M., Li, Z., and Mahanti, A. The flattening Internet topology: Natural evolution, unsightly barnacles or contrived collapse? In Proceedings of the Passive and Active Measurement Workshop (April 2008).

[12] Gueye, B., Ziviani, A., Crovella, M., and Fdida, S. Constraint-based geolocation of Internet hosts. IEEE/ACM Transactions on Networking 14, 6 (December 2006).

[13] Hulu - watch your favorites. anytime. for free., 2010. http://www.hulu.com/.

[14] Katz-Bassett, E., John, J., Krishnamurthy, A., Wetherall, D., Anderson, T., and Chawathe, Y. Towards IP geolocation using delay and topology measurements. In Proceedings of the ACM SIGCOMM Internet Measurement Conference (October 2006).

[15] Kurose, J., and Ross, K. Computer Networking: A Top-Down Approach Featuring the Internet. Addison-Wesley, 2005.

[16] Maxmind - geolocation and online fraud prevention, 2010. http://www.maxmind.com.

[17] Casado, M., and Freedman, M. Peering through the shroud: The effect of edge opacity on IP-based client identification. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI) (Cambridge, MA, April 2007).

[18] Muir, J., and van Oorschot, P. Internet geolocation: Evasion and counterevasion. ACM Computing Surveys 42, 1 (December 2009).

[19] Padmanabhan, V., and Subramanian, L. An investigation of geographic mapping techniques for Internet hosts. In Proceedings of ACM SIGCOMM (August 2001).

[20] Pandora Internet radio, 2010. http://www.pandora.com.

[21] PlanetLab, 2010. http://www.planet-lab.org.

[22] Quova – IP geolocation experts, 2010. http://www.quova.com.

[23] Reseaux IP Europeens (RIPE), 2010. http://www.ripe.net.

[24] Ristenpart, T., Tromer, E., Shacham, H., and Savage, S. Hey, you, get off my cloud! Exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS 2009) (November 2009).

[25] Santos, N., Gummadi, K. P., and Rodrigues, R. Towards trusted cloud computing. In Proceedings of the 1st Workshop on Hot Topics in Cloud Computing (HotCloud) (June 2009).

[26] Siwpersad, S., Gueye, B., and Uhlig, S. Assessing the geographic resolution of exhaustive tabulation. In Proceedings of the Passive and Active Measurement Workshop (April 2008).

[27] Spring, N., Mahajan, R., and Wetherall, D. Measuring ISP topologies with Rocketfuel. In Proceedings of ACM SIGCOMM (August 2002).

[28] The Tor Project. Tor: Overview, 2010. http://www.torproject.org/overview.html.en.

[29] Trancredi, P., and McClung, K. Use case: Restrict access to online bettors, August 2009. http://www.quova.com/Uses/UseCaseDetail/09-08-31/Restrict_Access_to_Online_Bettors.aspx.

[30] Wong, B., Stoyanov, I., and Sirer, E. G. Octant: A comprehensive framework for the geolocalization of Internet hosts. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI) (Cambridge, MA, April 2007).

[31] Young, I., Mark, B., and Richards, D. Statistical geolocation of Internet hosts. In Proceedings of the 18th International Conference on Computer Communications and Networks (August 2009).

Notes

¹ In reality, the consumer of geolocation information will likely contract out geolocation services from a third-party geolocation provider that will maintain landmarks. Given the common goals of these two entities, we model them as a single party.

² The adversary can assume that the gateway routers are geolocated to their true locations.

³ We make similar observations for the undns attack extension.