UNIVERSITY OF CALIFORNIA
Los Angeles
Understanding and Diagnosing Routing Dynamics in Global Internet
A dissertation submitted in partial satisfaction
of the requirements for the degree
Doctor of Philosophy in Computer Science
by
Mohit Vijay Lad
2007
The dissertation of Mohit Vijay Lad is approved.
Mark Hansen
Dan Massey
Adam Meyerson
Songwu Lu
Rafail Ostrovsky
Lixia Zhang, Committee Chair
University of California, Los Angeles
2007
To my Parents . . .
who gave meaning to my life and
among so many other things,
taught me to be patient
TABLE OF CONTENTS
1 Introduction
1.1 Internet Routing and Border Gateway Protocol
1.2 BGP monitoring
1.3 Problems and Scope of Thesis
1.3.1 Inferring origin of routing changes
1.3.2 Understanding routing changes in the Internet
1.3.3 Security Problems
2 Inferring Failures in Path Vector Routing
2.1 Model and Problem Definition
2.1.1 Network and Routing Model
2.1.2 Failure Model
2.1.3 Input Requirements
2.2 Minimum e-set for tree inputs
2.3 Minimum e-set for general graphs
2.3.1 Problem Definition: Graph Version
2.3.2 The case of no purely transit nodes, $V_N = \emptyset$ or $V = V_D$
2.3.3 The case of purely transit nodes, $V_N \neq \emptyset$
3 Identifying Routing Problems in the Internet
3.1 Challenges in diagnosing Internet routing problems
3.2 Capturing aggregate behavior: Notion of Link-weight
4 Visualizing Internet Routing Dynamics using Link-Rank
4.1 Components of Link-Rank
4.2 Features of Link-Rank
4.2.1 Nodes, edges and color coding
4.2.2 Activity plots: summarizing weight changes
4.2.3 Time Windows and Drilling Down
4.2.4 Pruning Rank-change graphs
4.2.5 Assembled View: Merging Rank-change graphs from multiple observation points
4.3 Discovery and analysis using Link-Rank
4.3.1 Methodology
4.3.2 Case I: Capturing Link Instabilities
4.3.3 Case II: Root-cause identification
5 Inferring Origin of Internet Routing Problems
5.1 Characterizing links and identifying routing events
5.1.1 Link Events
5.2 The Inference Scheme
5.2.1 Overview of approach
5.2.2 Fault graph
5.2.3 Augmenting fault graph with views from additional observation points
5.2.4 Candidate set reduction
5.2.5 Identifying node problems
5.3 Evaluation
5.3.1 Validation using Abilene Data
5.3.2 Validation using Origin-adjacent events
5.3.3 Application to BGP data
5.4 Discussion
6 Detecting and Alerting about Prefix Hijacks
6.1 Prefix Hijack
6.2 System Design
6.3 Origin Change Detection
6.3.1 Instantaneous Origin Changes
6.3.2 Windowed Origin Changes
6.3.3 Adaptive Window Size
6.4 Notification Delivery
6.5 Local Notification Filter
6.5.1 Constructing filtering rules
6.5.2 Case Study
6.6 Evaluation
6.6.1 Notification Messages
6.6.2 Detecting Known Events
6.6.3 Notification Delivery
6.7 Extensions to basic system
6.7.1 Classification of Prefix hijack
6.7.2 Sub-prefix Set
6.7.3 Last Hop Set
7 Understanding Resiliency of the Internet against Prefix Hijack
7.1 Hijack Evaluation Metrics
7.2 Evaluating Hijacks
7.2.1 Simulation Setup
7.2.2 Characterizing Topological Resilience
7.2.3 Factors Affecting Resilience
7.3 Prefix Hijack Incidents in the Internet
7.3.1 Case I: Prefix Hijacks by AS-27506
7.3.2 Case II: Prefix Hijacks by AS-9121
7.4 Discussion
8 Related Works
8.1 Visualization of Route Dynamics
8.2 Understanding Routing Dynamics and Problem Inference
8.3 Prefix Hijack
9 Conclusions
9.1 Future Works
9.1.1 Understanding AS activity
9.1.2 Prefix hijack
9.1.3 BGP monitoring
References
LIST OF FIGURES
1.1 Internet routing and BGP monitoring
2.1 Possible e-sets are {(1, 2)} and {(2, 4), (2, 5), (2, 6)}
2.2 Example for nearest descendent and minimum e-set from Algorithm 2
2.3 greedy Fault() with each node as a destination
2.4 Example showing greedy not optimal in general case
3.1 The notion of link weight
3.2 Rank-change graph for change in Figure 3.1
4.1 Components of Link-Rank
4.2 Sample Rank-change graph
4.3 Plotting an activity bar
4.4 Use of time window to control time of change
4.5 Drilling down to increase level of detail in activity
4.6 Assembling views from AS 11608 and AS 3561
4.7 Activity plots from March 8, 2005 to March 14, 2005
4.8 One hour of activity plot from 12.0.1.63 on March 9, 2005
4.9 Case I: Continuous switching of routes between two links
4.10 Activity plots from October 18, 2005 to October 24, 2005
4.11 Case II: Instability observed at AS 6453
4.12 Case II: Combined view from AS 1239, AS 6453 and AS 3257
5.1 Frequency distribution of link weight values of 3 links from a single observation point
5.2 Percentage samples covered by the top 5 most common values seen in the 4500 link weight samples for each link
5.3 Example showing choice of α = 0.1
5.4 Events in January 2007
5.5 Main steps in inference
5.6 A fault graph from single observation point
5.7 Augmenting a fault graph with additional information
5.8 State transition diagram
5.9 BGP peer TENET (AS 2018) of Abilene (AS 11537) was unreachable. Event observed from primary view of AS 11686.
5.10 Number of origin-adjacent events affecting each observation point
5.11 Accuracy of F_single and F_mult for origin events involving more than 50 prefixes
5.12 Links involved in events per observation point
5.13 Cumulative Distribution of instances per link from AS 2914
5.14 Repeated instability involving AS 2072 as viewed from AS 2914.
5.15 Case study: Routing changes seen from AS 2914
6.1 Example of prefix hijack
6.2 Components of PHAS
6.3 Origin events per prefix - December 2005
6.4 Inter-arrival time between origin events for a prefix for December 2005
6.5 Distribution of origin events per prefix using adaptive window
6.6 Comparison of origin events per day using instantaneous and adaptive window
6.7 Notification setup
6.8 Origin events per day from June 1, 2005 to August 31, 2005
6.9 Origin events per day from September 1, 2005 to November 30, 2005
6.10 Distribution of events per AS for December 2005
6.11 Delivery Ratio
6.12 Delivery Number
6.13 Types of prefix hijacks
7.1 Hijack scenario.
7.2 Distribution of node resilience.
7.3 Resilience of nodes in different tiers.
7.4 Understanding resilience of tier-1 nodes
7.5 Resilience of nodes with different number of Tier-1 providers.
7.6 High resiliency against hijack
7.7 Low resiliency against hijack
7.8 Tier-1 prefix hijacked
7.9 Multi-homed customer of tier-1s hijacked
ACKNOWLEDGMENTS
First and foremost, I would like to acknowledge my dissertation advisor Dr. Lixia
Zhang who has constantly guided me through my dissertation. I would also like to
acknowledge Dr. Dan Massey for his guidance and support during my dissertation,
and Dr. Songwu Lu for his feedback on my work from time to time. I am also grateful
to Dr. Adam Meyerson for contributing towards the work on minimum failures in
routing. I would like to extend a special note of thanks to Verra Morgan for her time
and support during my Ph.D. Finally, various friends and colleagues have played an
important role during my Ph.D., notable among them are Dr. Akash Nanavati, Dr. Dan
Pei, Dr. Beichuan Zhang, Dr. Vasilis Pappas, Ricardo Oliveira, Eric Osterweil, Yang
Yi, Soshant Bali, and Yuan-Chin Amy Lee.
My research has been complemented with deployed systems and this would not
have been possible without the work by fellow students. I would like to thank all the
students who have contributed to the Link-Rank visualization tool. Tim Ma worked on
the very first version of Link-Rank and was also responsible for the web based services.
Yiguo Wu carried on the work and added more features to Link-Rank. Jeffrey Chiang
and Jonathan Salehpour helped in redesigning the tool and were instrumental in the
release of version 1.0. I would also like to thank Yan Chen for converting a prototype
of PHAS into a full-fledged deployed system.
Finally, I would like to acknowledge NSF and DARPA for their research grants
under which I was supported.
VITA
1978 Born, Mumbai, India.
2000 B.E. (Computer Engineering), Mumbai University, India.
2000–2001 Research Assistant Programmer, Computer Science and Engineering Department, Indian Institute of Technology, Mumbai.
2001–2003 Teaching Assistant, Computer Science Department, UCLA. Taught sections of upper division undergraduate courses in Software Engineering, Database Systems and Computer Networks.
2002, 2003 Visiting Research Assistant, USC Information Sciences Institute, Arlington, VA, USA.
2003 M.S. (Computer Science), UCLA.
2003–present Research Assistant, Computer Science Department, UCLA.
PUBLICATIONS
Mohit Lad, Xiaoliang Zhao, Beichuan Zhang, Dan Massey and Lixia Zhang, An Analysis
of BGP Update Burst during Slammer Attack, in Proceedings of the 5th International
Workshop on Distributed Computing (IWDC), Dec 2003.
—, Akash Nanavati, Dan Massey and Lixia Zhang, An Algorithmic Approach to
Identifying Link Failures, in Proceedings of the 10th Pacific Rim International
Symposium on Dependable Computing (PRDC), March 2004.
—, Dan Massey, Adam Meyerson, Akash Nanavati and Lixia Zhang, Minimum failure
explanations for path vector routing, in Journal of Combinatorial Optimization, special
issue on Communication Networks and Internet Applications, September 2006.
—, Dan Massey, Dan Pei, Yiguo Wu, Beichuan Zhang and Lixia Zhang, PHAS:
A prefix hijack alert system, in Proceedings of 15th USENIX Security Symposium
(USENIX Security 2006).
—, Dan Massey and Lixia Zhang, Visualizing Internet Routing Dynamics, IEEE
Transactions on Visualization and Computer Graphics, Nov/Dec 2006.
—, Ricardo Oliveira, Dan Massey and Lixia Zhang, A link weight based approach for
root cause inference in Internet Routing, UCLA Technical Report 060028, January
2007.
—, Ricardo Oliveira, Beichuan Zhang and Lixia Zhang, Understanding Resiliency of
Internet Topology Against Prefix Hijack Attacks, to appear in IEEE/IFIP DSN 2007.
ABSTRACT OF THE DISSERTATION
Understanding and Diagnosing Routing Dynamics in Global Internet
by
Mohit Vijay Lad
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2007
Professor Lixia Zhang, Chair
The global routing system is a critical component in the Internet infrastructure.
It delivers data to over 210,000 destination networks throughout the Internet. Large
scale events such as fiber cuts, power failures, or major changes in network connec-
tivity often lead to large scale routing changes, which in turn can cause widespread
disruption in data delivery service. Hence it is critically important to be able to de-
tect all significant routing changes and to identify the origins of these changes. Prior
works focused on examining the route changes to individual destinations in an attempt
to understand the behavior of the global routing system. Unfortunately, the Internet’s
distributed nature and sheer size make such approaches infeasible and ineffective.
In this work, we develop a new concept in measuring routing distribution and dy-
namics. Instead of measuring changes to individual routes, we measure the total num-
ber of routes carried over each link, dubbed link weight. By examining the changes to
link weight, dubbed rank-change, we easily capture the aggregate route changes. We
use link-weights and rank-changes to visually capture large-scale routing events from
hundreds of megabytes of routing data collected from operational routers. In addition
to enabling visual analysis of routing problems, the link weight metric also forms the
basis for automated inference to locate the origins of routing changes. We correlate link
weight changes across adjacent links and across observations from different vantage
points to construct an s-t graph, called a fault graph, which contains a virtual source
and sink node. The min-cut, which disconnects the source from the sink by cutting the
fewest edges, is the most likely location where the problem originated. Our
evaluations show that this min-cut heuristic can identify the problem edges with a high
accuracy.
Another problem facing the Internet today is prefix hijacking, where a network
wrongly announces an IP address space it does not own, causing routers to send traffic
to it instead of to the genuine destination. Prefix hijacking is a major threat to Internet
security, and detecting a hijack early is important to reduce the damage done. To this
end, we designed a lightweight and easily deployable hijack alert system. We also
carried out a systematic evaluation of how the impact of a hijack varies with the
locations of the attacker and the target, and found that the tier-1 ISPs, which have the
highest degree, are much more vulnerable than some of their multi-homed customers.
In summary, our definitions of link weight and rank-change contribute a new ab-
straction to measure Internet routing distribution and dynamics. This new abstraction
not only enables a comprehensive visualization of the global routing system and auto-
matic diagnosis, but also opens up new venues for more advanced routing dynamics
modeling and analysis. Our work on prefix hijack presents a quick and easy to de-
ploy first step of detection of attacks and also offers new insights into who is more
vulnerable against hijacks in the Internet.
CHAPTER 1
Introduction
Today’s Internet provides a global data delivery service to millions of end users. Net-
work routing protocols play a critical role in this delivery service by steering data traffic
towards their destinations. Routing problems can potentially lead to long latencies for
data delivery or in some cases even loss of data packets, resulting in a noticeable per-
formance degradation for end users. In the past, large-scale events such as fiber cuts,
power failures or major changes in connectivity have severely impacted the data deliv-
ery service. Besides performance degradation, data packets can also be wrongly routed
to malicious entities resulting in serious security and privacy breaches. This hijacking
of traffic often occurs due to problems at the routing level. Thus detecting significant
routing problems in a timely manner is critical to ensure the smooth operation of the
Internet, and is the focus of this dissertation. In this chapter, we introduce Internet
routing in detail and elaborate on the routing problems we are interested in detecting
and understanding. We explain why these problems are challenging and discuss at a
high level how these problems are attacked in the remainder of the dissertation.
1.1 Internet Routing and Border Gateway Protocol
The Internet consists of a large number of networks called autonomous systems (AS).
Each AS is assigned an AS number and contains one or multiple destination networks.
Each destination network is represented by an IP address prefix. For example, the
prefix 131.179.96.0/24 represents a network at UCLA and is part of AS 52 (UCLA’s
AS number). As of March 2007, the Internet consists of over 20,000 autonomous
systems and over 220,000 prefixes.
A routing protocol propagates information about how to reach all destinations
throughout the network. A path vector protocol called the Border Gateway Protocol
(BGP) [36] is the de facto routing protocol used between autonomous systems in the
Internet today. Routing information in BGP is propagated by the exchange of BGP
update messages. A BGP update message contains information about the destination
prefix and the AS path used to reach that prefix. We represent a BGP update in the
form {〈prefix〉 : 〈ASpath〉}. Figure 1.1 shows how BGP updates propagate routing
information in the Internet. In this figure, AS 22 owns a prefix P1 and sends a BGP
update message {P1 : 22} to its neighbor AS 33. AS 22 is said to be the origin AS
for prefix P1. On receiving this update, AS 33 now prepends its own AS number to
the received path and sends the BGP update {P1 : 33, 22} to its neighbors, AS 44 and
AS 55 (see footnote 1). AS 55 in turn sends the BGP update {P1 : 55, 33, 22} to its neighbor AS
44. Note, AS 44 receives two paths to reach P1. When an AS receives more than one
path to reach a prefix, it chooses one of them as the primary path. In Figure 1.1, we
assume AS 44 picks the path {P1 : 33, 22} because it is shorter. Generally speaking,
this decision on which path to pick is based on the routing policy of each individual
AS. An AS’s routing policy also determines whether to send a particular path to a
neighbor. Besides initial route propagation, physical events like link failures can also
trigger BGP updates. For example, assume the link (44, 33) goes down. As a result,
AS 44 switches to the backup path {55, 33, 22} that it had learned earlier and sends the
BGP update {P1 : 44, 55, 33, 22} to its neighbors.
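The propagation described above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of a real BGP implementation: it assumes every AS prefers the shortest AS path and exports its best path to all neighbors, whereas real routing policies are richer. The `TOPOLOGY` dictionary and the function names `propagate` and `without_link` are hypothetical and only mirror the four ASes of Figure 1.1.

```python
from collections import deque

# Hypothetical AS-level adjacency mirroring Figure 1.1.
TOPOLOGY = {22: [33], 33: [22, 44, 55], 44: [33, 55], 55: [33, 44]}

def propagate(topology, origin):
    """Return {AS: AS path it selected to reach the origin's prefix}.

    Assumption: each AS keeps the shortest path it has received and
    re-advertises it to every neighbor after prepending its own number.
    """
    routes = {origin: []}          # the origin needs no path to itself
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        update = [sender] + routes[sender]   # prepend own AS number
        for neighbor in topology[sender]:
            if neighbor in update:
                continue                     # loop prevention
            current = routes.get(neighbor)
            if current is None or len(update) < len(current):
                routes[neighbor] = update
                queue.append(neighbor)
    return routes

def without_link(topology, a, b):
    """Return a copy of the topology with the link between a and b removed."""
    return {n: [m for m in nbrs if {n, m} != {a, b}]
            for n, nbrs in topology.items()}

print(propagate(TOPOLOGY, 22)[44])                        # [33, 22]
print(propagate(without_link(TOPOLOGY, 44, 33), 22)[44])  # [55, 33, 22]
```

Under these assumptions AS 44 selects the path via AS 33 while the link (44, 33) is up and falls back to the path via AS 55 once that link is removed, matching the example in the text.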
Footnote 1: An AS may contain more than one BGP router, as shown in Figure 1.1 (e.g., AS 33 contains 3 BGP routers), and routing information inside an AS is propagated using an intra-domain routing protocol. In this work, we focus on inter-domain routing dynamics and hence do not go into details of intra-domain routing.
1.2 BGP monitoring
Internet routing is highly dynamic, and routing changes occur continuously. For
example, a typical BGP router can see as many as 400,000 BGP updates per day on
average (see footnote 2). Since BGP updates propagate routing information in the
Internet, capturing the BGP updates at various parts of the Internet can give us useful
insight into the state of the Internet and the amount of routing change taking place.
In Figure 1.1, AS 44 is connected to a routing update collection box that receives
BGP updates from AS 44. This collection box represents the data collectors of BGP
monitoring projects such as RouteViews [40] and RIPE [38]. We call an AS connecting
to such a collection box an observation point. These monitoring projects collect
BGP updates from various observation points (operational routers in autonomous sys-
tems) around the globe and make the data available to the public. Though individual
networks have access to their own BGP updates, this data can be used by network op-
erators for problem diagnosis by relating what they see to what others in the Internet
see. Further, this data is an invaluable source of information to researchers on the oper-
ation of BGP since it contains actual BGP update messages collected at different parts
of the Internet. Research in Internet routing has studied this data to identify unknown
problems as well as understand known problems better. In the rest of this work, we
repeatedly use this data source to not only understand and diagnose problems in the
Internet but also use the data to carry out realistic evaluation of designed schemes.
1.3 Problems and Scope of Thesis
We now present a high level overview of the problems we tackle in this work and
explain the organization of the thesis.
Footnote 2: Observed during March 2007.
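[Diagram: AS 22 originates prefix P1 and sends {P1: 22} to AS 33; AS 33 sends {P1: 33 22} to AS 44 and AS 55; AS 55 sends {P1: 55 33 22} to AS 44; the monitored router in AS 44 sends {P1: 44 33 22} to the collection box.]
Figure 1.1: Internet routing and BGP monitoring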
1.3.1 Inferring origin of routing changes
Routing dynamics can degrade data delivery, and in the event of large-scale routing
dynamics, knowing the identity of the problem AS can help an operator change the routing
policies of their own AS to bypass the problem AS as much as possible. In addition, an
operator can also contact the AS in question and inform them of the observed prob-
lems. BGP is a path vector protocol and paths included in routing updates contain
complete AS hop information. Thus, by comparing the previous and new paths, one may
hope to infer where the problem originated. In Chapter 2, we discuss the problem of
inferring the location of a change given the sets of paths before and after an event. We
consider a simplistic network model and see that exact inference is not always possible,
and often more than one explanation can be found for an observed change. We further
show that even inferring the minimum-size explanation set is NP-hard in the general
case. We also prove that it can be solved optimally under certain conditions. Our results
in Chapter 2 show that problem inference is very difficult even in an ideal network
setting. Our ultimate goal is to infer the origin of routing changes observed in the
much more complex setting of the Internet.
1.3.2 Understanding routing changes in the Internet
With over 210,000 destinations, Internet routing is very dynamic and BGP routers
receive a large volume of BGP updates. At this scale, routing dynamics are expected, but
separating routine routing changes from more serious ones is a challenging task. Further,
in the aftermath of a known major event such as a fiber cut or a major link or AS problem,
network operators would like to know how badly their own routing was affected. As
an example, an undersea fiber cut resulting from an earthquake off the coast of Taiwan
caused widespread routing dynamics. Network operators would be interested in
understanding the impact of this event on their own routing as well as on global Internet
routing. Researchers would also benefit from understanding how BGP performs
under normal operations as well as under stress events like the post-earthquake period.
However, studies on BGP dynamics have focused on end-to-end routes to individual
destinations, and this technique simply does not scale when trying to understand
large-scale routing events and their impact. Multiple routes may share common links,
and what is needed is an intrinsic metric that reflects the aggregate behavior of groups
of routes that change in a similar fashion. Chapter 3 introduces our notion of capturing
aggregate behavior by assigning a weight to each link based on the number of routes
carried over it. This notion of link weight is an important concept that allows one to
analyze Internet routing from the new perspective of link weight changes. In Chapter 4
we show how this metric can be used to construct visual representations of large-scale
routing changes, enabling one to understand the impact of the routing changes seen as
well as identify the origins of the observed changes. Our main challenge in visualizing
routing dynamics is scaling the visualization to the size of the topology, with over
20,000 autonomous systems and more than 80,000 AS links. Using link weights and
focusing on the heavy link weight changes provides a natural way to scale in this regard.
The resulting visual representations can greatly help network operators who struggle to
mine meaningful data from gigabytes of routing updates. With these visual graphs
showing the major link weight changes, network operators can understand the aggregate
changes better. Using case studies, we show how our visualization can provide
interesting insight into known routing events.
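As a rough illustration of the link weight idea, the sketch below counts the routes carried over each link and reports the per-link change between two snapshots. It is only a sketch under stated assumptions: the input format (a mapping from prefix to the AS path seen at one observation point), the function names `link_weights` and `weight_changes`, and the three-prefix example are all hypothetical, and the precise definitions used in this work are given in Chapter 3.

```python
from collections import Counter

def link_weights(paths):
    """Count how many routes traverse each directed AS link.

    `paths` maps a prefix to the AS path used to reach it, starting at the
    observation point (an assumed input format for this sketch).
    """
    weights = Counter()
    for path in paths.values():
        for link in zip(path, path[1:]):
            weights[link] += 1
    return weights

def weight_changes(before, after):
    """Return {link: change in weight} for links whose weight changed."""
    changes = {}
    for link in set(before) | set(after):
        delta = after[link] - before[link]  # Counter yields 0 for absent links
        if delta != 0:
            changes[link] = delta
    return changes

# Hypothetical routes seen from AS 44 before and after link (44, 33) fails.
before = link_weights({"P1": [44, 33, 22], "P2": [44, 33, 22], "P3": [44, 55]})
after = link_weights({"P1": [44, 55, 33, 22], "P2": [44, 55, 33, 22],
                      "P3": [44, 55]})
changes = weight_changes(before, after)
```

In this example the link (44, 33) loses two routes while (44, 55) and (55, 33) each gain two, and the unchanged link (33, 22) does not appear in the result. Three link-level changes summarize the movement of every affected prefix, which is what lets the approach scale.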
While the aim of Chapter 4 is to visually summarize large scale routing changes
to aid visual analysis, the aim of Chapter 5 is to automatically infer the origin of large
scale routing changes. We first observe that the link weights of most links are relatively
stable under normal operation. Hence by continuously monitoring link weights, we can
identify a routing event when the link weight of any link deviates significantly from its
typical value. Once a routing event is identified, we record the changes in link weights
and develop a heuristic to relate the link weight changes to each other in the form of an
s-t graph containing a source node s and a sink node t. The graph is so constructed that
any cut disconnecting s from t represents a set of links whose failure can explain the
observed changes. By adding views from other monitoring points as well, we show
that the minimum s-t cut is the most likely explanation of edges responsible for the
routing changes. Using known case studies and evaluation, we show the effectiveness
of the inference heuristic in identifying the origin of the routing changes. Overall, we
present a scheme that provides high accuracy from a small set of observation points
and is much faster and hence more practical than previous schemes based on prefix
level clustering.
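The minimum s-t cut step can be sketched as follows. This is a generic unit-capacity max-flow computation (augmenting paths found by breadth-first search), shown only to make the cut idea concrete. How the fault graph itself is built from link weight changes is described in Chapter 5 and is not reproduced here; the function name `min_cut`, the node names, and the example edges are hypothetical.

```python
from collections import defaultdict, deque

def min_cut(edges, source, sink):
    """Return the edges of a minimum s-t cut, treating every edge as capacity 1."""
    capacity = defaultdict(int)
    adjacency = defaultdict(set)
    for u, v in edges:
        capacity[(u, v)] += 1
        adjacency[u].add(v)
        adjacency[v].add(u)      # allow traversal of residual (reverse) edges

    def reachable_parents():
        """Breadth-first search over edges with remaining capacity."""
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in parents and capacity[(u, v)] > 0:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = reachable_parents()
        if sink not in parents:
            break                # no augmenting path left: flow is maximum
        v = sink
        while parents[v] is not None:   # push one unit of flow along the path
            u = parents[v]
            capacity[(u, v)] -= 1
            capacity[(v, u)] += 1
            v = u

    reachable = set(parents)     # source side of the cut
    return sorted((u, v) for u, v in set(edges)
                  if u in reachable and v not in reachable)

# Hypothetical graph: two branches from s merge at c before reaching t.
example_edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "t")]
print(min_cut(example_edges, "s", "t"))   # [('c', 't')]
```

Cutting the single edge (c, t) disconnects s from t, whereas any cut closer to s needs two edges, so the smallest explanation points at (c, t).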
1.3.3 Security Problems
Besides diagnosing routing problems that can impact data delivery performance, an-
other aspect of this work is to identify and diagnose routing problems that can lead to
security and privacy breaches. Prefix hijacking is one such routing problem where an
AS announces a destination IP prefix it does not own and can deceive other routers in
the Internet to route packets to itself. For example, suppose a malicious AS announces
an IP prefix belonging to a bank, and the BGP routers in AS X find the route to the
malicious AS more attractive than the route to the genuine AS and hence wrongly route packets
to the malicious AS. One effect of this could be that users in AS X will not be able
to access sites belonging to the bank, but a more severe effect could result if the ma-
licious AS hosts a web server on the exact same IP as the bank. In this case, users
in AS X would actually be sending private account information to the malicious AS,
thus compromising security of the accounts. Clearly prefix hijacking is a serious threat
to the Internet and needs to be detected as soon as possible. However, a BGP router
cannot easily detect prefix hijacks, since BGP allows multiple autonomous systems
to announce the same prefix and this is often done for legitimate reasons. Chapter 6
discusses the challenges involved in designing such a detection system and presents
a scalable and easily deployable system design to carry out hijack detection in near
real-time. Our design is based on the idea that on seeing a new origin for a prefix, the
prefix owner is the best person to decide whether its prefix is being hijacked. Hence,
we design an alert system that quickly alerts prefix owners of potential hijacks. Our
system pushes the complexity to the end users, and hence does not suffer from the
problem of outdated data submitted by prefix owners.
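The core idea of alerting on a new origin can be illustrated with a short sketch. It is a simplification and not the design presented in Chapter 6: it assumes updates arrive as (prefix, AS path) pairs with the origin AS last on the path, it raises an alert only when a previously unseen origin appears for a prefix that already has a known origin, and it ignores withdrawals and time windows. The function name `detect_new_origins` is hypothetical, and AS 64500 is a documentation-range AS number standing in for an attacker.

```python
def detect_new_origins(updates):
    """Return alerts (prefix, new_origin, known_origins) in arrival order.

    `updates` is an iterable of (prefix, as_path) pairs; the last AS on the
    path is taken to be the origin.
    """
    seen = {}      # prefix -> set of origin ASes observed so far
    alerts = []
    for prefix, as_path in updates:
        origin = as_path[-1]
        known = seen.setdefault(prefix, set())
        if known and origin not in known:
            # The prefix owner, not this detector, decides if this is a hijack.
            alerts.append((prefix, origin, sorted(known)))
        known.add(origin)
    return alerts

# UCLA's prefix from Section 1.1, normally originated by AS 52.
updates = [
    ("131.179.96.0/24", [33, 52]),
    ("131.179.96.0/24", [44, 33, 52]),
    ("131.179.96.0/24", [55, 64500]),   # hypothetical new origin
]
alerts = detect_new_origins(updates)
print(alerts)   # [('131.179.96.0/24', 64500, [52])]
```

The detector only reports the change of origin. Deciding whether the new origin is legitimate is left to the prefix owner, which is the division of labor the text argues for.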
Besides detecting and reacting to hijacks, it is important to understand the resiliency
of the Internet topology against prefix hijacking. In particular, it is important to
understand which parts of the Internet are more robust against hijacks directed
towards them, and what factors drive the amount of damage a hijack can cause.
Knowing this can enable customers to better decide which Internet Service Providers (ISPs)
to connect with for better security against hijacks. An understanding of the factors
influencing hijack can also enable us to understand the vulnerability better and attempt
to fix it. Chapter 7 presents a detailed study on these lines and finds that the ISPs with
the highest degrees (tier-1 ISPs) are often worse impacted than some smaller ISPs in
the event of a hijack. The implications of this study are twofold. First, in terms of hijack
resilience, bigger is not necessarily better. Second, in order to secure the Internet,
it is not enough to secure the big ISPs since high impact attacks can be launched from
the edge as well.
CHAPTER 2
Inferring Failures in Path Vector Routing
Path vector routing protocols convey complete path information to reach a destination
node. These protocols can adapt dynamically to topological changes like link failures
but usually do not convey any explicit notification of which links failed. Without
any explicit failure notification, one can only hope to infer failed links based on the
complete path information before and after failures. For example, assume a node 1
can reach a destination node 4 using the path 1 → 2 → 4. Now due to some link
failure, this path changes to 1 → 2 → 3 → 4. One can see that link (2, 4) must have
failed to cause node 1 to switch to path 1 → 2 → 3 → 4. However, if the path to node 4 changes
from 1 → 2 → 4 to 1 → 3 → 4, one cannot tell whether the link that failed was
(1, 2) or (2, 4). In cases where more than one failure scenario can explain the observed
routing changes, simply analyzing the path to a single destination is not enough.
Even after looking at how paths to all destinations are affected, one might face
multiple failure scenarios. For example, assume that the path to destination 5 also
changes from 1 → 2 → 5 to 1 → 3 → 5 at the same time as the path to destination
4 changes from 1 → 2 → 4 to 1 → 3 → 4 as shown in Figure 2.1. Given this
information of path changes to destinations 4 and 5, three candidate failure scenarios
are a) failure of link (1, 2), b) failures of links (2, 4) and (2, 5) and c) failures of links
(1, 2), (2, 4) and (2, 5). We call each of these scenarios an explanation set or e-set.
Even in this simple case with one source and two destinations, we have three
e-sets and cannot say for sure which one is the cause. In situations where multiple
failure scenarios are possible, the minimum failure e-set problem involves identifying
the minimum number of links whose failure can explain the observed route changes.
Identifying the minimum number of failed edges gives us a lower bound on the number
of failures causing route changes. For example, in Figure 2.1, we can say for sure
that at least one edge must have failed. In addition, if all links have the same failure
probability, then one can see that the failure of link (1, 2), i.e. the minimum e-set, is
also the most likely solution.
[Figure 2.1: Possible e-sets are {(1, 2)} and {(2, 4), (2, 5), (2, 6)}. (a) Initial tree T0 from node 1; (b) Final tree T1 from node 1.]
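The e-set idea for the two-destination example can be checked mechanically. The following is a minimal sketch, not the dissertation's implementation: it assumes links are undirected, that the topology is exactly T0 ∪ T1 for the example (links 1-2, 1-3, 2-4, 2-5, 3-4, 3-5), that nodes 3, 4 and 5 are the destinations, and that routing is shortest path with lowest-ID tie-breaking. The names `link`, `route_compute`, and `explains` are hypothetical helpers introduced for illustration.

```python
from collections import deque
from itertools import combinations

def link(a, b):
    """An undirected link, stored order-independently."""
    return frozenset((a, b))

def route_compute(root, links, destinations):
    """Links used by root's shortest paths (lowest neighbor ID breaks ties)."""
    adj = {}
    for a, b in map(tuple, links):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    tree = set()
    for d in destinations:
        dist, queue = {d: 0}, deque([d])   # BFS: hop count from every node to d
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        node = root
        while node in dist and node != d:  # skipped if d is unreachable
            nxt = min(n for n in adj[node] if dist.get(n) == dist[node] - 1)
            tree.add(link(node, nxt))
            node = nxt
    return tree

# Assumed topology and destinations for the example (not given explicitly in the text).
G = {link(1, 2), link(1, 3), link(2, 4), link(2, 5), link(3, 4), link(3, 5)}
DESTS = [3, 4, 5]
T1 = {link(1, 3), link(3, 4), link(3, 5)}

def explains(failed):
    """True if failing these links turns the routing tree into T1."""
    return route_compute(1, G - set(failed), DESTS) == T1

# Brute force over links absent from T1, smallest candidate sets first.
candidates = sorted(G - T1, key=sorted)
minimum_eset = next(set(c) for k in range(len(candidates) + 1)
                    for c in combinations(candidates, k) if explains(c))
print(sorted(map(sorted, minimum_eset)))  # [[1, 2]]
```

Brute force is only workable for toy inputs like this one; it is used here purely to make the definition concrete.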
In this work, we first look at how to infer single failures from remote observations.
We then formalize the minimum e-set problem and show the problem is NP-complete
in the general case. We discuss conditions under which the minimum e-set can be
optimally found and present simple algorithms for these cases.
2.1 Model and Problem Definition
We now provide details about the network and routing model used in this chapter. We
also formally define the problem of minimum e-set in this section.
2.1.1 Network and Routing Model
We model the network as a simple directed connected graph G = (V,E), where V =
VD ∪ VN and E = ED ∪ EN . VD represents the set of destinations and the nodes
in VN are transit nodes. We are only interested in routes to VD. Nodes in VN are
connected by links in EN and each edge in EN has the form (a, b) where a, b ∈ VN .
The destinations are attached to the nodes in VN through edges in ED and each edge
in ED has the form [d, n] where d ∈ VD and n ∈ VN . In all figures, nodes in VD
(destinations) are represented by the presence of small solid rectangles while nodes
in VN (transit nodes) do not have these solid rectangles. For example, in Figure 2.1,
nodes 4, 5 and 3 belong to VD, while nodes 1, 2 belong to VN .
We use a Simple Path Vector Protocol (SPVP) [14]. In SPVP each node advertises
only its best path to reach the destinations. A path from node v to destination d is a
sequence of nodes $path_v(d) = (v_k\, v_{k-1} \ldots v_0\, d)$ where $v_k = v$, $(v_i, v_{i-1}) \in E_N$ for
all $1 \le i \le k$, and $(v_0, d) \in E_D$. We define $pathLength(path_v(d)) = k + 1$. After
receiving and storing a route learned from its neighbors, node v selects its best path to
destination d according to some routing policy.
The routing policy could be any policy that maintains the tree property. In other
words, the routing table containing routes to all destinations at any stage can be graph-
ically represented in the form of a tree. Thus, at any node u, there has to be only one
path to any other node v, even if v is used as a transit node to reach some other
destination. In the remainder of the chapter, when we refer to ‘best’ path, we mean a
path that is deemed best given the routing policy. We abstract out the notion of ‘best’
path and separate it from its physical interpretation. For illustration purposes, we use
the shortest path routing policy where a node picks the path with the lowest number of
hops as its best path. In case of a tie among multiple paths with the lowest number of
hops, the node picks the path from the neighbor with the lowest numeric ID. While the
shortest path policy is used for illustration purposes, our algorithms and proofs work
with any policy that can maintain the tree property.
2.1.2 Failure Model
We allow any number of links to fail in the network. All links in the network have the
same failure probability. If a link fails, the nodes adjacent to the link detect the failure
and all nodes using the link must switch to an alternate path (or declare the destination
unreachable). The link failures are not directly reported to any central database or
monitoring site. However, a node whose path is affected by this failure will see a new
path implying that something went wrong on the initial tree. Different link failures may
impact different observation points and hence we are concerned with the minimum
failures as seen from a particular observation point only. Link failures are atomic,
which means if a link fails, no node can use it. We look at route changes observed
from a node to infer failed links. The route changes seen must have happened only due
to link failures and cannot happen due to the addition of new links.
2.1.3 Input Requirements
We do not make any assumption about the topology of the entire graph being known
to any single node in the graph. We define an observation point as a node for which we
can see the complete routing table at any time. The input we have consists of two
routing table snapshots. We assume that the initial as well as final routing tables reflect
the steady state tables after the routes for all the nodes in the network have converged.
In other words, there are no more routing updates propagating in the network at the
time of the two routing table snapshots.
2.2 Minimum e-set for tree inputs
In this section, we present a heuristic to solve the tree version of the minimum e-set
problem optimally. The results in this section apply to any routing policy that always
produces a tree.
This heuristic uses the notion of nearest-descendant defined below. For any node
u in T0 ∩ T1, where M is the root of the trees, call node v a nearest-descendant of u if
1. v ∈ T0 and u is closer to M than v in T0
2. v is also in T1 or v ∈ VD, where VD is set of destinations and
3. for any x in T0 strictly between u and v on the path from u to v, x ∉ T1
To find the nearest-descendant pairs, we use Algorithm 1. The nearest-descendant
algorithm uses depth first search (DFS) over T0 to find the nearest descendants of a
node u ∈ T0. Along any DFS path from u, if a node v is present in T1, then no descen-
dant of v needs to be explored, and v is the nearest descendant of u. Figure 2.2 shows
trees T0 and T1 observed from node 1. Node 1 has three children 2, 3 and 6. Node 2
is neither present in T1 nor a destination, hence 2 cannot be a nearest descendant. Recursing
from node 2, we find that 4 and 5 are in T1, and thus they are nearest descendants of 2 and
hence of 1. Similarly, 3 is also a nearest descendant of 1. Finally, notice that 8 is not
in T1. This is possible if there is no valid path to reach 8 in T1. But, since 8 is a destination,
it is also a nearest-descendant of node 1. Summarizing, node 1 has 4 nearest
descendants: 3, 4, 5 and 8. Among the other nodes, 2 has nearest descendants 4 and
5, while nodes 3, 4 and 5 do not have any nearest descendants.
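The depth-first search of Algorithm 1 can be sketched in a few lines. This is an illustrative version under stated assumptions: T0 is given as a child map reconstructed from the walkthrough above (1 has children 2, 3, 6; 2 has children 4, 5; 6 has child 8), T1 contains nodes 1, 3, 4, 5 and 7, and the destination set {3, 4, 5, 8} is assumed since the text only states that 8 is a destination and 2 is not. The function name and argument layout are hypothetical.

```python
def nearest_descendants(u, children, t1_nodes, destinations):
    """Return the T0 paths from u to each of its nearest descendants."""
    paths = []

    def dfs(node, path):
        for child in children.get(node, []):
            new_path = path + [child]
            if child in t1_nodes or child in destinations:
                paths.append(new_path)  # stop here: child is a nearest descendant
            else:
                dfs(child, new_path)    # child vanished from T1: keep descending

    dfs(u, [u])
    return paths

# Assumed reconstruction of the example trees.
T0_CHILDREN = {1: [2, 3, 6], 2: [4, 5], 6: [8]}
T1_NODES = {1, 3, 4, 5, 7}
DESTINATIONS = {3, 4, 5, 8}

for start in (1, 2, 3):
    print(start, nearest_descendants(start, T0_CHILDREN, T1_NODES, DESTINATIONS))
```

Because T0 is a tree, no explicit visited set is needed in this sketch; each node is reached along exactly one path.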
Algorithm 1: nearest descendant(Node u, Path ρ)
Input: Node u in T0, Path ρ
Output: P : Set of paths to nearest-descendants of u
while there is some v, such that (u, v) ∈ T0 AND v is not visited do
    Mark v as visited;
    Add (u, v) to path ρ;
    if v ∈ T1 or v ∈ VD then
        Add path ρ to P;
    else
        nearest descendant(v, ρ);
    Remove (u, v) from path ρ;

[Figure 2.2: Example for nearest descendant and minimum e-set from Algorithm 2. (a) Initial tree T0 from node 1; (b) Final tree T1 from node 1.]

Once we find the set of all (u, v) pairs such that v is a nearest-descendant of u,
we use Algorithm 2 to compute the minimum e-set. For each (u, v) pair, we check if
T1 would still be the computed tree if edges on the path (u, v) are added to T1. For
example, for the pair (1, 8) with path 1 → 6 → 8, we have
RouteCompute(1, T1 ∪ path(1, 8)) ≠ T1
Thus, we fail the first link on the path 1 → 6 → 8. Similarly, edge (1, 2) is marked
failed for the paths between (1, 4) as well as (1, 5). On the other hand, consider the nearest
descendant pair (2, 4). For this pair we have,
RouteCompute(1, T1 ∪ path(2, 4)) = T1
and hence we do not fail any edge on this nearest descendant pair. After considering
all the (u, v) pairs, the minimum e-set in this case is {(1, 2), (1, 6)}. We now prove
correctness and optimality for Algorithm 2.
Algorithm 2: FindFault()
Input: T0 and T1, the routing trees from vantage point M
Output: F : set of edges marked failed
for each nearest-descendant pair (u, v) with path P do
    if RouteCompute(M, T1 ∪ P) ≠ T1 then
        /* One of the edges on the (u, v) path P must have failed. */
        Mark the first edge on the (u, v) path in T0 as failed.
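Algorithm 2 can be exercised on the running example with a short sketch. Several things here are assumptions rather than facts from the text: links are treated as undirected, T1 is taken to be {1-3, 3-4, 3-5} (node 7's attachment cannot be recovered and is omitted), the destinations are {3, 4, 5, 8}, and RouteCompute is modeled as shortest path with lowest-ID tie-breaking. The nearest-descendant paths are hard-coded from the walkthrough above, and all function names are hypothetical.

```python
from collections import deque

def link(a, b):
    """An undirected link, stored order-independently."""
    return frozenset((a, b))

def route_compute(root, links, destinations):
    """Links used by root's shortest paths (lowest neighbor ID breaks ties)."""
    adj = {}
    for a, b in map(tuple, links):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    tree = set()
    for d in destinations:
        dist, queue = {d: 0}, deque([d])   # BFS: hop count from every node to d
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        node = root
        while node in dist and node != d:  # skipped if d is unreachable
            nxt = min(n for n in adj[node] if dist.get(n) == dist[node] - 1)
            tree.add(link(node, nxt))
            node = nxt
    return tree

def find_fault(root, t1, paths, destinations):
    """Mark the first link of every nearest-descendant path that T1 cannot coexist with."""
    failed = set()
    for path in paths:
        path_links = {link(a, b) for a, b in zip(path, path[1:])}
        if route_compute(root, t1 | path_links, destinations) != t1:
            failed.add(link(path[0], path[1]))
    return failed

# Assumed inputs for the example.
M, DESTS = 1, [3, 4, 5, 8]
T1 = {link(1, 3), link(3, 4), link(3, 5)}
PATHS = [[1, 2, 4], [1, 2, 5], [1, 3], [1, 6, 8], [2, 4], [2, 5]]

print(sorted(map(sorted, find_fault(M, T1, PATHS, DESTS))))  # [[1, 2], [1, 6]]
```

Under these assumptions the sketch reproduces the e-set {(1, 2), (1, 6)} stated in the text; with a different reconstruction of T1 the intermediate checks could differ.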
Lemma 0.1. The set of edges F thus determined comprises an e-set.
Proof. We prove this lemma using shortest path routing, but the proof can easily
apply to other tree based routing policies. Assume the contrary, in other words,
when these edges fail the resulting tree Tf on (T0 ∪ T1) − F is not the final tree T1. It
follows that there is some (a, b) in T1 such that a shorter path exists in Tf . Since we do
not fail edges of T1, the path (a, b) can only be shorter. Let (a, b) be such a pair whose
real shortest path in T0 + T1 − F is as short as possible. Suppose the shortest path
between (a, b) includes some node x in T1. Then we must have either (a, x) or (x, b)
closer in T0 + T1 − F than they are in T1, thus contradicting the definition of (a, b).
It follows that the path between (a, b) does not include any other vertices of T1, and
therefore travels entirely through T0. In particular, (a, b) are nodes of T0 (and T1) and
the path between them does not contain any nodes of T1, so one is a nearest-descendant
of the other. It follows that we would have included an edge on the (a, b) path in F .
Theorem 1. The e-set F thus determined is minimum.
Proof. Consider all pairs (u, v) where v is a nearest-descendant of u. We know that
we must delete some edge on the path (u, v) in T0, otherwise T1 could not be a shortest
path tree. Note that if v is a nearest-descendant of u, then v cannot also be a nearest-
descendant of u′ for any u′ not equal to u. We can therefore imagine chopping the
trees into edge-disjoint subtrees, where each subtree consists of a node in both T0 and
T1 along with its nearest-descendants and the paths to them. For each subtree, we
consider breaking it up into further edge-disjoint subtrees via the edges from the root
(u). For each edge outgoing from u, if we cut that edge then any algorithm must cut
one of the edges in the relevant subtree. Since all subtrees are disjoint, it follows that
any algorithm must cut at least as many edges as ours cuts.
2.3 Minimum e-set for general graphs
So far, we discussed how to optimally solve the minimum e-set problem with the
nearest-descendant approach. In Section 2.2, we restricted the input at a node to be
the routing trees from that node before and after the failures. However, a node might
have additional information about the underlying topology besides its routing tree.
One source of information is the routes received from other nodes. For example, let us
assume a node A receives a route B → C → D to reach the destination D from
neighbor B. Node A also receives the route F → D to reach destination D from
neighbor F. Assume node A selects route F → D via neighbor F (assuming shortest
path), and does not use B to reach any other destination. Even though links B → C
and C → D may not be in A’s routing table, it knows these two links are up, else B
would have sent a new route to reach D. To account for this extra information that a
node might have, we generalize the problem to find minimum e-set when the inputs
before and after failures can be general graphs instead of trees.
2.3.1 Problem Definition: Graph Version
At an observation point M , let T0 and T1 indicate the routing trees used by node M at
two different times t0 and t1 respectively. In addition, let G0 and G1 indicate the set
of links known to be up at t0 and t1 respectively. The set G0 could be just T0, but can
contain additional arbitrary edge information, and T0 ⊆ G0. Similarly, T1 ⊆ G1. We
have,
RouteCompute(M,G0 ∪G1) = T0 (2.1)
A set of links F is called an explanation set (e-set) iff:
RouteCompute(M, (G0 ∪G1)− F ) = T1 (2.2)
Minimum e-set problem: Given two graphs G0 and G1 from a node M containing
routing table links as well as other links, and a known routing policy, find the minimum
set of edges F , such that RouteCompute(M, (G0∪G1)−F ) = T1, where T1 represents
the routing table RouteCompute(M,G1).
While we show that the general case of this problem is NP-complete, a special case of
this problem with graph inputs can be solved optimally if each node is a destination.
2.3.2 The case of no purely transit nodes, VN = φ or V = VD
We present a greedy algorithm, Algorithm 3 (greedy Fault()), to find an optimal solution for the
case where each node is a destination, or equivalently there are no transit nodes. This
algorithm removes one edge at a time with edges ordered in breadth first order.
Algorithm 3: greedy Fault(M, G0, G1)
Input: M : the vantage point;
       G0: up edges at time t0;
       G1: up edges at time t1;
Output: e-set of candidate failures F
1. Initialize Gf = G0 ∪ G1 and F = ∅;
2. Compute Tf = RouteCompute(M, Gf );
3. for edges (X, Y ) of Tf in BFS order do
       if (X, Y ) ∉ G1 then
           F = F ∪ {(X, Y )};
           remove (X, Y ) from Gf ;
           goto step 2;
4. return F ;
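The greedy loop of Algorithm 3 can be sketched as follows. This is a hedged illustration, not the author's code: links are treated as undirected, RouteCompute is modeled as shortest path with lowest-ID tie-breaking, and the example topology (T0 = {1-2, 1-3, 1-9, 2-4, 2-5, 2-6}, one extra known link 2-3, and T1 = {1-3, 1-9, 2-9, 2-4, 2-5, 2-6}) is an assumption chosen to match the description of Figure 2.3, whose exact edges are not recoverable. All function names are hypothetical.

```python
from collections import deque

def link(a, b):
    """An undirected link, stored order-independently."""
    return frozenset((a, b))

def route_compute(root, links, destinations):
    """Links used by root's shortest paths (lowest neighbor ID breaks ties)."""
    adj = {}
    for a, b in map(tuple, links):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    tree = set()
    for d in destinations:
        dist, queue = {d: 0}, deque([d])   # BFS: hop count from every node to d
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        node = root
        while node in dist and node != d:  # skipped if d is unreachable
            nxt = min(n for n in adj[node] if dist.get(n) == dist[node] - 1)
            tree.add(link(node, nxt))
            node = nxt
    return tree

def bfs_edges(root, tree):
    """Yield tree links in breadth-first order from root (lower IDs first)."""
    adj = {}
    for a, b in map(tuple, tree):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {root}, deque([root])
    while queue:
        x = queue.popleft()
        for y in sorted(adj.get(x, [])):
            if y not in seen:
                seen.add(y)
                queue.append(y)
                yield link(x, y)

def greedy_fault(root, g0, g1, destinations):
    """Repeatedly fail the first BFS-ordered tree link that is absent from G1."""
    gf, failed = set(g0) | set(g1), set()
    while True:
        tf = route_compute(root, gf, destinations)
        bad = next((e for e in bfs_edges(root, tf) if e not in g1), None)
        if bad is None:
            return failed
        failed.add(bad)
        gf.discard(bad)

# Assumed example: every node other than the vantage point is a destination.
T0 = {link(1, 2), link(1, 3), link(1, 9), link(2, 4), link(2, 5), link(2, 6)}
G0 = T0 | {link(2, 3)}
G1 = {link(1, 3), link(1, 9), link(2, 9), link(2, 4), link(2, 5), link(2, 6)}
DESTS = [2, 3, 4, 5, 6, 9]

print(sorted(map(sorted, greedy_fault(1, G0, G1, DESTS))))  # [[1, 2], [2, 3]]
```

On this assumed input the sketch first fails link (1, 2), then link (2, 3), matching the order described for Figure 2.3.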
Figure 2.3 provides an example showing the output of the greedy algorithm on
a simple graph. For this example, we assume the shortest path routing policy with
lowest ID tie-breaking. In this example, each node is a destination. Figure 2.3a and
Figure 2.3b represent the initial and final graphs. Note, the link (2, 3) is not in T0 but
is present as additional information in the initial graph G0. In contrast the final graph
G1 does not contain any new information and hence is the same as T1. At each stage,
we superimpose the tree edges shown by directed edges over edges in the graph. After
removal of link (1, 2) from G, the link (3, 2) is now on the tree from node 1 as shown
in Figure 2.3d. It can be seen that link (3, 2) is not in the final tree and without its
failure, the path cannot change to use link (9, 2). Hence, (3, 2) is failed next to result
in the final tree T1.
We define a ‘sufficient’ failure set as a set of links whose failure is enough to change the
tree from T0 to T1, and the ‘necessary’ failure set as the links which must fail in order
to change the tree from T0 to T1. With each node as a destination, our algorithm will
always output failures that are sufficient as well as necessary.
Theorem 2. Let F be the set of edges returned by greedy Fault(M, G0, G1). Then
F is both a necessary and a sufficient set of edges to explain the change from T0 to T1.
Proof. To show that F is sufficient, we will show that Tf = T1 when the algorithm
terminates. In the for loop, we remove edges only from G0 − G1. Thus at the end,
Tf = RouteCompute(M, G′) where G1 ⊆ G′. Also, the for loop removes edges from
Tf that are in G0 − G1. Thus at the end, Tf does not include any edges from G0 − G1.
Thus Tf = RouteCompute(M, G′) = RouteCompute(M, G1) = T1.
Now to show that F is necessary, let F′ be any set of edges such that
RouteCompute(M, (G0 ∪ G1) − F′) = T1. We will show that F ⊆ F′, by induction on the iteration of the
for loop. Suppose by induction, at the end of the i-th iteration, e1, . . . , ei are the edges
selected by the algorithm, and by induction, F ′ includes these edges. We will show
that F ′ must include ei+1 = (u, v) selected by the algorithm in the (i+ 1)-st iteration.
Note that the algorithm chooses the first edge in the BFS order of Tf from M that is
not in G1. Consider the path P from M to u in Tf . All edges of P are available at time
t1 (they are in G1, so the algorithm did not select them). Hence if (u, v) did not fail
during [t0, t1], and since v ∈ VD, T1 would include (u, v). It follows that any e-set F ′
must also include (u, v) in order to be a valid explanation-set.
[Figure 2.3: greedy Fault() with each node as a destination. (a) G0 = T0 + edge (2,3); (b) G1 = Final Tree T1; (c) G = G0 + G1; (d) Remove link (1,2); (e) Remove link (2,3) to give T1. Legend: link in tree from node 1; link removed by greedy (F); edge in the graph.]
[Figure 2.4: Example showing greedy not optimal in general case. (a) G0 = Initial Tree T0 + {(2,3),(3,5)}; (b) G1 = Final Tree T1; (e) Mark (1,2) and (3,4) as failed.]
2.3.3 The case of purely transit nodes, VN ≠ φ
Algorithm 3 from the previous section does not produce an optimal solution if
the network contains purely transit nodes. Consider Figure 2.4 as an example where
we see routes from node 1 to a single destination node 8, and nodes 2, 3, 4, 5 are just
transit nodes. Parts 2.4a and 2.4b show the initial and final graphs. The edges with
arrows represent the routing table, while the others represent information about edges
through other means. We assume the network uses shortest path routing with lowest
ID tie breaking for this example. The greedy Algorithm 3 would first mark edge (1, 2)
failed, which would shift the route to 1 → 3 → 4 → 8, since 4 has a lower ID than
5 and the paths via 4 and via 5 both have length 3. The greedy algorithm would then
mark edge (3, 4) failed resulting in the final tree. But clearly in this case, the e-set of
{(4, 8)} can explain the above change and is also minimal. We now prove that this
general version is NP-complete.
To show that finding minimum e-set in the general case is NP-complete, we will
consider a restriction of it, Directed Path Change (DPC), and prove it is NP-complete.
In DPC, we restrict the number of destinations to one, so that each routing graph is a
single path. But we allow arbitrary G0 and G1. Since DPC involves change of a single
path instead of multiple paths in our general problem, it is thus a special case of the general
case and proving hardness for DPC will imply hardness for the general problem of
multiple destinations. We use shortest path policy for this proof.
Directed Path Change (DPC): Given directed graphs G0, G1, two special nodes
s, t, and an integer k ≥ 0, let Pi denote the minimum hop s → t path in Gi such
that P0 is also the minimum hop s → t path in G0 + G1. Does there exist a set of
edges F ⊆ G0 − G1 such that |F | ≤ k and P1 is also a minimum hop s → t path in
G0 +G1 − F ?
We reduce from the following problem, UMC, proven to be NP-complete in [8].
Undirected Multiway Cut (UMC): Given undirected graph G, three special nodes
x, y, z and an integer k ≥ 0, is there a set of edges F , |F | ≤ k such that in G − F ,
x, y, z are disconnected from each other?
Reduction: Given an instance of UMC(G, x, y, z, k), we create a new directed graph
G′ as follows. The vertices of G′ are the vertices of G plus many additional vertices.
The edges of G′ are defined as follows: for every (undirected) edge (u, v) of G, we
add the following gadget (footnote 1): put two new vertices w1, w2 (new for each different edge) and
add edges (u → w1), (w2 → u), (v → w1), (w2 → v) and (w1 → w2). Now any
path in G that goes u → v can be simulated in G′ as u → w1 → w2 → v and any
path in G that goes v → u can be simulated in G′ as v → w1 → w2 → u. We also
add two special vertices s, t in G′ and add edges (s → x), (z → t). Let n denote the
number of vertices in G. Take two long paths Q1, Q2 of length D ≫ 4n consisting of
new vertices, where Q1 is x → y and Q2 is y → z. Let P0 be the minimum hop path
s → t in G′ and let P1 be the s → t path $s \to x \xrightarrow{Q_1} y \xrightarrow{Q_2} z \to t$. This
(G0 = G′, G1 = P1, s, t, P0, P1, k) is our instance of DPC.
1 This gadget is also used by [12].
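The edge gadget used in the reduction can be sketched directly. The code below is an illustration under assumptions: it builds only the gadget part of G′ (not s, t, Q1 or Q2), names the per-edge vertices with hypothetical tuples ("w1", u, v) and ("w2", u, v), and uses a small hypothetical graph x-a-y to show that each undirected hop costs three directed hops in either direction and that deleting a single (w1 → w2) edge severs the original edge both ways.

```python
from collections import deque

def build_gadget_graph(undirected_edges):
    """Replace every undirected edge (u, v) by the five-edge directed gadget."""
    directed = set()
    for u, v in undirected_edges:
        w1, w2 = ("w1", u, v), ("w2", u, v)   # fresh vertices for this edge
        directed |= {(u, w1), (v, w1), (w1, w2), (w2, u), (w2, v)}
    return directed

def hops(edges, src, dst):
    """Minimum number of directed hops from src to dst, or None if unreachable."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    dist, queue = {src: 0}, deque([src])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist.get(dst)

# Hypothetical undirected input: a path x - a - y.
G_prime = build_gadget_graph([("x", "a"), ("a", "y")])
print(hops(G_prime, "x", "y"), hops(G_prime, "y", "x"))  # 6 6

# Cutting the middle gadget edge of (x, a) disconnects x from a in both directions.
cut = G_prime - {(("w1", "x", "a"), ("w2", "x", "a"))}
print(hops(cut, "x", "a"), hops(cut, "a", "x"))          # None None
```

This is why, in the argument that follows, any set of deleted gadget edges can be replaced by the single (w1, w2) edge without increasing the size of the solution.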
We shall show that there exists a UMC solution F iff it is also a DPC solution.
Let F be a solution to UMC instance (G, x, y, z, k). Let F ′ be the set of edges in G′
where for each edge (u, v) in F , we put the corresponding edge (w1, w2) in F ′. If we
remove F ′ and P1 from G0 = G′, then x, y, z are disconnected. So the minimum hop
s → t path in G0 − F ′ is P1, and hence F ′ is a solution to DPC (G0 = G′, G1 =
P1, s, t, P0, P1, k).
Conversely, let F ′ be a solution to (G0 = G′, G1 = P1, s, t, P0, P1, k). For all the
edges picked up from a single (u, v) gadget, we can replace them by (w1, w2) and get
a new solution F ′′, |F ′′| ≤ |F ′|. We claim that G′ − F ′′ − P1 has x, y, z disconnected.
If there was a path between any pair, say Q : z → y, then by symmetry (footnote 2) there is also
a path Q̄ : y → z. Now any y → z path in G′ − F′′ − P1 has a corresponding path in
G of length at most n − 1, and by our gadget the path length multiplies by 3 in G′, so
|Q| = |Q̄| ≤ 3(n − 1). Then the path $s \to x \xrightarrow{Q_1} y \xrightarrow{\bar{Q}} z \to t$ is available in G′ − F′′, and
is of length at most 1 + D + 3(n − 1) + 1, which is less than the length of P1, 2 + 2D
(recall D ≫ 4n). This contradicts the claim that the minimum hop path in G′ − F′′ is P1.
Therefore F ′′ disconnects x, y, z and by choosing corresponding (u, v) edges, we get
F , |F | ≤ k, such that deleting F disconnects x, y, z in G.
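The length comparison used in the converse direction can be written out step by step. This is only a restatement of the bound in the paragraph above, relying on nothing beyond the stated condition that D is much larger than 4n:

```latex
\begin{align*}
\bigl| s \to x \xrightarrow{Q_1} y \xrightarrow{\bar{Q}} z \to t \bigr|
  &\le 1 + D + 3(n-1) + 1 = 2 + D + 3(n-1), \\
|P_1| &= 1 + D + D + 1 = 2 + 2D, \\
2 + D + 3(n-1) < 2 + 2D &\iff 3(n-1) < D .
\end{align*}
```

The last inequality holds because $D \gg 4n > 3(n-1)$, so the detour through $\bar{Q}$ would be strictly shorter than $P_1$.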
Thus we have shown,
Lemma 2.1. DPC is NP-hard.
Since DPC is a special case of finding minimum e-set, and given any candidate
solution e-set, it can clearly be verified in polynomial time, we get,
Theorem 3. Finding minimum e-set is NP-complete.
2 If the original path traverses v → w1 → w2 → u, then we reverse it to u → w1 → w2 → v since our gadget has all these edges.
CHAPTER 3
Identifying Routing Problems in the Internet
In Chapter 2 we examined the problem of inferring location of failures in a simple path
vector routing framework. In this chapter, we examine the domain of Internet routing
in detail and understand the main challenges that make inference of problem areas
difficult in reality. We also explain how we take a fundamentally different approach
than prior works to analyze Internet routing problems.
3.1 Challenges in diagnosing Internet routing problems
There are various factors in Internet routing diagnosis that make it more challenging
than the ideal scenarios considered in Chapter 2. Primary among them is the lack of
knowledge of the complete topology. Another challenge in diagnosing Internet routing
problems is that a BGP router is unaware of how BGP routers in other autonomous
systems pick their best paths. In Chapter 2, our model assumed nodes choose best paths based
on shortest path routing with lowest ID tie breaker. However, in the Internet, commercial
policies and agreements drive routing decisions, even though shortest paths are often
picked after policy decisions are applied. The routing table from any given router may
not necessarily be a tree, and a node may be reachable via different routes for different
prefixes. Since the tree property is often violated and route preferences of other BGP
routers are unknown, it is difficult to apply a greedy scheme like Algorithm 3 on
page 18. Another factor complicating the inference is the presence of multiple BGP
peering sessions among large autonomous systems. In other words, even though a
BGP path may show a single link A-B between two autonomous systems A and B,
in reality they may connect at multiple physical locations. When one of the physical
connections breaks down, some prefixes may be affected and change routes, but others
may still continue to use the link A-B.
Besides the above factors, the constant churn of updates in the Internet also presents
a challenge. Internet routing is very dynamic and at any given point of time, there are
continuous routing changes going on. It is thus difficult to group routing changes into
events and mark the start and end of the event. What we need to diagnose problems at
the global Internet scale is a metric that will capture aggregate routing behavior.
3.2 Capturing aggregate behavior: Notion of Link-weight
Internet routing is a big system with over 210,000 prefixes. Prior works in diagnosing
BGP problems have examined individual routes and attempted to correlate changes
across different prefixes. We take a fundamentally different approach to understand
Internet routing. We are interested in capturing some notion of aggregate routing be-
havior. We observe that different routes go through common AS links and the com-
monality in behavior can be captured by examining links instead of individual prefixes.
A link is weighted by the number of routes using that link from any given observation
point.
To explain this concept, we use a simple example shown in Figure 3.1. This figure
depicts the routing table seen by a router in AS 44 in Figure 1.1a, in the form of a
graph. Here we assume the existence of two more prefixes P2 and P3 announced by
AS 33 and AS 55 respectively. In Figure 3.1a, link (44, 33) has a weight of 2, since
that link appears twice in the routing table at AS 44. We denote the link weight by
wt(〈link〉, 〈observation point〉), e.g. wt((44, 33), 44) = 2. If BGP updates received at
AS 44 change the routing table at AS 44, the Link-Rank graph will also change. In
Figure 3.1b, AS 55 withdraws its route to P3, and as a result of this withdrawal message,
AS 44 shifts to an alternate path to reach P3. The weight of link (44, 33) has now
increased from 2 to 3. Instead of viewing the routing change as a change in path, we
understand the overall change seen by AS 44 as a change in its link weights over time.
[Figure 3.1: The notion of link weight. (a) Link weight seen by AS 44, with routes P1: 44-33-22, P2: 44-33, P3: 44-55. (b) Link weight seen by AS 44 after AS 55 withdraws its route to P3, with routes P1: 44-33-22, P2: 44-33, P3: 44-33-55.]
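Link weights can be computed from a routing table by counting, for every AS link, the routes whose AS path traverses it. The sketch below is illustrative: the routing tables are the ones listed for Figure 3.1, while the function name `link_weights` and the dictionary layout (prefix to AS path as a list) are assumptions made for the example.

```python
from collections import Counter

def link_weights(routing_table):
    """Count, per AS link, how many routes in the table traverse it."""
    weights = Counter()
    for as_path in routing_table.values():
        for a, b in zip(as_path, as_path[1:]):   # consecutive ASes form a link
            weights[(a, b)] += 1
    return weights

# Routing table at AS 44 before and after AS 55 withdraws its direct route to P3.
before = {"P1": [44, 33, 22], "P2": [44, 33], "P3": [44, 55]}
after = {"P1": [44, 33, 22], "P2": [44, 33], "P3": [44, 33, 55]}

print(link_weights(before)[(44, 33)])  # 2
print(link_weights(after)[(44, 33)])   # 3
```

A `Counter` is used so that links never seen in the table implicitly have weight 0.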
To understand BGP dynamics using link weight metric, we need to understand
how links change weights as a result of the BGP updates. As a first step, we looked
at BGP updates over a period of one week and marked the links changing rank after
each BGP update. We found that the weight changes usually occurred in bursts. As a
result, instead of looking at the weights after each BGP update, we could analyze just
two snapshots. As a first step towards understanding routing dynamics, we visualized
links whose weights change significantly. Since we assign a measure of importance
or rank to a link using the number of routes carried, we call a visual representation of
the link weights a Link-Rank graph and a representation of the changes in weights
a Rank-Change graph. Rank-change graphs capture those links whose weights have
changed.
[Figure 3.2: Rank-change graph for change in Figure 3.1. (a) rank changes only; (b) link rank and rank change.]
A Rank-change graph takes the difference between two Link-Rank graphs and uses
red (or dashed) edges to mark the links that have lost routes and green (or solid) edges
to mark links that have gained routes. Simply stated, given two Link-Rank graphs
G1 and G2 at different times t1 and t2 respectively, a Rank-change graph plots all links
(a, b) whose weights differ, i.e. wt((a, b), G1) − wt((a, b), G2) ≠ 0. Figure 3.2a
shows the Rank-change graph for the routing change in Figure 3.1. From this figure,
one can clearly see that link (44, 55) lost 1 route, while the link (44, 33) and (33, 55)
gained a route. Note, the Rank-change graph does not show links that have not gained
or lost routes, e.g. link (33, 22). The edge label in a Rank-change graph can show
just the weight changes or the current link weight followed by the weight change in
parenthesis as shown in Figure 3.2.
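The Rank-change computation amounts to a per-link difference of two weight maps, keeping only nonzero entries. In the sketch below, the function name `rank_change` is hypothetical, the weights are those of the Figure 3.1 example, and the sign convention (later weight minus earlier weight, so gained routes are positive) is an assumption chosen to match the +1 and -1 labels described for Figure 3.2.

```python
def rank_change(old_weights, new_weights):
    """Links whose weight changed, mapped to (new weight - old weight)."""
    changes = {}
    for link in set(old_weights) | set(new_weights):
        delta = new_weights.get(link, 0) - old_weights.get(link, 0)
        if delta != 0:                 # unchanged links are not plotted
            changes[link] = delta
    return changes

# Link weights seen by AS 44 before and after the change in Figure 3.1.
old = {(44, 33): 2, (33, 22): 1, (44, 55): 1}
new = {(44, 33): 3, (33, 22): 1, (33, 55): 1}

for link, delta in sorted(rank_change(old, new).items()):
    print(link, f"{new.get(link, 0)}({delta:+d})")
```

The printed label format, current weight followed by the change in parentheses, mirrors the second labeling style described above.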
We employ the metric of link weights for two purposes, visually understanding
large scale changes and inferring the origin of large scale changes. Chapter 4 explains
how link-weight changes in the form of Rank-Change graphs can be used to visualize
large-scale routing changes. The techniques in Chapter 4 can help network opera-
tors identify major changes among tons of BGP updates, as well as understand how
they were impacted by major routing events. Besides visually inspecting large scale
changes, we are also interested in automatically inferring the origin of the change. In
Chapter 5, we use the concept of link weight and present a heuristic that can accurately
infer the origin of the change.
CHAPTER 4
Visualizing Internet Routing Dynamics using
Link-Rank
When trying to understand Internet routing behavior, one is faced with hundreds of
megabytes of BGP update data and identifying interesting activity involves manually
sifting through this large amount of data. Visualization has
been proposed to ease this task. BGPlay [5] is a tool that visualizes route
changes from different monitors in the Internet to a single prefix. In this chapter, we
explore the use of visualization for understanding aggregate routing dynamics. We are
interested in understanding how visualization can help understand what exactly hap-
pened during and after large scale routing events like fiber cuts or major link failures.
We take a different approach from other work like [5], and instead of examining pre-
fix level routing changes, we use the link weight notion explained in Section 3.2 of
Chapter 3. In this chapter, we describe the Link-Rank visualization tool built on the
notion of visualizing link weight changes. Using case studies, we show how the
features provided by Link-Rank can help network operators mine and understand in-
teresting routing changes from gigabytes of routing data.
4.1 Components of Link-Rank
The three components of the Link-Rank tool are shown in Figure 4.1. An important
component is the input filter block that controls when the Rank-change graphs are
constructed. In Figure 3.2, we saw the Rank-change graph for a single route change.
In reality, input filters are needed to enable Link-Rank to scale in regard to topology
size and number of BGP updates. One input filter involves picking a specific set of
prefixes and examining the routing changes for these prefixes. Another input filter is a
threshold based scheme and is the filter used in all our case studies explained later in
this chapter. In this threshold based scheme, we maintain the instantaneous link weight
for each link in the topology seen by an observation point. In addition, we maintain
the change in weight since the last Rank-change graph was generated. The link weight
as well as the change in weight is updated for all links affected by each BGP update
message. A Rank-change graph is generated when the weight of any link changes by
more than a preset threshold (default is 50). A detailed treatment of this scheme and
numerical results of the effect of threshold is beyond the scope of this chapter and the
interested reader may find more details in [25].
Using the threshold filter with BGP updates, a single routing event may be broken
into multiple Rank-change graphs. For example, assume a link (A,B) fails and 5000
routes using that link are affected. This will result in a burst of 5000 BGP updates
closely spaced in time, each of which reduces the rank of the link (A,B) by 1. Thus
the entire update burst would reduce the rank of (A,B) by 5000. If the threshold filter
generates a Rank-change graph each time the link weight changes by 50, there would
be as many as 100 Rank-change graphs, each with a change of 50 routes on link (A,B).
We employed a timing mechanism to reduce the number of Rank-change graphs due
to the same event. We observed that by delaying the construction of the Rank-change
graph by a short time, we could drastically reduce the number of Rank-change graphs
for the same routing event. We call this delay the event timer and set its value to
30 seconds. During the event timer, if routing changes add weight x to a link and
immediately change back to reduce the weight on that link by x, the net weight change
is 0 (termed a compensating change) and hence no Rank-change graph is generated
(since the weight change is below the threshold of 50). Our choice of 30 seconds for
the event timer was motivated by the BGP MinRouteAdver (MRAI) timer explained in
Section 1.1. With the MinRouteAdver timer set to the recommended value of 30 seconds,
compensating changes cannot occur less than 30 seconds apart. Though not all routers
in the Internet are known to use the MRAI timer, we found the event timer value of 30
seconds to be adequate.
Figure 4.1: Components of Link-Rank (raw updates → Input Filter → filtered updates →
Graph Generator → Rank-change graph → Output Filter → final Rank-change graph)
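The interaction between the threshold filter and the event timer can be illustrated
with a minimal Python sketch. This is not the Link-Rank implementation: the class name
ThresholdFilter, its methods, and the representation of a link as a tuple are
assumptions made for illustration. Only the threshold of 50 and the 30 second event
timer are taken from the text.

```python
from collections import defaultdict

THRESHOLD = 50      # default weight-change threshold (from the text)
EVENT_TIMER = 30.0  # event timer in seconds (from the text)


class ThresholdFilter:
    """Hypothetical input filter: accumulates per-link weight changes and
    emits one Rank-change graph per event once the event timer expires."""

    def __init__(self, threshold=THRESHOLD, event_timer=EVENT_TIMER):
        self.threshold = threshold
        self.event_timer = event_timer
        self.pending = defaultdict(int)  # link -> net change since last graph
        self.timer_start = None          # time the threshold was first crossed

    def _over_threshold(self):
        return any(abs(c) > self.threshold for c in self.pending.values())

    def flush(self, now):
        """Return a Rank-change graph if the event timer has expired, else None."""
        if self.timer_start is None or now - self.timer_start < self.event_timer:
            return None
        self.timer_start = None
        if not self._over_threshold():
            return None  # compensating changes cancelled out during the timer
        graph = {link: c for link, c in self.pending.items() if c != 0}
        self.pending.clear()
        return graph

    def update(self, now, link, delta):
        """Apply one weight change caused by a BGP update at time `now`."""
        graph = self.flush(now)  # emit any graph that became due before this update
        self.pending[link] += delta
        if self.timer_start is None and self._over_threshold():
            self.timer_start = now  # delay graph construction by the event timer
        return graph
```

Under these assumptions, feeding the 5000 closely spaced updates of the (A,B) example
through the filter yields a single Rank-change graph with a change of -5000 rather than
100 graphs, and a change of +x followed by -x within the timer yields no graph.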
The graph generator component outputs the Rank-change graph based on the up-
dates fed to it by the input filter. The output filter can control the links and nodes in the
Rank-change graph for brevity. Filter rules for the output could be simple weight based
rules such as ‘remove all links below a change of 10’ or more complex such as ‘show
graphs with at least one of the nodes 338, 55 AND links 44→33’. The output filter is
part of the visualization tool, and based on graph complexity, one can dynamically use
filter rules to simplify the graphs. Summarizing, the input filter prepares the data for
Rank-change graphs and the output filter can be used to prune the Rank-change graph
further.
4.2 Features of Link-Rank
We now present the visualization details of Link-Rank and discuss various features
used in Link-Rank to deal with large amounts of data.
4.2.1 Nodes, edges and color coding
We now discuss some details of visualization in Rank-change graphs. Figure 4.2 shows
an actual Rank-change graph from BGP data. Note, the Internet has over 20,000 au-
tonomous systems, and currently only a few hundred observation points are connected
to public data collectors. Observation points, from which one can observe routing
changes, are shown as circular nodes to differentiate them from rectangular nodes that
are not observation points. Visually separating the observation points from the other
nodes clearly highlights other possible viewpoints that can be used to better
understand the same time interval. The observation point of the Rank-change graph (AS
6453) is colored blue to differentiate from other observation points that are colored
orange.
Edges in Link-Rank are primarily red or green in color. An edge is colored red
when it loses routes and green when it gains routes. To help users who have difficulty
distinguishing certain colors, Rank-change graphs can also be displayed using
dashed and solid lines to indicate loss and gain, instead of red and green. In addition,
this representation is very useful in the process of assembling multiple views explained
in Section 4.2.5. The thickness of the edges in the Rank-change graph represents the
magnitude of weight change. With links of varying thickness, one can easily spot links
with high losses or gains. In addition to varying the edge thickness, the size of the
nodes varies based on the amount of weight change of edges and the number of such
edges adjacent to it. This scaling of nodes helps to identify ASes with high routing
activity.
Figure 4.2: Sample Rank-change graph
We use the JUNG visualization library [28] to construct the Rank-change graph.
Link-Rank uses the spring layout implementation from the JUNG library, which gives
satisfactory results in general. Furthermore, the layout implementation also allows
one to manually reposition any node as needed for clearer view. In most cases when
the Rank-change graphs were sparse, the users of Link-Rank were satisfied with the
default layout. With denser graphs, the users tended to reposition some nodes.
4.2.2 Activity plots: summarizing weight changes
Activity plots summarize routing changes represented by Rank-change graphs along
the time dimension. An activity plot is a series of red and green bars on alternate sides
of a horizontal axis of time. With an activity plot, a user can identify time periods of
high routing activity and then investigate those specific periods in more detail. We first
explain how a single activity bar is plotted. Figure 4.3 shows a Rank-change graph
similar to Figure 3.1. Given a Rank-change graph, we first find the total gain and
total loss by adding the weight changes of the green and red links respectively. In this
case, the total rank gain is 200 (100 each on links (44, 33) and (33, 55)) and the total
rank loss is 100. We plot red and green bars proportional to the total loss and gain
respectively as shown in Figure 4.3. In this case, the green bar is longer than the red
bar. A higher gain (green) than loss (red) could be due to a combination of longer new
paths as in Figure 4.3 and new routes being announced.
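As a small illustration of how one activity bar is derived, the following Python
sketch sums the gains and losses of a Rank-change graph. The function name and the
dictionary representation are assumptions, and placing the loss of 100 on link
(44, 55) is assumed for the example; the text only states the totals.

```python
def activity_bar(rank_change_graph):
    """Return (total_gain, total_loss) for one Rank-change graph.

    rank_change_graph maps a link, given as a pair of AS numbers, to its
    weight change: positive for gained routes, negative for lost routes.
    """
    total_gain = sum(c for c in rank_change_graph.values() if c > 0)
    total_loss = sum(-c for c in rank_change_graph.values() if c < 0)
    return total_gain, total_loss


# Example in the spirit of Figure 4.3: two links gain 100 routes each and
# one link (assumed here to be (44, 55)) loses 100 routes.
example = {(44, 33): 100, (33, 55): 100, (44, 55): -100}
gain, loss = activity_bar(example)
print(gain, loss)  # 200 100
```

The green bar is then drawn proportional to the gain and the red bar proportional to
the loss.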
Activity bars can provide summary information about the routing change. For
example, if we only see a red bar, it signifies that routes have been lost entirely,
which means some set of prefixes is not reachable.¹ In an activity plot, one activity
bar is constructed for each Rank-change graph over the duration of the activity plot.
The total magnitude of an activity bar can vary a lot depending on the type of event,
so we adjust the scale of the Y-axis: the highest magnitude in any interval
corresponds to the tallest bar on the activity plot, and the remaining bars are scaled
linearly relative to it. In Section 4.3, using case studies, we illustrate how activity
plots can help in the identification of routing problems.
¹ There are cases where a red bar and the absence of a green bar may not reflect
prefix loss. E.g., if the paths for a set of prefixes change from A → B → C to A → B
because the prefixes are now originated by B, link (B, C) loses ranks but the prefixes
may still be reachable.
Figure 4.3: Plotting an activity bar (Rank-change graph over nodes 44, 33 and 55 with
link changes +100, +100 and -100; sum of all gains = +200, sum of all losses = -100)
4.2.3 Time Windows and Drilling Down
The time window control in Link-Rank allows users to aggregate Rank-change graphs
in a time interval. Due to the presence of slow convergence [15], some short lived
invalid paths could appear as genuine route changes. With the time-window control,
one can increase or decrease the longevity of weight changes that one wants to
visualize. Figure 4.4a shows three activity bars corresponding to three Rank-change graphs
shown below. In Figure 4.4b we show the time window by rectangular boxes on the
activity plot. This time window can slide along the activity graph using DVD playback
like controls. In Figure 4.4b, we show how the Rank-change graph looks in three cases,
two involving the same time window size but different positions, and one involving an
even wider time window size. At each position of the time window, the Rank-change
graphs falling in that window are combined into one by taking the union of all the
Rank-change graphs. Equivalently, the Rank-change graph for a specific position of
the time window can also be constructed as a difference graph between the Link-Rank
graphs at the start and end of the time window. Note that within the first time win-
dow t1, the Rank-change graphs have some cancellation effect of route changes, i.e.
net weight change of (44, 55) is −100 + 50 = −50. In contrast, within the second
time window t2, the Rank-change graphs have an additive effect, i.e. weight change of
(44, 55) is 50 + 50 = 100. If the time window is increased to include all three activity
bars as in t3, then all the changes will be cancelled and the net Rank-change graph will
be empty.
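The cancellation and additive effects described above follow directly from summing
weight changes per link over the window. A minimal Python sketch, assuming each
Rank-change graph is a dictionary of link weight changes tagged with a timestamp; the
timestamps 1, 2 and 3 and the values on links (44, 33) and (33, 55) are hypothetical
and chosen only to be consistent with the (44, 55) arithmetic in the text:

```python
from collections import defaultdict


def aggregate_window(graphs, start, end):
    """Combine the Rank-change graphs whose timestamp lies in [start, end].

    graphs is a list of (timestamp, {link: weight_change}) pairs. Changes on
    the same link are summed, and links whose net change is 0 are dropped.
    """
    net = defaultdict(int)
    for t, graph in graphs:
        if start <= t <= end:
            for link, change in graph.items():
                net[link] += change
    return {link: c for link, c in net.items() if c != 0}


graphs = [
    (1, {(44, 55): -100, (44, 33): 100, (33, 55): 100}),
    (2, {(44, 55): 50, (44, 33): -50, (33, 55): -50}),
    (3, {(44, 55): 50, (44, 33): -50, (33, 55): -50}),
]
t1 = aggregate_window(graphs, 1, 2)  # cancellation: (44, 55) nets to -50
t2 = aggregate_window(graphs, 2, 3)  # additive: (44, 55) nets to +100
t3 = aggregate_window(graphs, 1, 3)  # everything cancels: empty graph
```

Dropping links with a net change of 0 is what makes the widest window produce an empty
Rank-change graph.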
Another time control, called the drill-down feature, allows one to control the time
granularity of the entire activity plot. By drilling down, one can expand the activity
inside the current time window to a larger time span in a new window. The first part of
Figure 4.5 shows an activity plot spanning over 8 days with a time window of 16 hours.
To better understand the activity inside the time window, we drill down to expand the
16 hour time window to the full time span of the middle activity plot in Figure 4.5.
The time window in this case is about 2 hours. Drilling down further on this time
window expands these two hours as shown in the last activity plot. One can now see
the individual activity bars in detail compared to the first activity plot. Note that,
given an activity plot, one can drill down to a time granularity equal to the event
timer explained in Section 4.1.
Figure 4.4: Use of time window to control time of change (a. Rank-change graphs;
b. Rank-change graphs with different time-windows t1, t2 and t3, where t3 yields an
empty Rank-change graph)
Figure 4.5: Drilling down to increase level of detail in activity (three activity
plots: Jan 16, 2005 08:00 UTC to Jan 24, 2005 16:00 UTC; Jan 22, 2005 23:10 UTC to
Jan 23, 2005 15:57 UTC; Jan 23, 2005 07:53 UTC to Jan 23, 2005 09:53 UTC)
4.2.4 Pruning Rank-change graphs
Link-Rank processes BGP updates and visualizes the links that have changed. In all
the examples in this chapter, the underlying network consists of the Internet with about
20,000 nodes. However, the size of the Rank-change graph depends on the number of
links that have changed and the magnitude of changes. Hence in some cases where
a small number of links have changed, the Rank-change graphs may contain only a
small number of nodes and links. In other cases with a lot of changes, the Rank-
change graph may contain hundreds of nodes, making it difficult to extract information
visually. Link-Rank allows a user to prune Rank-change graphs using different filtering
techniques to reduce the complexity of the graph. One technique is to prune the graph
using an output filter in the form of a threshold filter, which removes edges with a
weight change less than a threshold value set by the user. Other types of filters
include viewing the top N links with the highest weight change values, and viewing
links adjacent to a user-specified set of ASes. One can also use a combination of all
these filters and specify the order in which the filters are applied.
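The three kinds of output filters and their ordered combination can be sketched as
composable functions. The function names below are hypothetical and the sketch only
mirrors the filter rules described above; it is not the Link-Rank code.

```python
def threshold_filter(min_change):
    """Keep only links whose absolute weight change is at least min_change."""
    return lambda g: {l: c for l, c in g.items() if abs(c) >= min_change}


def top_n_filter(n):
    """Keep the n links with the highest absolute weight change."""
    def apply(g):
        ranked = sorted(g.items(), key=lambda kv: abs(kv[1]), reverse=True)
        return dict(ranked[:n])
    return apply


def adjacent_filter(as_set):
    """Keep only links with at least one endpoint in the given set of ASes."""
    return lambda g: {l: c for l, c in g.items()
                      if l[0] in as_set or l[1] in as_set}


def apply_filters(graph, filters):
    """Apply the filters in the order given by the user."""
    for f in filters:
        graph = f(graph)
    return graph


# Hypothetical Rank-change graph: link (pair of AS numbers) -> weight change.
graph = {(1, 2): -300, (2, 3): 250, (3, 4): 5, (1, 5): -40}
pruned = apply_filters(graph, [threshold_filter(10), top_n_filter(2)])
```

Because each filter maps a graph to a smaller graph, the order of application matters
only through which links survive to the next filter.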
4.2.5 Assembled View: Merging Rank-change graphs from multiple observation points
Link-Rank views from multiple observation points can be assembled in a single Rank-
change graph. Figure 4.6 shows the assembled view from two observation points AS
11608 and AS 3561. Note, here we have to use the dashed and solid lines to indicate
lost and gained routes. Edges in this example are either blue or pink, with blue indicating
the changes from AS 11608 and pink indicating the changes from AS 3561. In
general, in assembled views, each observation point and its changes are represented by
a unique color. With assembled views, one can identify common segments of change
in Rank-change graphs across different observation points and narrow down on the
possible cause of the routing changes.
4.3 Discovery and analysis using Link-Rank
In this section, we use examples to show how Link-Rank can be used to discover and
analyze routing events.
Figure 4.6: Assembling views from AS 11608 and AS 3561
4.3.1 Methodology
Our objective is to evaluate how Link-Rank can help network operators discover and
diagnose routing problems. In terms of routing data, network operators have access to
BGP routing tables and update messages received at their routers. We have access to
similar data from the public archives of the RouteViews Oregon collector that contains
routing tables and updates from about 40 routers belonging to different autonomous
systems. In order to understand how network operators diagnose problems, we inter-
acted with network administrators through email and personal interviews at various
North American Network Operators’ Group (NANOG) meetings [1]. Our pool of interviewees
consisted of about 40 operators from both small and big ISPs, most of them having
more than 5 years of experience in network operations. In the rest of this section, we
use the knowledge gained from this interaction to analyze three case studies from the
perspective of an operator using Link-Rank.
We used three ways to select observation points and time periods for case studies.
First, we looked at activity plots from all observation points on a weekly basis and
identified the periods with dense activity or spikes. Case I is an example of this, where
we saw some heavy activity from a particular observation point. Second, we looked at
activity plots to find common activity spikes across multiple observation points during
the same time period. Case II is an example where activity plots from multiple observa-
tion points show spikes at around the same time. Cases I and II show that activity plots
can serve as summaries for network operators using Link-Rank. Finally, we picked
case studies in response to reports of routing or traffic problems from external sources
such as North American Network Operators Group (NANOG) mailing lists. Case III is
representative of this category where there were reports of traffic problems from a few
ISPs. In each of these cases, we used the Rank-change graphs during the selected time
periods, and in one case assembled multiple views together, to understand the routing
activity.
4.3.2 Case I: Capturing Link Instabilities
Around March 2005, AS 7018 showed heavy activity, as seen in the second activity plot
(router IP 12.0.1.63) in Figure 4.7, which covers a period of one week. One task of
the network operator is to find out whether this activity is caused by a problem
within AS 7018 or a problem beyond AS 7018. Another question to be answered is whether
the entire activity is due to the same event or to different events. We drilled down
from one week of activity to a one hour period on March 9, 2005, shown in Figure 4.8.
Note from Figure 4.8 that, over this hour, a Rank-change graph was generated almost
every minute.
We then looked at the Rank-change graphs in this period and found a common
sequence of changes. Figure 4.9 shows a typical sequence of Rank-change graphs
we found, with the time window set to 1 minute. This figure shows that 134 routes
switched between the paths 7018 → 80 and 7018 → 1239 → 80. This behavior was
observed for almost three weeks in March 2005. The next step was to find out the
preferred path among the two oscillating paths. From examination of routing tables
before the event, we saw that the preferred path to reach AS 80 was the direct link
(7018, 80). Since the weight of the link (7018, 80) on the preferred path repeatedly
touched 0, it seemed likely that the link between AS 7018 and AS 80 went up and down
repeatedly and was the cause of the instability seen.
Figure 4.7: Activity plots from March 8, 2005 to March 14, 2005
Figure 4.8: One hour of activity plot from 12.0.1.63 on March 9, 2005
Events such as the constant route change above may result in longer delays as well
as possible packet losses. Yet, they often go unnoticed. In this case, the behavior
continued for almost 3 weeks in March 2005, contributing hundreds of thousands of
BGP updates seen at the observation point. A network operator using Link-Rank at AS
7018 could quickly identify such oscillations, restore stability to the affected
routes, and drastically reduce the number of BGP updates in the Internet. In our
examination over other time periods, we found quite a few instances of link
instabilities similar to this case.
Figure 4.9: Case I: Continuous switching of routes between two links (a. Tue Mar 15
00:21:15 GMT 2005 to Tue Mar 15 00:21:45 GMT 2005; b. Tue Mar 15 00:22:15 GMT 2005 to
Tue Mar 15 00:22:45 GMT 2005; c. Tue Mar 15 00:23:15 GMT 2005 to Tue Mar 15 00:23:45
GMT 2005)
Summary: Densely clustered bars in activity plots, especially where they have near
constant height, are almost always a strong indication of link instabilities. Activity
plots are useful in spotting such cases. One can then examine these time periods in
detail to figure out the actual causes of the rapid route changes.
4.3.3 Case II: Root-cause identification
Root cause identification involves inferring the cause of an observed set of routing up-
dates. For Case II, we picked a case where activity plots of many observation points
showed spikes around the same time. Figure 4.10 shows the activity plot of a few
observation points from October 18, 2005 to October 24, 2005. One can easily spot
spikes and dense activity in these plots from multiple observation points (around
October 21, 2005). To understand the causes, we looked at the routing activity from AS
6453 (router IP 195.219.96.239) which generated the first activity plot in Figure 4.10.
Starting from an entire day’s activity, we drilled down to a four hour period between
4:00 and 9:00 GMT on October 21, 2005 that contains the dense activity. Figure 4.11
shows this Rank-change graph around 06:20 GMT on October 21, 2005 from AS 6453
with a time window set to 15 minutes. During this time, link (6453, 3356) lost close to
3000 routes (out of a total of around 140,000). At the same time, some other links like
(6453, 701) and (6453, 1239) gained routes. Note, for ease of presentation, we do not
show the link weights and prune the graph by applying the filter to remove links with
changes less than 200. Based on observation, the possible cause is either AS 6453, AS
3356, or the link (6453, 3356).
In this case, since similar activity is also seen from other observation points, one
can benefit by combining multiple observation points into a single assembled view.
Figure 4.12 shows the assembled view from three observation points, AS 6453,
AS 1239, and AS 3257, that showed similarity in activity plots. In the assembled
view, we use dashed lines to represent route loss and solid lines to represent route
gain, and assign each observation point and its corresponding changes a unique color,
e.g. AS 3257 and its corresponding changes are colored blue. The orange colored nodes
indicate other potential observation points, so more views can be added. Here we select
only three observation points to make the Rank-change graph easy to understand. After
we reduce the time window to 5 minutes, one can see from Figure 4.12 that multiple
links into and out of AS 3356 were affected, strongly suggesting a problem inside
AS 3356 and not just on the link between AS 6453 and AS 3356. Our observation was
validated by reports from the NANOG discussion forum that AS 3356 indeed had
some internal problems, and was further corroborated by discussions with network
operators.
Summary: To use Link-Rank for identifying root cause, one can look for high
loss or gain links or nodes which have a high number of outgoing edges with weight
changes. One can also assemble multiple views along the lost or gained path to isolate
sections of the path which might be problematic.
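One way to make the "node with many changed adjacent links" heuristic concrete is to
count, for each AS, the distinct loss links adjacent to it across the assembled views.
The Python sketch below is only an illustration of that counting idea and not the
procedure used in Link-Rank. The function name is hypothetical, and all weight changes
in the example are invented except the loss of close to 3000 routes on (6453, 3356).

```python
def rank_suspect_nodes(views, min_change=200):
    """Rank ASes by the number of distinct loss links adjacent to them.

    views maps an observation point to its Rank-change graph, a dictionary
    from a link (pair of AS numbers) to a weight change. Only links that
    lost at least min_change routes are counted.
    """
    adjacent = {}
    for graph in views.values():
        for (u, v), change in graph.items():
            if change <= -min_change:
                adjacent.setdefault(u, set()).add((u, v))
                adjacent.setdefault(v, set()).add((u, v))
    return sorted(((len(links), asn) for asn, links in adjacent.items()),
                  reverse=True)


# Hypothetical assembled view from three observation points.
views = {
    6453: {(6453, 3356): -3000, (6453, 701): 1500, (6453, 1239): 1400},
    1239: {(1239, 3356): -2500, (1239, 701): 2400},
    3257: {(3257, 3356): -2000, (3257, 1239): 1900},
}
ranking = rank_suspect_nodes(views)
```

With these invented numbers, AS 3356 is adjacent to three loss links while every other
AS is adjacent to one, which matches the qualitative conclusion drawn from Figure 4.12.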
Figure 4.10: Activity plots from October 18, 2005 to October 24, 2005
Figure 4.11: Case II: Instability observed at AS 6453
Figure 4.12: Case II: Combined view from AS 1239, AS 6453 and AS 3257
CHAPTER 5
Inferring Origin of Internet Routing Problems
In Chapter 4 we showed how visualization of link weight changes can be used to
understand large-scale Internet routing events. In this chapter, we focus on automated
identification of significant routing events and present a heuristic for inferring the ori-
gin of the event. We design a new inference scheme using the abstract measures of
link weight and weight changes. Link-Rank extracts the total number of routes carried
over each inter-AS link in the Internet topology, called link weight, and measures the
changes in the number of routes on each link to capture aggregate routing changes.
This provides a concise representation of the view from a particular BGP router. We
further leverage our previous observations that, among multiple alternative paths to
a given destination, the most preferred path is used most of the time [32]. As a di-
rect corollary from this observation, each AS link is expected to have a stable weight,
and deviations from this expected value can serve as indications of significant routing
changes.
Once a significant deviation is detected, our objective is to identify the origin of
this deviation. Given the view from a single router, we can use a min-cut heuristic to
identify the most likely faulty AS node or AS-AS link, enabling an isolated BGP router
with only its own update stream to identify significant events and infer the origin of
these events. By correlating changes observed from different monitors, we can achieve
a very high degree of accuracy identifying the AS node or AS-AS link responsible for
triggering the event. This can all be done in near real-time and provides a useful tool for
network operations and for understanding BGP protocol behavior in the aggregate. We
validated our heuristic by accurately identifying session problems reported by Abilene
with its peers. Our evaluation on events where the problem area is adjacent to the origin AS
shows that we could achieve an accuracy of close to 95%. On applying our heuristic
over one month of BGP data, we found various interesting routing instabilities, some
recurring again and again, clearly highlighting the need to be able to identify the origin of
changes on a regular basis.
5.1 Characterizing links and identifying routing events
In this section, we look for a way to characterize link weights and identify routing
events. We are particularly interested in identifying the steady value of a link in the
absence of major routing events. If we obtain such a steady value, then we can identify
irregular behavior of links whenever this value changes significantly. To understand
link weight changes, we sampled link weights from a particular observation point every
10 minutes for the entire month of January 2007, resulting in close to 4500 samples.
Out of all the links seen from the observation point, we filtered out links with weight
less than 25, leaving us with 2897 links. Figure 5.1 shows frequency distribution of
link weight values for 3 links randomly picked from different link weight buckets (low
weight, medium weight and heavy weight). Note that the origin in Figure 5.1 has
been shifted and we can see that a majority of the samples fall under a small range
of values. In order to understand the overall trend, we calculated the percentage of
samples falling within 2 standard deviations of the mean link weight for each link.
Figure 5.2 shows the distribution of this percentage of samples for all the links. This
figure shows that over 50% of the links have all samples within 2 standard deviations
of the mean link weight. Figure 5.2 also shows that the lowest percentage of samples
within 2 standard deviations is about 80. From this we can see that link weights
are more or less steady.
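The per-link statistic used above can be computed as in the following Python sketch.
The function name is hypothetical, and the use of the population standard deviation
is an assumption, since the text does not say which estimator was used.

```python
import statistics


def percent_within_2_sigma(samples):
    """Percentage of link weight samples within 2 standard deviations of the mean.

    Uses the population standard deviation (an assumption; the sample
    standard deviation would give slightly different values).
    """
    mean = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    inside = sum(1 for w in samples if abs(w - mean) <= 2 * sigma)
    return 100.0 * inside / len(samples)


# Hypothetical link: nine samples at weight 100 and one short spike to 200.
# The mean is 110 and the standard deviation is 30, so only the spike
# falls outside the range 110 +/- 60.
print(percent_within_2_sigma([100] * 9 + [200]))  # 90.0
```

A link whose weight never changes has a standard deviation of 0 and trivially scores
100 percent.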
To look for this steady value, we sample the link weight regularly and define an
exponential moving average of link weight called Expected Weight. The expected
weight w̄(l) takes into account the past history of the link weight as well as present
and is computed as
w̄(l) = (1− α) · w̄(l) + α · w(l)
where w(l) is the current weight of the link l, and α decides how much importance is
assigned to the current value compared to the past values.
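To see how α trades off history against the current sample, the recursion can be
unrolled. The sample index k and the initial value w̄_0(l) are notation introduced here
for illustration and do not appear in the original formula.

```latex
\bar{w}_k(l) = (1-\alpha)\,\bar{w}_{k-1}(l) + \alpha\, w_k(l)
             = \alpha \sum_{j=0}^{k-1} (1-\alpha)^{j}\, w_{k-j}(l)
               + (1-\alpha)^{k}\, \bar{w}_0(l)
```

A sample taken j intervals in the past therefore contributes with weight α(1−α)^j. For
example, with α = 0.1 a one-sample spike of height h moves the expected weight by only
0.1h, and its influence decays by a factor of 0.9 with every subsequent sample.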
While some links, such as one connecting a stub AS to its provider, will mostly have
a very stable weight, other links, such as ones connecting transit providers, vary.
Figure 5.3(a) shows the instantaneous weight of link 7018-1239 as seen from a BGP router
in AS 7018. Note that the Y-range in this graph starts from 23,600. We can see a few
spikes in the instantaneous weight as well as some longer lasting changes (plateaus).
Our approach is to characterize how much link weights fluctuate and look for devia-
tions that exceed some normal value. We capture this deviation using mean deviation
δ(l) = |w̄(l) − w(l)|, where w̄(l) is the expected weight of link l, and w(l) is the
current observed weight of the link. To take into account the history of deviation, we
define an exponential moving average of the mean deviation as
δ(l) = (1− β) · δ(l) + β · |w̄(l)− w(l)|
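The two moving averages can be maintained together, as in this minimal Python sketch.
The class name LinkTracker is hypothetical. The sketch assumes that the deviation is
measured against the expected weight before the current sample is folded in, which is
the same order used by TCP's round-trip time estimator; the text does not specify the
order. The defaults α = 0.1 and β = 0.25 are the values chosen later in this section.

```python
class LinkTracker:
    """Tracks the expected weight and mean deviation of a single link."""

    def __init__(self, initial_weight, alpha=0.1, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.expected = float(initial_weight)  # expected weight of the link
        self.deviation = 0.0                   # moving average of the deviation

    def update(self, weight):
        """Fold in one sampled weight; return its distance from the expectation."""
        diff = abs(self.expected - weight)
        # Update the deviation first, using the expectation that was in
        # effect before this sample arrived (an assumption, see above).
        self.deviation = (1 - self.beta) * self.deviation + self.beta * diff
        self.expected = (1 - self.alpha) * self.expected + self.alpha * weight
        return diff


tracker = LinkTracker(100)
spike = tracker.update(200)  # a single sample far from the expected weight
```

After the single spike from 100 to 200, the expected weight moves only to 110 while the
mean deviation rises to 25, so a short lived spike barely shifts the expectation.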
Our objective in assigning values to the sampling interval T, α and β is to get
an expected behavior that is not influenced by very short lived changes (especially
convergence events), while at the same time being able to adapt to longer lasting
changes quickly enough. We want the value of T to be greater than the convergence time,
which has been found to be 2 minutes on average [32] and can sometimes be as long as 5
minutes. Based on this, we set the sampling interval T = 10 minutes. To estimate
a good value of α for our choice of T = 10 minutes, we picked a few random links
from the set of heavy weight links, medium weight links and low weight links. We
plotted the instantaneous samples for all these links over a 10 day period. We then
plotted the expected weight w̄(l) for different values of α ranging from 0.1 to 0.9.
From our observations, α = 0.1 provided a close approximation of the link weights
while not being affected by the short lived spikes. Figure 5.3(b) shows the expected
weight for (7018,1239) with α = 0.1, and we can see that the curve closely follows
the instantaneous values except for the short lived spikes. Following similar studies
for deviation, we picked β = 0.25.
Figure 5.1: Frequency distribution of link weight values of 3 links from a single
observation point (panels: Link 3356-1290, Link 2914-4323, Link 2914-209; x-axis: Link
weight; y-axis: Frequency)
Figure 5.2: Percentage samples covered by the top 5 most common values seen in the
4500 link weight samples for each link (x-axis: Link ID; y-axis: Percent within 2σ)
Figure 5.3: Example showing choice of α = 0.1 ((a) Instantaneous Link Weight, Link
7018-1239; (b) Expected Weight, alpha=0.1; x-axis: Sample ID)
5.1.1 Link Events
With an expectation of the weight of a link, we now define a link event as a significant
weight change of a link. The mean deviation δ(l) provides an estimate of how much a
link weight fluctuates. We define a change as significant when the weight of a link
changes by more than the mean deviation. In other words, we take the start of the event
to be the time when the current weight changes by more than 2 ∗ δ(l). At this point, we
need to identify the end of the event when the link stabilizes again. We mark the end of
the event by using a fixed timeout of t = 3 minutes, being slightly more conservative
than the average convergence time for events of around 2 minutes [32]. When viewing all
links together, an event starts when any link changes by more than the deviation range,
and ends t = 3 minutes later.
Note that links with very small weights (e.g. w̄(l) = 5) can result in
a link event when even 1 route changes. To account for this, we log all link weight
changes and filter out very small weight changes (e.g. < 25 routes). Also, due to
our choice of a fixed timeout of t = 3 minutes, it is possible for a single big event
(e.g. 50,000 route changes) to be broken into more than one event. In Section 5.4
we show that it is still possible to accurately diagnose cases where a single routing
event is broken into multiple events. Figure 5.4 shows the number of events identified
by different observation points from RouteViews’ Oregon collector for the month of
January 2007.
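The event identification rules above (start when a link deviates by more than 2·δ(l),
end after a fixed timeout of 3 minutes, ignore changes below 25 routes) can be sketched
in Python as follows. The function name and data layout are assumptions, and the
expected weights and deviations are held fixed for brevity, whereas in the scheme
described here they are moving averages.

```python
def group_events(changes, expected, deviation, min_change=25, timeout=180.0):
    """Group link weight changes into events with a fixed timeout.

    changes   -- time-ordered list of (time_in_seconds, link, current_weight)
    expected  -- dict: link -> expected weight
    deviation -- dict: link -> mean deviation
    Returns a list of events, each {"start", "end", "links": {link: change}}.
    """
    events = []
    current = None
    for t, link, weight in changes:
        if current is not None and t > current["end"]:
            events.append(current)  # the fixed timeout has expired
            current = None
        change = weight - expected[link]
        if abs(change) < min_change:
            continue  # filter out very small weight changes
        if current is None:
            if abs(change) <= 2 * deviation[link]:
                continue  # within the normal fluctuation of this link
            current = {"start": t, "end": t + timeout, "links": {}}
        current["links"][link] = change  # keep the latest change per link
    if current is not None:
        events.append(current)
    return events


# Hypothetical input: link "A-B" normally carries 1000 routes, "C-D" only 5.
expected = {"A-B": 1000, "C-D": 5}
deviation = {"A-B": 20, "C-D": 0.1}
changes = [(0, "C-D", 4), (10, "A-B", 990), (20, "A-B", 400),
           (60, "A-B", 0), (500, "A-B", 1000), (600, "A-B", 300)]
events = group_events(changes, expected, deviation)
```

In this example the one-route change on the low weight link is ignored, the drop of
"A-B" between 20 s and 60 s forms one event, and the later drop at 600 s forms a second
event because it arrives after the 3 minute timeout.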
As our later evaluation has shown, this technique has proven practically effective
in identifying routing events at the aggregate level. We discuss some potential problem
areas of using link weight based events in Section 5.4. Having identified an event, we
now turn to the main challenge of inferring the origin of this event.
5.2 The Inference Scheme
We now present our heuristic to identify the origin of routing events. We first present
an overview of our approach and then go into the details.
Figure 5.4: Events in January 2007 (x-axis: Observation Point ID; y-axis: Number of
events, Jan 2007)
5.2.1 Overview of approach
After we apply the event identification scheme described in Section 5.1, we get a set
of routing events. Each event contains the time of the event and the set of links along
with the weight of the link at the end of the event, and the change in weight. An event
may be a failure event where a set of preferred routes are lost, or may be a recovery
where previously lost preferred routes are available again. If the event is a failure, then
the origin of change is contained in the set of links that have lost routes, while if it is
a recovery event, then the origin of change is contained in the set of links that gained
routes. At the first stage we do not make any classification of the event as a failure or
recovery and identify origin of change for both possibilities.
For the set of edges that lost routes, we correlate the changes across different links
involved and construct a flow graph called the fault graph with an artificial source S and
an artificial sink T. The idea behind constructing a fault graph is that any cut (set of
edges) disconnecting S and T represents a possible explanation for the origin of change.
Further, if each edge were equally likely to fail, a cut involving the least number of
edges, or min-cut, is most likely to be the origin of the change. By augmenting the
fault graph with information from other available observation points, we argue that the
min-cut on the fault graph is the most likely explanation. We repeat the procedure of
constructing a fault graph and finding the min-cut for the set of edges that gained routes
as well. Finally, given two possible explanations, one with edges that lost routes, and
one with edges that gained routes, we use information about expected weights and
variance to understand which explanation is more likely. Figure 5.5 shows the main
steps in our process.
Figure 5.5: Main steps in inference (BGP updates → weight tracking & event
identification; -ve changes → construct fault graph (failure) → min-cut edges;
+ve changes → construct fault graph (recovery) → min-cut edges; both sets of min-cut
edges → candidate set reduction → origin of change)
5.2.2 Fault graph
We now describe the construction of the fault graph in detail. For each event, we construct
two graphs, one is the loss graph involving all links that lost weight, and the other is the
gain graph involving all links that gained weight. At this stage we do not know whether
the event is a failure or a recovery and there is greater benefit in adding information
from other monitoring points first. Algorithm 4 details the construction of a fault graph
for positive and negative weight edges for each event.
Algorithm 4: Fault-Graph(E)
  Construct Gst as union of all negative (or positive) edges in E;
  Assign w(e) = 1 for all e ∈ E;
  /* Find nodes with no incoming changes */
  for each node n such that (x, n) ∉ E for any x do
    add edge (s, n) to Gst with w(s, n) = ∞;
  /* Find nodes with no outgoing changes */
  for each node n such that (n, x) ∉ E for any x do
    add edge (n, t) to Gst with w(n, t) = ∞;
The main idea in constructing the fault graph is to connect the source node S to
all the nodes that have only outgoing edges, and the sink node T to all the nodes that
have only incoming edges. Figure 5.6 shows a fault graph constructed from a single
observation point for an event that occurred on March 9, 2007 at 18:05. One can see
here that node 11537 is not connected to either S or T, since it has both incoming and
outgoing edges. In reality, our implementation takes into account the total incoming
weight change and the total outgoing weight change in deciding whether
a node connects to S or T. For example, if a node B has a total incoming change of
-500, i.e. w(A,B) = −500, but an outgoing change of only -10, i.e. w(B,C) = −10,
then the sink of the flow has to be at node B, due to the discrepancy between the incoming
and outgoing changes. The tables in Figure 5.6 show the link weights and changes for the
three links included in each fault graph. In the fault graph, any of the three edges could
be cut to obtain a possible explanation, but generally, the farther a link is from the
source S, the smaller its weight; with comparable changes along a single path, we
therefore try to remove an edge as far away from S as possible. In Figure 5.6, a min-cut on each
graph results in two edges, 11537-2018 and 5713-2018, as possible candidates.
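The construction in Algorithm 4 and the min-cut step can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this work: the function names are hypothetical, all changed links get unit weight as in Algorithm 4, and the rule "remove an edge as far away from S as possible" is approximated by returning, among all minimum cuts, the one closest to T. The weighted refinement based on incoming and outgoing changes is not modeled.

```python
from collections import defaultdict, deque

INF = float("inf")

def build_fault_graph(changed_edges):
    """Capacity map for Algorithm 4: unit weight per changed link, infinite S/T edges."""
    cap = defaultdict(dict)

    def add_edge(u, v, c):
        cap[u][v] = c
        cap[v].setdefault(u, 0)  # zero-capacity reverse edge for the residual graph

    has_out = {a for a, _ in changed_edges}
    has_in = {b for _, b in changed_edges}
    for a, b in changed_edges:
        add_edge(a, b, 1)
    for n in has_out - has_in:   # no incoming changes: attach to source S
        add_edge("S", n, INF)
    for n in has_in - has_out:   # no outgoing changes: attach to sink T
        add_edge(n, "T", INF)
    return cap

def min_cut_near_sink(cap, s="S", t="T"):
    """Max-flow (Edmonds-Karp), then return the minimum cut closest to T."""
    flow = defaultdict(lambda: defaultdict(int))
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:      # BFS over residual edges
            u = queue.popleft()
            for v, c in cap[u].items():
                if v not in parent and c - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push
    # Nodes that can still reach T in the residual graph form the sink side.
    t_side, queue = {t}, deque([t])
    while queue:
        v = queue.popleft()
        for u in cap[v]:
            if u not in t_side and cap[u][v] - flow[u][v] > 0:
                t_side.add(u)
                queue.append(u)
    return sorted((u, v) for v in t_side for u in cap[v]
                  if u not in t_side and cap[u][v] > 0)

# Loss graph of Figure 5.6(a): 11686 -> 19782 -> 11537 -> 2018
loss_edges = [(11686, 19782), (19782, 11537), (11537, 2018)]
print(min_cut_near_sink(build_fault_graph(loss_edges)))  # [(11537, 2018)]
```

With unit weights every link on a single path is a minimum cut, so the tie-breaking rule decides the answer; picking the cut nearest T reproduces the candidate 11537-2018 reported for this event.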
5.2.3 Augmenting fault graph with views from additional observation points
One may have access to events generated from other observation points in
addition to one's own BGP data. In practice, one can achieve this by processing events
from a few peers of public data collectors like RouteViews and RIPE RIS. We are
interested in adding information from other observation points to aid the inference
from our primary observation point. Specifically, we build on the fault graph constructed
from the primary observation point by Algorithm 4.
Figure 5.6: A fault graph from single observation point
a. Negative weight change (path: S, 11686, 19782, 11537, 2018, T)
Link          Weight  Expected  Change  Dev
11686-19782   12387   12342     -135    21
19782-11537   8495    8658      -135    23
11537-2018    0       137       -137    0
b. Positive weight change (path: S, 11686, 19151, 5713, 2018, T)
Link          Weight  Expected  Change  Dev
11686-19151   16484   18346     135     3356
19151-5713    243     120       129     0
5713-2018     136     0         136     0
Figure 5.7: Augmenting a fault graph with additional information
(a) Min-cut on fault graph from 2914 identifies incorrect edge (nodes: S, 2914, 3549, 3216, 8732, T)
Link          Weight  Change
2914-3549     5870    -172
3549-3216     0       -61
3549-8732     0       -85
(b) Adding information from other observation points to fault graph increases accuracy (nodes: S, 2914, 3549, 3216, 8732, 5511, 4637, T)
We first identify links from other observation points that are common to the
fault graph from the primary point. We then identify the links that connect to and from
these common links and add this set of links to the fault graph. Finally, we
connect nodes that have only outgoing edges to S and nodes that have only incoming
edges to T. Algorithm 5 presents the details.
Algorithm 5: Augmenting the fault graph
  Input: Gst: The fault graph from primary observation point for time t
  Output: G′st: Augmented fault graph for time t
  for each monitor Mi do
    for event in time interval (t − δ, t + δ) do
      for edge (a, b) common to Gst and Ei do
        Ei = edges connecting to and from (a, b);
        addToGraph(Gst, Ei);
Figure 5.7(a) shows the fault graph for an event observed from AS 2914. Here, the
min-cut results in edge 2914-3549 as the possible origin of change. Adding information
from 2 other monitoring points as in Algorithm 5 results in the fault graph in
Figure 5.7(b), and one can see that the edges 3549-3216 and 3549-8732 are now identified
as the origin of the change. Adding more observation points further strengthens
this explanation. Given the weights of the edges involved, it seems most likely that
this is the correct explanation.
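The augmentation step of Algorithm 5 can be sketched as follows. This is an illustrative reading rather than the exact procedure: "edges connecting to and from (a, b)" is interpreted as edges in the other monitor's event that share an endpoint with a common edge, and the function name, the event timestamps, and the links involving AS 5511, AS 4637, and AS 64500 are hypothetical.

```python
def augment_edges(primary_edges, other_events, t, delta):
    """Add edges from other monitors' events that touch an edge shared with the primary graph.

    other_events: iterable of (timestamp, set_of_edges), one entry per event per monitor.
    Only events strictly inside the interval (t - delta, t + delta) are considered.
    """
    primary = set(primary_edges)
    augmented = set(primary)
    for ts, edges in other_events:
        if not (t - delta < ts < t + delta):
            continue
        for a, b in primary & set(edges):
            # keep every edge of this event that shares an endpoint with (a, b)
            augmented |= {(x, y) for x, y in edges if {x, y} & {a, b}}
    return augmented

primary = {(2914, 3549), (3549, 3216), (3549, 8732)}
others = [
    (1005, {(5511, 3549), (3549, 3216)}),    # hypothetical view through AS 5511
    (1010, {(4637, 3549), (3549, 8732)}),    # hypothetical view through AS 4637
    (9999, {(64500, 3549), (3549, 3216)}),   # outside the time window, ignored
]
augmented = augment_edges(primary, others, t=1000, delta=60)
print(sorted(augmented))
```

After augmentation, S and T are re-attached exactly as in Algorithm 4 and the min-cut is recomputed on the larger graph.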
5.2.4 Candidate set reduction
In this final stage, we infer whether the change was a failure or a recovery based on
the candidate sets we have and end up with either the positive change candidate set or
the negative change candidate set as the most likely root cause.
5.2.4.1 Using link state information
We define the following states that a link l can be in, based on its current weight w(l), its
expected weight w̄(l), and its deviation δ(l).
1. Normal: when w(l) is within w̄(l)± 2 ∗ δ(l)
2. Loss: when w(l) < w̄(l)− 2 ∗ δ(l)
3. Gain: when w(l) > w̄(l) + 2 ∗ δ(l)
The transitions between these states are shown in Figure 5.8. When a link fails,
its weight drops and as a result it moves from the normal state to the loss state; we call
this transition fail. At the same time, the affected routes take an alternate path, and the weight
of each link on the alternate path increases, moving it from the normal state to the
gain state. We call this transition from normal to gain fail-other: the
increase in weight is not due to the link's own recovery, but due to some other
failure that results in routes using the link as an alternative. Similarly, transitions from
loss to normal and from gain to normal are classified as recover and recover-other, respectively.
Figure 5.8 also shows transitions between the loss and gain states. Such transitions
(called loss-gain and gain-loss) are not frequent, but they are difficult to classify and hence
we do not associate them with recovery or failure. While we find that the vast majority of
transitions occur between the normal and loss states and between the normal and gain states,
a more detailed study of the conflicting cases is needed in order to make
best use of the expected link weights. For the example in Figure 5.6, using the
states defined in Figure 5.8, we classify the edge 11537-2018 as fail and the edge
5713-2018 as fail-other. Thus, the origin lies on the link 11537-2018. This is indeed
the correct explanation, as verified by logs obtained from AS 11537 indicating a BGP
session failure with AS 2018 at that time.
Figure 5.8: State transition diagram (Normal to Loss: fail; Loss to Normal: recover; Normal to Gain: fail-other; Gain to Normal: recover-other; Loss to Gain: loss-gain; Gain to Loss: gain-loss)
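The state definitions and transition labels above can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and the weight before the event is assumed to be the final weight minus the reported change, using the values listed for Figure 5.6.

```python
def link_state(weight, expected, dev):
    """Classify a link weight against its expected weight and deviation (2-deviation band)."""
    if weight < expected - 2 * dev:
        return "loss"
    if weight > expected + 2 * dev:
        return "gain"
    return "normal"

# Transition labels from the state diagram of Figure 5.8.
TRANSITIONS = {
    ("normal", "loss"): "fail",
    ("loss", "normal"): "recover",
    ("normal", "gain"): "fail-other",
    ("gain", "normal"): "recover-other",
    ("loss", "gain"): "loss-gain",
    ("gain", "loss"): "gain-loss",
}

def classify_change(final_weight, change, expected, dev):
    """Label the transition caused by one event; None if the state did not change."""
    before = link_state(final_weight - change, expected, dev)  # assumed pre-event weight
    after = link_state(final_weight, expected, dev)
    return TRANSITIONS.get((before, after))

# Values from the tables of Figure 5.6: (final weight, change, expected, deviation)
print(classify_change(0, -137, 137, 0))   # link 11537-2018 -> fail
print(classify_change(136, 136, 0, 0))    # link 5713-2018  -> fail-other
```

Only links labeled fail or recover are treated as origins; fail-other and recover-other mark links that merely absorbed or released rerouted traffic.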
5.2.5 Identifying node problems
Even though our scheme uses link weights to capture changes, we can also identify
potential node problems, where the origin is an AS rather than particular links. This can
happen due to i-BGP problems within an AS causing instability, or due to a BGP router
connected to multiple ASes going down. To identify potential node problems, we first
rank each node by the number of candidate-set links adjacent to it.
Nodes with high rank (i.e. many adjacent edges in the candidate set) are likely
candidates for node problems. In our analysis, we found a few cases where many
links adjacent to a particular node were identified as origins of change.
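The node ranking described above amounts to counting, for each AS, how many candidate links touch it. A minimal sketch follows, with a hypothetical function name and a made-up candidate set (the peers of AS 2914 listed here are chosen only for illustration):

```python
from collections import Counter

def rank_nodes(candidate_links):
    """Rank ASes by the number of candidate-set links adjacent to them, highest first."""
    counts = Counter()
    for a, b in candidate_links:
        counts[a] += 1
        counts[b] += 1
    return counts.most_common()

# Hypothetical candidate set in which most links share one endpoint.
candidates = [(2914, 3257), (2914, 3549), (2914, 1239), (3549, 8732)]
ranking = rank_nodes(candidates)
print(ranking[0])  # (2914, 3): AS 2914 is the most likely node-level origin
```

How high a count must be before a node problem is declared is left open here; the text does not give a threshold.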
5.3 Evaluation
In this section we discuss how we validate the results from our scheme. A fundamental
challenge in the validation of a scheme to infer origin of changes is the dearth of
publicly available documentation of BGP session or router failures. We perform two
kinds of validation, one using publicly available information of outages from Abilene,
and the other using BGP updates to identify a class of events where the origin of the
event is almost surely adjacent to the AS announcing the prefix.
5.3.1 Validation using Abilene Data
The Abilene network in the United States is a high-performance backbone network created
by the Internet2 community. Abilene maintains a publicly accessible mailing list and
archive containing descriptions of problems inside Abilene, outages, as well as BGP
session problems with its peers. A typical peer unavailable message describes which
BGP router inside Abilene lost connectivity to which AS, and the start and end time of
the outage. However, most of Abilene’s peering with an AS occurs through multiple
physical locations, e.g. Abilene peers with KreoNet through Chicago as well as Seat-
tle. Hence, when one of the BGP sessions, (say the Chicago peering with KreoNet)
goes down, the affected AS (KreoNet) can still be reached using other internal BGP
routers (KreoNet through Seattle). In such cases, Abilene may not announce an AS
path change to its neighbors, and hence such events cannot be used for our validation.
Using peering maps available from Abilene, we identified BGP peers of Abilene that
connected at exactly one location and carried more than 25 prefixes so as to generate
a link weight disturbance. If such a peering session were to go down, then Abilene
would send an AS path change to its neighbors and this event could be seen at other
observation points. We extracted events related to these peers over a 3 month period
from January 1, 2007 to March 31, 2007. We found a total of 7 BGP peering session
failures in this period in Abilene’s email archives. Of these events, 6
caused link weight disturbances observed at one or more observation points in
our data, and all 6 were identified by our heuristic. Figure 5.6
presents the fault graph from one of these 6 cases. Adding
one additional view from AS 7660 gives the fault graph with weight losses shown in
Figure 5.9. Our heuristic puts the min-cut at edge {11537,2018}. According to the mailing list
message, Abilene’s peer TENET (AS 2018) lost connectivity to the Abilene network
(AS 11537) at that time, as accurately identified by our scheme.
Figure 5.9: BGP peer TENET (AS 2018) of Abilene (AS 11537) was unreachable.
Event observed from primary view of AS 11686. (Nodes: S, 11686, 19782, 7660, 22238, 11537, 2018, T; the cut is marked in the figure.)
5.3.2 Validation using Origin-adjacent events
Prior work on inferring the origin of change [10] validates its results using BGP beacons.
However, we do not use BGP beacons for validation, since beacon events are per prefix
and hence are not suited to our link weight based scheme, which is tailored for larger
scale disturbances. Instead, we identify large scale events where a set of prefixes originated
by the same AS became unreachable as observed from a majority of observation
points. Given the meshed nature of the topology, the most likely reason for unreachability
from a diverse set of observation points is that the problem lies adjacent to the origin
AS announcing the prefixes, i.e. the origin itself, its peering with its provider, or its
provider. We call such events origin-adjacent events.
5.3.2.1 Collecting origin-adjacent events
To collect a set of origin-adjacent events, we use the clustering based event classification
scheme [32], which is based on the initial and final paths for each prefix. In particular, we are
interested in the events classified by [32] as Tdown events, where a prefix becomes unreachable
from multiple observation points. We extended this scheme and further correlated the
Tdown events across different prefixes using a fixed sliding window of 30 minutes. All
events happening in the same time window and affecting prefixes originated by the same
origin AS are aggregated into a single event. We further removed those events that
affected fewer than 50 prefixes, and from the remaining events we only considered
those that were observed by more than 50% of the monitors. We applied this
classification scheme to the BGP data collected from the RouteViews Oregon collector
over the month of January 2007 to obtain the set of origin-adjacent events. Note
that different observation points might observe slightly different events, but any event
must have been observed by at least 50% of the observation points. Figure 5.10 shows
the number of events involving different minimum numbers of prefixes for each
observation point.1
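The aggregation and filtering steps can be sketched as below. This is an assumption-laden illustration, not the classifier of [32]: the input record layout (timestamp in seconds, origin AS, prefix, monitor ID) is hypothetical, the "fixed sliding window" is approximated by starting a new event whenever a record falls more than 30 minutes after the start of the current one, and the AS numbers and prefixes are synthetic.

```python
from collections import defaultdict

WINDOW = 30 * 60  # 30 minutes, in seconds

def origin_adjacent_events(records, n_monitors, min_prefixes=50, min_fraction=0.5):
    """Group Tdown records by origin AS and time window, then apply the two filters."""
    by_origin = defaultdict(list)
    for ts, origin, prefix, monitor in records:
        by_origin[origin].append((ts, prefix, monitor))

    events = []
    for origin, recs in by_origin.items():
        recs.sort()
        current = None
        for ts, prefix, monitor in recs:
            if current is None or ts - current["start"] > WINDOW:
                current = {"origin": origin, "start": ts,
                           "prefixes": set(), "monitors": set()}
                events.append(current)
            current["prefixes"].add(prefix)
            current["monitors"].add(monitor)

    # Keep events with at least min_prefixes prefixes seen by more than half the monitors.
    return [e for e in events
            if len(e["prefixes"]) >= min_prefixes
            and len(e["monitors"]) > min_fraction * n_monitors]

# Synthetic input: AS 64500 loses 60 prefixes as seen by 3 of 4 monitors;
# AS 64501 loses 10 prefixes as seen by a single monitor.
records = [(i, 64500, f"10.0.{i}.0/24", m) for i in range(60) for m in (1, 2, 3)]
records += [(i, 64501, f"10.1.{i}.0/24", 1) for i in range(10)]
events = origin_adjacent_events(records, n_monitors=4)
print([(e["origin"], len(e["prefixes"])) for e in events])  # [(64500, 60)]
```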
5.3.2.2 Applying link weight based min-cut heuristic
To apply our scheme, we randomly selected 11 of the 45 observation points we
had data from and identified events using the link weight based event identification
scheme described in Section 5.1. We then constructed a fault graph for each observation
point individually using Algorithm 4. The min-cut on this fault graph, called
Fsingle, indicates the likely origin of change taking into account only the primary
observation point. We then augmented the fault graph with the views from the other
observation points as in Algorithm 5. The min-cut on this augmented fault graph, called Fmult,
indicates the likely origin of change after adding information from other observation
points to that of the primary observation point.
1 We removed two observation points that saw a very small number of events.
Figure 5.10: Number of origin-adjacent events affecting each observation point (x-axis: observation point ID; y-axis: number of events; series: affecting >50 prefixes, affecting >75 prefixes, affecting >100 prefixes)
5.3.2.3 Results
Figure 5.11 shows the percentage of origin-adjacent events affecting more than 50 prefixes
that were accurately identified by each observation point using its primary view only,
i.e. Fsingle, as well as with information from other views, i.e. Fmult. The accuracy using
just the primary view varies across observation points. However, the accuracy with Fmult
is consistently above 90% for all observation points.
Similarly high accuracy was obtained for events involving
more than 75 and more than 100 prefixes. This shows that we can accurately identify
the origin of the event, and that adding information from other observation points
increases the accuracy of identifying the cause of the change.
Figure 5.11: Accuracy of Fsingle and Fmult for origin events involving more than 50 prefixes (x-axis: observation point ID; y-axis: accuracy percentage; series: single view, multiple views)
5.3.3 Application to BGP data
We now present the results of applying our scheme to the BGP data for the
month of January 2007. We use the same set of 11 observation points used in Section 5.3.2.
Each observation point uses the additional views from the other 10 points
in diagnosing its own events. Figure 5.12 shows the number of origins of change (link
instances) over the one month period. Note that a single link may be involved more than
once as an origin of change. Next we examine how each link instance is classified according to the
state transitions in Figure 5.8. We see that for each observation point, in close to 50%
of the cases, a link instance is classified as a failure, recovery, or backup, i.e. fail-other
or recover-other. We now investigate the events seen by a BGP router in AS
2914 in more detail.
Figure 5.12: Links involved in events per observation point (x-axis: observation point ID; y-axis: link count; legend: Total, Failure/Recovery/Backup, Not classified)
Figure 5.13 shows the cumulative distribution of the number of instances as origin of
event per link. We can see that close to 25% of the links contribute over 75% of
the origin of change instances. Next we examine the top 10 links in terms of instances
per link and find that most of the top links are adjacent to AS 2914 or involve
big ISPs. However, the two most frequently appearing links were 2200-2072 and
13049-2072. These links are two hops away from AS 2914, yet are responsible for
over 400 origin change events each. Figure 5.14 shows the Link-Rank representation
of one such event involving these two links. In this particular case, as viewed from
AS 2914, routes were lost along 2200-2072 and gained along 13049-2072. Over the
one month period, many such events were observed in which routes repeatedly switched
between the two paths shown. Clearly, cases like these could be addressed if detected
promptly, rather than recurring over extended periods of time. Next we present
another interesting large scale event from our analysis.
Figure 5.13: Cumulative Distribution of instances per link from AS 2914 (x-axis: link ID; y-axis: instances per link, CDF)
Figure 5.14: Repeated instability involving AS 2072 as viewed from AS 2914.
5.3.3.1 Case Study
The case study we present involved a large number of routing changes seen from AS 2914,
starting on Jan 31, 2007 around 7:00 GMT and lasting for about an hour. During this period, our
heuristic identified many links adjacent to AS 2914 as origins of the observed routing
changes. To understand the event better, we present the Link-Rank graph summarizing
the major links involved in Figure 5.15. As an example, the link between AS 2914
and AS 3257 (towards the top right) lost 2790 routes and had a final weight of 0. One
can see from Figure 5.15 that many peers of AS 2914 lost all their routes. After
discussing this event with a network operator at AS 2914, we found out that during
that period, AS 2914 was providing temporary restoration for regional ISPs affected
by the underwater cable cut due to the earthquakes off the coast of Taiwan. One of the
downstream customers of AS 2914 in turn was providing temporary transit to a very
large Asian network and hence announced a large number of prefixes to AS 2914. As a result, the
max prefix filters of peers of AS 2914 were tripped, causing the BGP peering sessions to
go down. These session failures were accurately identified as origins by our heuristic.
Figure 5.15: Case study: Routing changes seen from AS 2914
5.4 Discussion
In this section we discuss limitations of our inference technique and identify open
issues that deserve further attention. In contrast to the previous approach to origin
inference, which identifies the shared link or node among a large number of routes that
failed at the same time, our link weight based approach does not require per-route
information. Because link weight changes reflect an aggregate measure of route changes, they
not only bring a substantial saving in data processing but also help improve the
inference accuracy when data is limited, for example when only data from a single
router is available.2
As the saying goes, every coin has two sides. While the use of link weights and
weight changes makes the origin inference much faster, it can also introduce potential
inference errors due to the lack of information about individual routes. One possibility
is that unrelated routing events may result in a link receiving both positive and
negative weight changes simultaneously, so that the combined change has negligible
magnitude, or the weight changes are masked entirely. Another
possibility concerns correlating observed link weight changes from different vantage
points: there is a non-zero probability that these changes are due to simultaneous
but different routing events, and again the lack of information about specific
routes makes our scheme unable to tell whether or not that is the case.
Nevertheless, our BGP routing measurement efforts over the last several years give
us high confidence that the operational Internet exhibits strong system characteristics,
which we can leverage to identify whether simultaneous routing events occurred. The
expected weight and variance of the AS links represent one such system characteristic,
which we explored in our design. Given that most routes are longer than one AS hop, the
correlation of the weights and weight changes of adjacent links is another characteristic,
one that we have yet to fully understand, especially in cases where the links affected
by different routing events partially overlap. As part of our future work, we expect to
gain and utilize this understanding to detect simultaneous routing events.
2 Unfortunately, due to the unavailability of the data set used in [10], we are unable to quantitatively compare the processing saving and accuracy improvement of our scheme with the results presented in [10].
CHAPTER 6
Detecting and Alerting about Prefix Hijacks
So far we have discussed techniques to diagnose routing problems in the Internet using
path information from the Border Gateway Protocol (BGP). However, if BGP provides
incorrect routing information, packets may never reach the intended destination, and
may even be misdirected to malicious destinations. The inability to ensure the in-
tegrity and correctness of routing information leads to many known vulnerabilities in
BGP [33]. In this chapter we focus on a type of routing vulnerability called prefix
hijacking that can cause severe security and privacy breaches. In a common prefix
hijacking event, an Autonomous System (AS) originates a route for an address space,
termed a prefix, but does not provide data delivery for that prefix. In other words, an
AS reports “use me to reach prefix p”, but does not actually provide data delivery for
prefix p. For example, on December 24, 2004, AS 9121 incorrectly originated routes
to 106,089 prefixes, almost 70% of all the prefixes at that time. BGP routers through-
out the Internet selected the route originating from AS 9121 as the best path to some
or all of these prefixes. Traffic for these prefixes was then forwarded to AS 9121, which
then essentially dropped the packets, affecting thousands of organizations [34]. When
a prefix is hijacked, sensitive data from unsuspecting users could easily fall into the
wrong hands, resulting in serious security and privacy breaches. A recent study has
also found that spammers hijack BGP prefixes to send spam mail [35]. Thus, prefix hijacking
is a real operational concern in the Internet, and securing Internet routing against
prefix hijacking is an important problem.
In this work, we design a system called Prefix Hijack Alert System (PHAS) that
builds on the premise that the prefix owner is the best person to accurately distinguish
between legitimate changes and prefix hijacking events. PHAS provides prefix owners
with timely and reliable notifications of potential prefix hijacks. During a prefix hijack,
the notification itself may reach the hijacker instead of the prefix owner, and thus the
prefix owner would not be informed of the ongoing hijack. To increase the chances of
notification delivery, we use a multi-path delivery mechanism built on the existing email
infrastructure. Our design is readily
deployable and easy to use. Once our system has detected the problem, the owner
can then take the necessary actions, including soliciting help through operator channels
such as the North American Network Operators Group (NANOG) and NSP-Security
mailing lists, to resolve the problem with either the hijacker or its upstream
ISPs.
6.1 Prefix Hijack
In a prefix hijack event, the announced path to the prefix cannot actually be used to
deliver data to the prefix. In some parts of the Internet, the false path replaces the
authentic route to the prefix and traffic that follows the false path will eventually be
dropped or delivered to someone who is pretending to be the legitimate destination.
In other words, the traffic sent along the false path has been hijacked. We term the
AS injecting false information as an attacker AS, and the AS that owns the route as a
victim AS. For example, in Figure 6.1b, AS 110 announces 131.179.0.0/16, while the
true origin for this prefix should be AS 52. It can be seen in this example, that AS 110
successfully effected a hijack, since AS A decided to pick the route to AS 110 instead
of AS 52. In this case, AS 52 is the victim, AS 110 is the attacker and any traffic sent
by AS A is delivered to AS 110 rather than the legitimate origin. Note that AS 52 may
see a drop in its overall traffic volume, but variations in traffic load are the norm for
most networks and AS 52 may be completely unaware that a hijack event is occurring.
An attacker AS can hijack a prefix in various ways such as falsely announcing
itself as the origin for a prefix (as discussed in the example above), falsely modifying
some portion of the path other than origin, or falsely announcing a more specific prefix.
Our presentation of PHAS is particularly concerned with the first case, where the
origin AS is not valid. Section 6.7 discusses how the PHAS concept can be extended
to handle path modifications adjacent to the origin AS and announcement of more
specific prefixes.
Simply detecting the occurrence of a prefix hijack event is an essential, but difficult
task. Large-scale events where an AS mistakenly hijacks thousands of prefixes may
be detected relatively quickly due to their size and impact. For example, in the AS
9121 event described above, thousands of prefixes from different origins, suddenly
changed to origin AS 9121, a clear indication of prefix hijack. But smaller scale errors
and intentional attacks can be much more difficult to detect. For example, suppose a
malicious AS originates a false path to only one prefix, 131.179.0.0/16 (UCLA). Some
BGP routers will accept the new false path while others may continue to use the correct
path originated by UCLA. An origin change for a single prefix is a common occurrence
and is unlikely to trigger an alarm. As we will show later in the chapter, there are quite
a few origin AS changes during a typical day and most of these changes are valid. A
prefix may change its origin AS at any time due to contractual arrangement, multi-
homing, traffic engineering, and a host of other factors. Only the origin itself (UCLA
in our example) could easily and accurately distinguish between a legitimate origin
change and a prefix hijack [18]. The legitimate origin is best able to identify this type
of prefix hijack, but it has very little information about the BGP routes taken by others
to its own prefix. In this case, UCLA may notice a drop in traffic and/or reports of
connectivity problems, but there are numerous potential causes for this. Even if UCLA
suspected a prefix hijacking attack, UCLA’s local data can only confirm that it has
correctly announced its own route. To determine if others are incorrectly announcing
a route to UCLA, the UCLA administrators would need data from other remote sites.
Figure 6.1: Example of prefix hijack
a. True origin AS 52 announces prefix 131.179.0.0/16
b. False origin AS 110 (attacker AS) announces prefix 131.179.0.0/16 and hijacks A's route
6.2 System Design
One of the goals behind the existing BGP monitoring projects such as Oregon RouteViews
and RIPE RRC is to provide network operators with a remote view of their own
prefixes. By establishing BGP peering with operational routers, the RouteViews
and RIPE RRC projects collect routing data from a few hundred BGP routers around
the globe placed in critical exchange points, tier-1 ISPs, and so forth.1 These BGP data
collectors obtain information in real time, which can be used to quickly detect prefix
hijacking and identify the source of the problem. For example, a prefix hijack event
occurred on January 22, 2006 and affected close to 80 prefixes, including one belonging to a
financial organization. Within a few seconds of the event, the RouteViews data collector
received update messages from several of its BGP peers indicating a new origin
to the prefix of the financial organization. If the prefix owner had received this data,
it could have immediately detected the prefix hijacking and could have quickly taken
corrective measures using operator channels. However, prefix owners do not have any
way to easily access the data. The current BGP monitors collect vast amounts of data
and dump the raw data, unsorted, onto the disk. It is impractical to assume that all the
prefix owners would be able to download the data and then extract the information
about their own prefixes, let alone doing so in real-time.
Our basic approach is to examine BGP routing data collected at RouteViews (or
RIPE, or any other BGP collectors), and provide real time notifications of any poten-
tial prefix hijacking to the prefix owner in a reliable way. In particular, we should
immediately notify the prefix owner anytime a new origin AS is associated with their
prefix. At a potentially slower rate, the prefix owner should be notified when an origin
AS is no longer used to announce its prefix. The net result is that the prefix owner is
able to track the set of AS numbers that originate its prefix. Presumably, the prefix
owner also knows which AS numbers are allowed to serve as origins and can thus detect
any false origins, as well as know when the false origins have stopped announcing
its prefix.
1 Admittedly a few hundred routers represent only a small fraction of the overall Internet. A prefix hijack that affects only a small local region may not be observed by any of the current BGP monitors. In a separate project, we are studying the optimal BGP monitor placement problem; however, those results are beyond the scope of this chapter.
Figure 6.2: Components of PHAS (components: User Registration, Origin Monitor, Notification Transmission, Local Notification Filter; labels: BGP updates from RouteViews and RIPE RIS, origin events, notifications, alarms, Internet, user side, prefix owner)
More formally, we define an origin set for each prefix and track changes of this
origin set. Existing BGP monitoring projects such as RouteViews and RIPE, peer with
a few hundred BGP routers around the globe and collect BGP updates in real-time.
Each monitored BGP router, or monitor in short, reports its best path to a prefix P and
the last hop in this path is an origin AS for P. We define the origin set OSET(P, t) for a
prefix P as the union of all the origins seen by all the monitors for that prefix at time t.
For example, on January 22, 2006 before 8:31 GMT, all RouteViews monitors
reported paths ending in AS 19758 for prefix 65.173.134.0/24, and thus for time t <
8:31 on 01/22/2006, OSET(65.173.134.0/24, t) = {19758}. When the prefix was
hijacked at 8:31, some monitors switched to paths ending with AS 27506, and thus
for the time t = 8:31 on 01/22/2006, OSET(65.173.134.0/24, t) = {19758, 27506}. Our
objective is to immediately notify the owner of 65.173.134.0/24 of this new origin set,
so that the owner can work to resolve the issue with the offending AS 27506 or its
upstream providers. Later, when AS 27506 is no longer seen announcing
this prefix, we would like to send a notification to the prefix owner indicating
that the origin set is again OSET(65.173.134.0/24, t) = {19758}, so that the prefix owner also
knows that the problem has been resolved.
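The origin set definition translates directly into a set computation over the monitors' best paths. The sketch below is illustrative: the function names and monitor labels are hypothetical, and the intermediate AS hops in the paths are made up; only the origin ASes 19758 and 27506 come from the example above.

```python
def origin_set(best_paths):
    """OSET(P, t): union of the last-hop (origin) AS over all monitors' best paths to P."""
    return {path[-1] for path in best_paths.values() if path}

def origin_change(old_oset, new_oset):
    """Origins that appeared and disappeared between two snapshots of the origin set."""
    return {"gained": new_oset - old_oset, "lost": old_oset - new_oset}

# Best AS paths to 65.173.134.0/24 per monitor (intermediate hops are hypothetical).
before = {"m1": [3356, 19758], "m2": [2914, 701, 19758], "m3": [7018, 19758]}
after = {"m1": [3356, 19758], "m2": [2914, 27506], "m3": [7018, 19758]}

oset_before = origin_set(before)
oset_after = origin_set(after)
print(oset_before)                             # {19758}
print(sorted(oset_after))                      # [19758, 27506]
print(origin_change(oset_before, oset_after))  # {'gained': {27506}, 'lost': set()}
```

A non-empty "gained" set corresponds to the case where the owner should be notified immediately; a non-empty "lost" set corresponds to the slower notification that an origin is no longer announcing the prefix.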
Our design consists of the following four components.
1. User Registration: All prefix owners who are interested in using our system need
to register with the PHAS server and provide contact email addresses. PHAS
aims to provide a web-based registration service, similar to the standard mailing
list registration process. Each new user opens an account with an email
address and a password over an HTTPS session. A confirmation request is then sent to that
email address. Once confirmed, the registration is committed,
and any later change to the account is done via HTTPS and requires the pass-
word. The registration specifies which prefixes are of interest and each registrant
is strongly encouraged to submit multiple email addresses hosted by different
email systems (such as a GMail address), to maximize the chance of email reception
in the face of prefix hijacking. Ideally, only the legitimate owner of a prefix
should register, but verifying the correct contact address for each prefix is a chal-
lenging problem in its own right with no immediately deployable solution. In
PHAS, an attacker may register and falsely claim to be the prefix owner. How-
ever, this action does not cancel the registration by the legitimate owner and all
notifications are based on publicly available data so the attacker gains no new
information by successfully registering.
2. Origin Set Monitoring: Using the BGP monitor data, PHAS maintains a current
origin set for each registered prefix. If there is a change to this origin set, an origin
event is generated. To control the number of origin events for prefixes with
frequent origin changes, we use a time-window-based mechanism that reduces the
repeated reporting of origin changes but still guarantees immediate notification
of any new origin announced for a prefix. We increase the duration
of this window for prefixes that report many origin changes even after the default
time window is applied. The window duration is decreased when the number of
origin events drops. This adaptive window scheme is central to ensuring that the
system scales from the perspective of the origin set monitoring and also limits
the number of false positives sent to the prefix owners. It is discussed further in
Section 6.3.
3. Notification Transmission: Once an origin event is generated, PHAS decides
whether the origin event translates to a notification message to be sent. For this,
it checks the user registration to see if there are email addresses registered for
the prefix involved. However, the seemingly simple task of sending a notification
message can be difficult in the face of a prefix hijack. For example, when the route
to UCLA has been hijacked, email from PHAS to [email protected] may follow
the hijacked route and never reach the intended receiver. To protect against this
case, we strongly recommend two practices for prefix owners in order to set up
“multiple diverse paths” for email delivery. First, in registering with our system,
prefix owners should provide multiple email accounts on different email servers
that are topologically diverse. Second, prefix owners should have Internet access
via multiple prefixes. ISPs often have multiple prefixes of their own. For one that
only owns a single prefix, a backup plan like a dial-up Internet access account is
recommended. With the combination of multiple email addresses and multiple
prefixes for Internet access, prefix owners can achieve a high success rate of
notification delivery even in the face of a prefix hijack. All notification messages are
also signed by the PHAS server, whose public key is well known. More details on
the notification scheme will be discussed in Section 6.4.
4. Local Notification Filter: Although the notifications could be sent directly to
network administrators, our design assumes automated processing of the received
notifications. Tasks such as verifying that the message is properly signed,
checking whether periodic notifications have been received, and so forth are better
handled by an automated receiver. In addition, many prefixes have multiple
legitimate origins and thus not every change in the origin set is necessarily an
attack that should be reported to the local network administrators. To make the
system more user friendly, we provide a local filter program for processing the
notification email. The local filter manages the external email addresses, checks
any change in origin against a locally configured set of valid origins, and only reports
an alarm to the administrator when an unexpected origin change occurs. Local
administrators can easily customize the filter program or even provide their own
filter program. By incorporating a local filter, all the legitimate origin changes
are simply screened out by the filter and only notices requiring human intervention
are reported to the network operator. The local notification filter is discussed in
more detail in Section 6.5.
Figure 6.2 shows the four components in our design and the interaction between
them. Note how the origin events translate to notifications and finally to alarms.
6.3 Origin Change Detection
PHAS detects changes in BGP prefix origins and sends notification messages to regis-
tered prefix owners. For traffic engineering purposes, some networks may change their
prefix origins frequently, which may trigger a large volume of notification messages
if we want to keep track of every change. The main challenge in the system design
is how to notify the owner in a timely manner while not being overwhelmed by the
volume of messages.
6.3.1 Instantaneous Origin Changes
We first consider a simple scheme (Algorithm 6) that maintains an origin set for each
prefix and sends a notification whenever the origin set changes. It takes input from a
BGP monitoring project such as RouteViews or RIPE. Let $\{M_1, M_2, \ldots, M_i, \ldots, M_N\}$
denote the set of N BGP routers providing data. By observing the BGP updates sent
by router $M_i$, we can determine $M_i$'s current route to prefix P. If $M_i$ has a route to
P at time t, $origin(M_i, P, t)$ denotes the origin AS of P in this route. If $M_i$ has no
route to P at time t, $origin(M_i, P, t)$ is empty. The origin set for prefix P at time t
is defined as $OSET(P, t) = \bigcup_{i=1}^{N} origin(M_i, P, t)$. In other words, the instantaneous
origin set is simply the union of the origins currently used by any of the monitors to
reach this prefix. As updates from $M_i$ arrive, $origin(M_i, P, t)$ may change and thus
the origin set may change as well. Whenever the origin set changes, we say an origin
event is triggered.
Algorithm 6: Instantaneous Origin Change
    Initialize $origin(M_i, P, t_0)$ using the initial routing table of $M_i$ at time $t_0$;
    $OSET(P, t_0) = \bigcup_{i=1}^{N} origin(M_i, P, t_0)$;
    if update for prefix P at time t from router $M_i$ is an announcement then
        $origin(M_i, P, t)$ = the last AS in the announced path;
    else
        $origin(M_i, P, t) = \{\}$;
    $OSET(P, t) = \bigcup_{i=1}^{N} origin(M_i, P, t)$;
    if $OSET(P, t) \neq OSET(P, t-1)$ then
        origin_gain = $OSET(P, t) - OSET(P, t-1)$;
        origin_loss = $OSET(P, t-1) - OSET(P, t)$;
        send [$OSET(P, t)$, origin_gain, origin_loss] to prefix owner;
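The instantaneous scheme can be sketched in a few lines of Python. This is only an illustration of Algorithm 6 for a single prefix, not the PHAS implementation: the class name, the monitor names M1 and M2, and the transit AS numbers 64496 to 64498 (taken from the range reserved for documentation) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

OriginEvent = Tuple[Set[int], Set[int], Set[int]]  # (OSET, origin_gain, origin_loss)


class InstantOriginTracker:
    """Instantaneous origin set for one prefix P, in the spirit of Algorithm 6."""

    def __init__(self, initial_origins: Dict[str, Optional[int]]) -> None:
        # initial_origins maps a monitor name to the origin AS in its initial
        # routing table entry for P, or None if the monitor has no route to P.
        self.origin: Dict[str, Optional[int]] = dict(initial_origins)
        self.oset: Set[int] = self._union()

    def _union(self) -> Set[int]:
        # OSET(P, t): union of the origins currently used by any monitor.
        return {asn for asn in self.origin.values() if asn is not None}

    def update(self, monitor: str, as_path: Optional[List[int]]) -> Optional[OriginEvent]:
        """Apply one BGP update; as_path is None for a withdrawal.

        Returns (OSET, origin_gain, origin_loss) if the origin set changed,
        otherwise None.
        """
        self.origin[monitor] = as_path[-1] if as_path else None
        new_oset = self._union()
        if new_oset == self.oset:
            return None
        gain, loss = new_oset - self.oset, self.oset - new_oset
        self.oset = new_oset
        return new_oset, gain, loss


# Replay of the 65.173.134.0/24 example; monitors and transit ASes are made up.
tracker = InstantOriginTracker({"M1": 19758, "M2": 19758})
hijack_event = tracker.update("M2", [64496, 27506])    # M2 switches to the false origin
no_event = tracker.update("M1", [64497, 19758])        # same origin, nothing to report
recovery_event = tracker.update("M2", [64498, 19758])  # M2 returns to the true origin
```

In this replay the first update yields an event with origin gain {27506}, the second yields no event, and the third yields an event with origin loss {27506}.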
Figure 6.3: Origin events per prefix - December 2005 (x-axis: Prefix ID; y-axis: number of origin events, log scale)
To study the algorithm behavior, we used data for the month of December 2005,
from the RouteViews collector at the Oregon Internet Exchange. This BGP data col-
lector peers with 42 operational routers from around the globe and thus the origin set is
the union of the origin ASes seen by these 42 peers. The number of prefixes involved
is close to 170,000. Algorithm 6 generated 511,513 origin events involving 48,768
prefixes during December 2005. Thus, close to 30% of the prefixes had one or more
origin set changes. Figure 6.3 shows the distribution of the number of origin events
per prefix (prefixes with no origin events are not plotted).
As the figure shows, some prefixes generated a large number of origin events. In
Algorithm 6, even when the same origin leaves the set and comes back again on a re-
peated basis, each appearance and each disappearance triggers an origin event. For ex-
ample, prefix 207.135.82.64/26 generated 5747 origin events during December 2005,
simply due to the fact that its origin set switched frequently between {2828, 65000},
{2828}, and {65000}. Since some prefixes have unstable connectivity to the Internet,
a repeated sequence of withdrawals and announcements causes the origin to frequently leave
and rejoin the set, resulting in repeated origin events. In order to detect prefix hijacking
events, it is essential to immediately notify the owner when a new origin appears.
However, the reporting of oscillations between already reported origins, as in this particular
example, can be reduced.
Figure 6.4: Inter-arrival time between origin events for a prefix for December 2005 (x-axis: number of cases; y-axis: inter-arrival time in seconds, log scale)
6.3.2 Windowed Origin Changes
We now introduce the notion of a windowed origin set. We can mask repeated and
frequent origin changes by reporting the origin set observed over some time window, instead
of reporting instantaneous origin set changes. Figure 6.4 plots the inter-arrival
time between origin events. From the figure we can see that the inter-arrival time is
less than 1000 seconds in close to 75% of the cases.
Let $OSET(P, t, k)$ denote the set of all the origins for prefix P observed over the
last k time units. In other words, this windowed origin set consists of all the origins
for P that were observed by at least one router $M_i$ during the time $[t-k, t]$. More
formally, define $origin(M_i, P, t, k) = \bigcup_{\tau=t-k}^{t} origin(M_i, P, \tau)$ and
$OSET(P, t, k) = \bigcup_{i=1}^{N} origin(M_i, P, t, k)$. The definition includes the last k units at each time and thus
provides a continuously moving window over which the origins of P are recorded. The
algorithm to detect origin changes with a moving window is the same as Algorithm 6,
except that we now have to include the time window k and only send origin events
when $OSET(P, t-1, k) \neq OSET(P, t, k)$.
It is important to note that this revised algorithm only reduces the number of re-
peated origin events. The prefix owner will be immediately notified whenever a new
(potentially false) origin appears for the first time during the last k time units. Suppose
router $M_i$ is the first to observe a new origin O for prefix P. If this new announcement
first appears at time t, then $O \in origin(M_i, P, t, k)$ and thus $O \in OSET(P, t, k)$. Since $M_i$
is the first to observe this origin, it must also be the case that $O \notin OSET(P, t-1, k)$.
Thus $OSET(P, t-1, k) \neq OSET(P, t, k)$ and an origin event is triggered at time t,
i.e., as soon as the new origin appears. This feature guarantees timely detection and
notification of potential prefix hijacking.
However, the addition of a time window does delay the notification of origin-loss
events. Suppose origin O was in fact a prefix hijacking attempt. As discussed above,
the prefix owner is immediately notified when O first appears. Assume as a result of
this fast notification, the owner took action and quickly resolved the attack. Let $M_j$
denote the last monitored router to remove O from its routing table, at time $t_{end}$. Although
O has been removed from the routing tables, it will not be removed from $M_j$'s
windowed origin set $origin(M_j, P, t, k)$ until time $t_{end} + k$. Thus O is also not removed from $OSET(P, t, k)$
until time $t_{end} + k$. The net result is that the prefix owner is not notified that O has
been removed until k time units after O has vanished from the routing system.
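The asymmetry between immediate gain reports and delayed loss reports can be seen in a small Python sketch of the windowed origin set. It is an illustration under assumptions: the interval representation, the function name, and the timeline (times in minutes, k = 60) are made up for the example, and the boundary handling at exactly $t_{end} + k$ is a choice of this sketch rather than something the text specifies.

```python
from typing import List, Optional, Set, Tuple

# (start, end, origin): some monitor used `origin` for the prefix from `start`
# until `end`; end is None while the origin is still in use.
Interval = Tuple[int, Optional[int], int]


def windowed_oset(intervals: List[Interval], t: int, k: int) -> Set[int]:
    """Origins observed by at least one monitor at some time in [t - k, t]."""
    result: Set[int] = set()
    for start, end, origin in intervals:
        has_started = start <= t
        seen_in_window = end is None or end >= t - k
        if has_started and seen_in_window:
            result.add(origin)
    return result


# Hypothetical timeline in minutes: AS 19758 is the stable origin, and AS 27506
# appears at t = 100 and disappears from all monitors at t = 160.
timeline: List[Interval] = [(0, None, 19758), (100, 160, 27506)]
K = 60

before_hijack = windowed_oset(timeline, 99, K)   # only the true origin
at_hijack = windowed_oset(timeline, 100, K)      # the new origin shows up at once
after_removal = windowed_oset(timeline, 170, K)  # still listed inside the window
after_window = windowed_oset(timeline, 230, K)   # dropped once the window has passed
```

The new origin enters the set at the moment it first appears, while its removal only becomes visible after the window has slid past the last time it was seen.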
6.3.3 Adaptive Window Size
Our objective is to reduce the number of repeated origin events for prefixes with fre-
quent origin changes, but not penalize well-behaved prefixes by delaying reports that
an origin has been removed. We start with a base time window of one hour. This
masks transient changes for most prefixes, at a cost of delaying notification of origin
loss events by one hour. However, some prefixes still generate a large number of no-
tification messages even with the one hour window. Increasing the window size can
further limit the number of repeated origin events for these prefixes but at the cost of
further delaying origin-loss events for other prefixes. Rather than attempt to assign a
uniform time window for all prefixes all the time, we introduce an adaptive window
resizing scheme for each prefix. Essentially, prefixes that generate a large number of
messages are penalized with a larger window size, while other prefixes continue to use a small
window size.
Initially, each prefix starts with a penalty value of $penalty(P) = 0$ and a time
window of one hour. Any time a notification is generated for this prefix, $penalty(P)$
is increased by 0.5. The penalty value decays exponentially over time and the rate of
decay is determined by a half-life parameter. We currently use a half-life of 2 hours (see footnote 2).
The size of the prefix's time window is set to $2^{\lfloor penalty(P) \rfloor}$ hours. In other words, a prefix
with $penalty(P) < 1$ uses a time window of $2^0 = 1$ hour; a prefix with $penalty(P)$
in the range $[1, 2)$ uses a time window of $2^1 = 2$ hours; a prefix with $penalty(P)$ in the
range $[2, 3)$ uses a time window of $2^2 = 4$ hours; and so forth.
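The penalty bookkeeping described above can be sketched as follows. This is a minimal illustration, assuming the penalty is decayed lazily whenever it is read or updated and that time is measured in hours; the class and method names are hypothetical, and only the constants (0.5 per notification, 2-hour half-life, window of $2^{\lfloor penalty(P) \rfloor}$ hours) come from the text.

```python
import math

HALF_LIFE_HOURS = 2.0           # half-life of the penalty decay
PENALTY_PER_NOTIFICATION = 0.5  # added each time a notification is generated


class AdaptiveWindow:
    """Per-prefix penalty and window size (times are in hours)."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update = 0.0

    def _decay_to(self, now: float) -> None:
        # Exponential decay: the penalty halves every HALF_LIFE_HOURS.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_HOURS)
        self.last_update = now

    def record_notification(self, now: float) -> None:
        self._decay_to(now)
        self.penalty += PENALTY_PER_NOTIFICATION

    def window_hours(self, now: float) -> int:
        # Window size is 2 ** floor(penalty) hours.
        self._decay_to(now)
        return 2 ** math.floor(self.penalty)


quiet = AdaptiveWindow()
quiet_window = quiet.window_hours(0.0)  # penalty 0, so the base 1-hour window

noisy = AdaptiveWindow()
for _ in range(4):                      # four notifications at the same instant
    noisy.record_notification(0.0)
noisy_window = noisy.window_hours(0.0)  # penalty 2.0, so a 4-hour window
later_window = noisy.window_hours(2.0)  # one half-life later: penalty 1.0, 2 hours
```

Because the decay is applied before every read, a prefix that stops generating notifications drifts back to the base one-hour window without any explicit reset.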
Figure 6.5 shows the distribution of origin events generated using this adaptive
window. For comparison, we also show the distribution using a fixed window size of 1
hour and show a zoomed-in portion of the plot for the top 10 most active prefixes. Figure 6.6
shows the number of origin events generated per day using adaptive windows
with a default of 1 hour, along with the number of origin events using instantaneous
origin changes for comparison. The introduction of the adaptive window reduces the
number of origin events due to unstable prefixes, while still ensuring that any newly
announced origin is immediately reported to the prefix owner. Prefixes that experience
a large number of origin changes would experience a longer delay before being notified
of origin loss events, but would still receive immediate notification when a new origin
appears.
Footnote 2: In other words, the penalty at time t is exactly one half of the penalty at time (t − 2 hours), assuming no additional origin events were generated during that time.
Figure 6.5: Distribution of origin events per prefix using adaptive window (x-axis: Prefix ID; y-axis: number of origin events, log scale; series: 1 hour window, adaptive window; inset: zoomed-in view of the top 10 most active prefixes)
6.4 Notification Delivery
Once a notification message is generated, it is delivered to the prefix owner’s regis-
tered mailboxes through email. We choose email for delivery, since it is a ubiquitous
delivery method on the Internet and uses TCP, which provides reliable data transfer.
The email body is signed by the monitor to ensure its integrity. There are two types of
messages: event-driven notifications and periodic refreshes. The event-driven notifica-
tions are triggered by origin set changes, and the email contains corresponding origin
gains or losses. For example, for prefix 60.253.48.0/24, the notification messages look like
the following:
<TYPE=gain, seqnum=1, GMT-TIME=20041221 12:52:33, PREFIX=60.253.48.0/24,
NEW-SET={23918 31050 29257}, ORIGIN-GAINED=29257>
<TYPE=loss, seqnum=2, GMT-TIME=20041221 13:52:49, PREFIX=60.253.48.0/24,
NEW-SET={29257 31050}, ORIGIN-LOST=23918>
Figure 6.6: Comparison of origin events per day using instantaneous and adaptive window (x-axis: time, 12/03 to 12/31; y-axis: number of origin events; series: instantaneous changes, adaptive window size)
The periodic notification is sent at a fixed time interval (1 day by default), and the
email contains the complete origin set at that moment. The periodic refresh message
is a soft-state mechanism that provides additional system resilience against unforeseen
errors. For instance, even if a notification is lost due to an email server crash, the next
refresh message will bring the owner’s knowledge about the origin set up to date.
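A receiver has to turn the textual notification into structured fields before it can act on it. The following Python sketch parses the message layout shown in the examples above; it is an assumption-laden illustration, since the text does not give a formal grammar. The function name and the choice of which keys are converted to integers are hypothetical.

```python
import re
from datetime import datetime
from typing import Any, Dict

# key=value pairs; a value is either a {...} set or runs up to the next ',' or '>'.
_FIELD = re.compile(r"([A-Za-z][\w-]*)=\s*(\{[^}]*\}|[^,>]+)")
_INT_KEYS = {"seqnum", "ORIGIN-GAINED", "ORIGIN-LOST"}


def parse_notification(text: str) -> Dict[str, Any]:
    """Parse one PHAS-style notification into a dictionary of typed fields."""
    fields: Dict[str, Any] = {}
    normalized = " ".join(text.split())  # tolerate messages wrapped over lines
    for key, raw in _FIELD.findall(normalized):
        raw = raw.strip()
        if raw.startswith("{"):
            # Origin sets appear both space-separated and comma-separated.
            fields[key] = {int(tok) for tok in re.split(r"[,\s]+", raw[1:-1]) if tok}
        elif key in _INT_KEYS:
            fields[key] = int(raw)
        elif key == "GMT-TIME":
            fields[key] = datetime.strptime(raw, "%Y%m%d %H:%M:%S")
        else:
            fields[key] = raw
    return fields


sample = """<TYPE=gain, seqnum=1, GMT-TIME=20041221 12:52:33, PREFIX=60.253.48.0/24,
NEW-SET={23918 31050 29257}, ORIGIN-GAINED=29257>"""
parsed = parse_notification(sample)
```

Signature verification, which the text requires before a message is trusted, is deliberately left out of this sketch.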
The major challenge in our system design is how to deliver notifications success-
fully even in the face of prefix hijacks. When a prefix is being hijacked, some data
traffic on the Internet would go to the false origin instead of the true one. If the path
from our server to the prefix owner is diverted to the false origin, then the owner would
not receive the notification at the time when it is needed the most.
Due to the large scale of Internet routing, a prefix hijack is unlikely to affect all
the paths towards the true origin. Thus in delivering the notification messages, our
system uses multiple diversified paths to improve the chances of successful delivery.
Ideally, we could send notifications from the monitors that still have a path to the old
origin. But this type of email forwarding service is not part of the current BGP monitoring
arrangement with commercial ISPs. Requiring email forwarding from monitors would
undermine the deployability of our service. Thus we leave this as an option for future
development, and instead ask prefix owners to take the responsibility of setting up
multiple diversified delivery paths.
There are two practices recommended for prefix owners. First, when registering
with our system, they should provide multiple email accounts on different servers that
are also topologically diverse, for instance popular email services like GMail and Ya-
hoo! mail. Secondly, they should have Internet access through multiple prefixes. ISPs
often have several prefixes themselves, so this should not be a problem. For ones that
only own a single prefix, a backup plan, such as a dial-up Internet access account, is
recommended.
Figure 6.7 shows how the multiple diversified path delivery works. The owner of
prefix P registers four email addresses, one within P, and three others, X, Y, and Z, in
three different networks. Every notification message will be sent to all four mailboxes.
The prefix owner’s local filter will retrieve these four messages, and then process them.
The email body will contain a sequence number, based on which the local filter decides
whether it is a duplicate or is obsolete. Only emails with new contents pass through
the filter and result in an alarm used for hijack detection. When prefix P is hijacked,
as long as the owner and our server can both reach at least one of X, Y, and Z, the notification will
be delivered. Even if none of the four mailboxes is directly accessible from the owner's
site, as long as the owner can access the Internet through another prefix, he/she can
still retrieve the notification messages regarding prefix P. The local filter also periodically
polls the mailboxes. In the event that none of them is reachable, it is very
likely that the prefix owner's Internet access has problems, and the filter will generate an
alarm to the operator. In summary, the combination of multiple topologically diversified
mailboxes and multiple prefixes used for Internet access ensures a high delivery
rate for notifications.
Figure 6.7: Notification setup (diagram labels: PHAS Server; Mail servers X, Y, and Z; Local mail server; Local mailbox; Local Notification filter; Prefix Owner)
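Since every notification is sent to all registered mailboxes, the local filter sees several copies of each message. A minimal Python sketch of the sequence-number check might look like the following. It assumes that sequence numbers increase monotonically per prefix, which the text does not state explicitly, and the class name is hypothetical.

```python
from typing import Dict


class SeqnumDeduplicator:
    """Drops duplicate or obsolete copies of notifications, per prefix."""

    def __init__(self) -> None:
        # Highest sequence number already passed on, keyed by prefix.
        self.highest_seen: Dict[str, int] = {}

    def is_new(self, prefix: str, seqnum: int) -> bool:
        if seqnum <= self.highest_seen.get(prefix, 0):
            return False  # duplicate from another mailbox, or obsolete
        self.highest_seen[prefix] = seqnum
        return True


# Copies of notifications 1 and 2 arriving from several mailboxes, out of order.
dedup = SeqnumDeduplicator()
arrivals = [1, 1, 2, 1, 2]
passed = [dedup.is_new("60.253.48.0/24", n) for n in arrivals]
```

Only the first copy of each new sequence number passes through; a late copy of an older message is treated as obsolete.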
6.5 Local Notification Filter
PHAS does not associate a prefix with a true origin or false origin, and thus reports
all origin set changes to the prefix owner. However, not all origin set changes may
be of interest to the prefix owner, especially in the event that the origin set changes
frequently. The local notification filter is an important tuning block at the user side that
enables the prefix owner to filter out unwanted alarms while still alerting the user to potential
hijacks. In this section, we explain some basic building blocks for constructing filter
rules and use examples to show how simple rules can control the notifications delivered
to the user.
6.5.1 Constructing filtering rules
We define a rule to have the form “IF <condition> THEN <action>”. There are two
basic actions possible; ACCEPT results in the message being delivered and REJECT
results in the message being dropped. The default action is ACCEPT, in case no rules
are specified or no rules are fired. The local filter can contain various rules ordered by
preference, and IF clauses can also be nested. While multiple rules can be listed, for
each notification message an action of ACCEPT or REJECT is performed only
once. In other words, once an action is performed, no more rules are matched for that
notification message. Hence, we encourage users to use rules that are simple and easy
to understand and analyze.
To construct rules, we define the following constructs.
1. CONTAINS: defines what a particular key may contain.
2. DIFF: difference between sets.
3. LT, EQ, GT: correspond to the mathematical <,= and >.
4. NOT: negates the construct it follows. E.g. one may use it with CONTAINS.
5. AND, OR: for combinations of conditions.
6. ANY and ALL: used to deal with sets in rules.
Examples
1. A rule specific to a prefix, and checking to see if the new origin is a known
origin:
IF <ORIGIN-GAINED EQ 29257 AND PREFIX EQ 60.253.48.0/24> THEN
REJECT
2. A rule asking to drop all origin loss notifications:
IF <TYPE EQ "loss"> THEN REJECT
Example of a bad Rule
1. A rule that checks for the existence of an AS in the ORIGIN-SET:
IF <ORIGIN-SET CONTAINS 23918> THEN REJECT
In the event of a hijack that changes the origin set from {23918} to {23918, X},
where X is the hijacker, the notification will not be delivered to the user, since the
origin set still contains AS 23918.
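The first-match semantics and the pitfall of the bad rule can be illustrated with a short Python sketch. This is not the PHAS filter program: rules are modeled as plain Python predicates rather than the IF/THEN syntax, the function name is hypothetical, and AS 64511 (from the range reserved for documentation) stands in for the hijacker X.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Rule = Tuple[Callable[[Message], bool], str]  # (condition, action)

ACCEPT, REJECT = "ACCEPT", "REJECT"


def apply_rules(rules: List[Rule], msg: Message) -> str:
    """Return the action of the first rule that fires; default is ACCEPT."""
    for condition, action in rules:
        if condition(msg):
            return action
    return ACCEPT


# The two example rules, written as predicates.
good_rules: List[Rule] = [
    (lambda m: m.get("ORIGIN-GAINED") == 29257
     and m.get("PREFIX") == "60.253.48.0/24", REJECT),
    (lambda m: m.get("TYPE") == "loss", REJECT),
]

# The bad rule: rejects whenever the origin set contains AS 23918.
bad_rules: List[Rule] = [
    (lambda m: 23918 in m.get("ORIGIN-SET", set()), REJECT),
]

# A hijack that changes the origin set from {23918} to {23918, 64511}.
hijack: Message = {
    "TYPE": "gain",
    "PREFIX": "60.253.48.0/24",
    "ORIGIN-SET": {23918, 64511},
    "ORIGIN-GAINED": 64511,
}

good_result = apply_rules(good_rules, hijack)  # no rule fires, message delivered
bad_result = apply_rules(bad_rules, hijack)    # hijack notification is dropped
```

The good rules let the hijack notification through because they test the changed origin, whereas the bad rule tests a property that remains true during the hijack.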
6.5.2 Case Study
We now use a case study to show how simple rules can be used to deal with a real
scenario. We choose prefix 60.253.48.0/24 as an example and look at the notifications
from December 21, 2004 to December 28, 2004, when a known prefix hijack event
happened. A sample of the notifications seen by the filter is shown below.
<TYPE=gain, GMT-TIME=20041221 04:44:45, PREFIX=60.253.48.0/24,
NEW-SET={23918, 31050}, ORIGIN-GAINED=31050>
<TYPE=gain, GMT-TIME=20041221 12:52:33, PREFIX=60.253.48.0/24,
NEW-SET={23918, 31050, 29257}, ORIGIN-GAINED=29257>
<TYPE=loss, GMT-TIME=20041221 13:52:49, PREFIX=60.253.48.0/24,
NEW-SET={29257, 31050}, ORIGIN-LOST=23918>
<TYPE=loss, GMT-TIME=20041221 13:53:56, PREFIX=60.253.48.0/24,
NEW-SET={29257}, ORIGIN-LOST=31050>
For this prefix, we observed three origin ASes: AS 29257, AS 31050 and AS
23918. The origin set fluctuated between various combinations of these three ASes
causing notifications to be sent to the owner. Without local filtering, all these legitimate
changes would have resulted in alarms being sent to the prefix owner. However, the
prefix owner, knowing all these three legitimate origin ASes, can set simple rules to
filter out these changes:
IF <ORIGIN-GAINED EQ ANY {23918,31050,29257} > THEN REJECT
IF <ORIGIN-LOST EQ ANY {23918,31050,29257} > THEN REJECT
Note that each notification contains only one value for ORIGIN-GAINED or ORIGIN-LOST,
and hence we can use the EQ (equals) clause here. With these rules in place, the prefix
owner would receive an alarm only when an origin change matches neither rule. Around
9:30 AM on Dec 24, 2004, such an alarm was raised:
<TYPE=gain, GMT-TIME=20041224 09:30:29, PREFIX=60.253.48.0/24,
NEW-SET={23918 9121}, ORIGIN-GAINED=9121>
<TYPE=loss, GMT-TIME=20041224 11:35:02, PREFIX=60.253.48.0/24,
NEW-SET={23918}, ORIGIN-LOST=9121>
The first alarm indicates that AS 9121 is now hijacking the prefix 60.253.48.0/24.
The owner knows that this is not a legitimate origin for this prefix, and can then take
appropriate actions. An alarm is also generated to inform the owner that AS 9121
stopped announcing the prefix, indicating the matter has been resolved.
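The effect of the two case-study rules can be replayed over the six notifications listed in this section. The Python sketch below is a simplification under assumptions: the messages are reduced to the one field the rules inspect, and the function name is hypothetical.

```python
from typing import Dict, List

KNOWN_ORIGINS = {23918, 31050, 29257}  # the three legitimate origins


def raises_alarm(msg: Dict[str, int]) -> bool:
    """True if neither 'EQ ANY' rule rejects the notification."""
    changed = msg.get("ORIGIN-GAINED", msg.get("ORIGIN-LOST"))
    return changed not in KNOWN_ORIGINS


# The six notifications from the case study, in order of appearance.
notifications: List[Dict[str, int]] = [
    {"ORIGIN-GAINED": 31050},
    {"ORIGIN-GAINED": 29257},
    {"ORIGIN-LOST": 23918},
    {"ORIGIN-LOST": 31050},
    {"ORIGIN-GAINED": 9121},
    {"ORIGIN-LOST": 9121},
]
alarms = [raises_alarm(n) for n in notifications]
```

The four changes among known origins are filtered out, and only the appearance and disappearance of AS 9121 reach the administrator.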
6.6 Evaluation
To evaluate the overhead of the system, we use BGP log data to calculate the number of
origin events generated by the PHAS server, and the number of notifications received
by each AS. We also apply our method to the data collected during known hijack
events to show that PHAS can indeed catch those events. Finally, we use simulations
to evaluate the success ratio of notification delivery using the multi-path delivery scheme.
Figure 6.8: Origin events per day from June 1, 2005 to August 31, 2005 (x-axis: time in 1-day bins; y-axis: number of origin events; series: adaptive window)
6.6.1 Notification Messages
Figures 6.8 and 6.9 plot the number of origin events per day over a 6 month period
from June 2005 to November 2005. The origin events generated per day for month
of December 2005 were shown in Figure 6.6 in Section 6.3. Throughout this period,
we observed the number of events captured per day to be around 2000, with a few
occasional spikes. From a system point of view, sending 2000 messages per day is
manageable, even with multiple email delivery.
We now look from users’ point of view to see how many notification messages they
would receive if subscribed to receive PHAS alerts. We treat each origin event as one
notification, assuming all prefixes are registered to receive alerts. For our evaluation,
we use the events generated for the month of December 2005. We first evaluate the
notifications received per prefix. From Figure 6.5 in Section 6.3, we see that only 20K
out of more than 150K prefixes were involved in origin events. Of those 20K prefixes,
almost all had fewer than 10 origin events per month. Only a handful of prefixes
had more than 100 origin events per month. The worst case was 209.140.24.0/24, with
196 origin events. A closer look at the alarms revealed that the origin set alternated
between {} and {3043}, which indicates the prefix was unstable. From these numbers
for origin events, one can see that the number of notifications expected per prefix is
quite small, except for some unstable prefixes. For unstable prefixes, the
owner's local filter will be able to handle such redundant notifications easily.
Figure 6.9: Origin events per day from September 1, 2005 to November 30, 2005 (x-axis: time in 1-day bins; y-axis: number of origin events; series: adaptive window)
Since a prefix owner may register multiple prefixes, we also look at the number of
notifications expected per AS for the month of December 2005. For evaluation purposes,
we estimated the prefixes registered by each AS by using the routing table to
map every prefix to its origin AS. Figure 6.10 shows the number of origin events per
AS for December 2005. Only about 3.5K ASes out of the total 18K ASes received notifications.
Of those ASes that received notifications, 97% received fewer than
100 notifications in the entire month. The worst case was AS 29257, receiving 2501
notifications, with OSET (P) fluctuating between combinations of 4 origins. These
numbers for origin events per AS indicate that in most cases, an AS would receive a
small number of notifications, and in extreme cases, local filters can once again deal
with the common pattern of notifications. All of the above results show that the load of
notification generation, transmission, and processing is easily manageable by a single
machine, even when all the prefixes are registered with PHAS.
Figure 6.10: Distribution of events per AS for December 2005 (x-axis: AS ID; y-axis: number of origin events, log scale)
6.6.2 Detecting Known Events
We now check if our system would have caught some known prefix hijack events. One
such prefix hijack occurred on May 7, 2005 when AS 174 hijacked one of Google’s
prefixes, 64.233.161.0/24, causing Google to be unreachable during this time. When
run over this period of time, PHAS caught this origin set change and indicated AS 174
as the origin gained during this event.
A larger-scale hijack event occurred on Dec 24, 2004. AS 9121 announced itself
as origin for over 106K prefixes. PHAS detected 106082 unique prefixes with origin
9121 added to their origin sets and a total of 217884 origin events. Most prefixes had
2 notifications, one reporting the addition of AS 9121, and the other reporting the
removal of AS 9121.
Another hijack occurred on Jan 22, 2006, when AS 27506 announced itself
as the origin of prefixes belonging to other networks. For this day we detected 41 unique prefixes with
AS 27506 as a new origin, and a total of 141 origin change events. For some prefixes,
the AS 27506 was announced as origin, then withdrawn, and then re-announced and
withdrawn again resulting in multiple origin events.
Overall, PHAS successfully caught every known prefix hijack due to false origin
in a timely manner, and the timing matched reports from other sources.
6.6.3 Notification Delivery
To have multiple diverse paths for notification delivery, we recommend the prefix own-
ers register multiple mailboxes and have multiple prefixes for Internet access. If they
do have multiple prefixes, they can always receive the notification messages assuming
only one is hijacked. In this subsection, we evaluate the effectiveness of using multiple
mailboxes through simulations on Internet topology.
The approach is to take an Internet AS graph as the topology, tag each link with
inferred relationship, assume the widely adopted “no-valley” routing policy on every
node, then compute the shortest policy-compliant path between any two nodes. For
each calculation, the input includes one true prefix origin, one false origin, and a set of
mailboxes. Based on the computed shortest paths, we can find out the success ratio of
notification delivery.
The AS Topology is collected from multiple sources, including BGP monitors,
route servers, looking glasses, and routing registry [54]. The AS relationship is inferred
using the method in [50]. Two sets of mailboxes are used for comparison. The
first set is RouteViews (AS 3582) only, which we call “direct delivery”, with no other
mailbox. The second set is RouteViews plus GMail (AS 15169), Yahoo Mail (AS
10310), and Hotmail (AS 12076). We randomly picked 276 ASes to form the origin
pairs: 15 tier-1 ASes, 21 tier-2 ASes, 20 tier-3 ASes, 20 tier-4 ASes, and 200
tier-5 ASes. We exhaust all the combinations of origin pairs, a total of 75900 cases.
Figure 6.11: Delivery Ratio (x-axis: delivery ratio; y-axis: CDF; series: direct delivery only, plus 3 other mailboxes)
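The case count can be checked with a few lines of Python. The sketch assumes that an origin pair is ordered, that is, (true origin, false origin) and its swap count as two cases, which is consistent with the total of 75900; the variable names are illustrative.

```python
from itertools import permutations

# Number of sampled ASes per tier, as listed in the text.
tier_counts = {"tier-1": 15, "tier-2": 21, "tier-3": 20, "tier-4": 20, "tier-5": 200}

total_ases = sum(tier_counts.values())

# Every ordered (true origin, false origin) pair of distinct ASes.
ordered_pairs = sum(1 for _ in permutations(range(total_ases), 2))
```

The five tiers add up to 276 ASes, and 276 × 275 ordered pairs gives the 75900 simulated cases.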
Given an origin pair, some nodes will take the path to the true origin, while the
others will take the path to the false origin. If a mailbox node takes the path to the true
origin, the prefix owner will be able to access this mailbox and receive the notifica-
tion. Otherwise the notification is lost. Delivery ratio is defined as the percentage of
mailboxes that take the path to the true origin.
Note that the simulation results are symmetric. That is, if the delivery
ratio is 20% for a given pair of true origin and false origin, then it will be 80% when the
roles of these two origins are switched. Since we exhaust all combinations of origin pairs,
whenever there is a case with a delivery ratio of a%, there is a corresponding case with a delivery ratio of
(100 − a)%.
In our path computation, we use random tie-breaking when there are multiple shortest
paths. For example, if a mailbox has two equal-length paths, one leading to the true origin and
the other leading to the false origin, we count this as 0.5 notifications delivered from this
mailbox.
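The delivery ratio, the 0.5 credit for ties, and the symmetry property can be sketched in Python. This is a simplified illustration, not the simulator: it assumes each mailbox is described only by the lengths of its shortest policy-compliant paths to the two origins, and the path lengths below are made-up values.

```python
from typing import List, Tuple


def delivery_credit(dist_true: int, dist_false: int) -> float:
    """Expected deliveries from one mailbox under random tie-breaking."""
    if dist_true < dist_false:
        return 1.0  # the mailbox routes to the true origin
    if dist_true > dist_false:
        return 0.0  # the mailbox routes to the false origin
    return 0.5      # equal-length paths: counted as half a notification


def delivery_ratio(distances: List[Tuple[int, int]]) -> float:
    """Fraction of mailboxes that reach the true origin."""
    return sum(delivery_credit(t, f) for t, f in distances) / len(distances)


# Hypothetical (distance to true origin, distance to false origin) per mailbox.
mailbox_distances = [(2, 3), (3, 3), (4, 2), (1, 5)]
ratio = delivery_ratio(mailbox_distances)

# Swapping the roles of the two origins gives the complementary ratio.
swapped_ratio = delivery_ratio([(f, t) for t, f in mailbox_distances])
```

For these values the ratio is 0.625 and the swapped case gives 0.375, which illustrates why the simulation results are symmetric.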
Figure 6.12: Delivery Number (x-axis: number of successfully delivered notifications; y-axis: cumulative number of cases)
Figure 6.11 compares the delivery ratio with and without additional mailboxes.
Without the three additional mailboxes, about 30% of notifications are guaranteed
to be delivered, about 30% of notifications are certain to be lost, and the rest are delivered
with some probability. With the three additional mailboxes, the non-delivery ratio
drops to about 10%.
Figure 6.12 shows the number (not the ratio) of notifications that can be delivered.
In about two thirds of the cases (x ≥ 1), at least one message is guaranteed
to be delivered. This is double the fraction achieved with direct delivery alone (30%). This
suggests that three additional mailboxes can greatly improve notification delivery,
but we may still need more mailboxes for a higher success ratio.
6.7 Extensions to basic system
So far we have focused on detecting false origins. In this section, we discuss other
ways of hijacking a prefix besides directly announcing a prefix and discuss extensions
to the current system to deal with some of these cases.
6.7.1 Classification of Prefix hijack
At the highest level, the attacker AS could target a prefix that is already being
announced by another AS, which we term a valid prefix. The attacker may pretend to
be the owner of this prefix and originate it, resulting in a false origin hijack,
which is the focus of this chapter. Another way to hijack a prefix is to announce a
valid origin but report an invalid path to that origin. For false paths, we separate
the case of a false last hop from false information on any other hop in the path,
since the prefix owner’s AS knows its immediate neighbors and hence can identify
whether the last hop is valid or not.
An attacker AS may also announce a prefix that is not being announced by another
AS, termed an invalid prefix. If the attacker announces a sub-prefix of some valid
prefix, termed a covered prefix hijack, then routers in the Internet may contain
routes to both the victim AS’s prefix and the attacker’s prefix. However, if the
destination IP of a packet falls under the attacker’s prefix space, then due to
longest prefix match the data is forwarded to the attacker. An attacker AS may also
announce a less specific prefix than a valid prefix, termed a covering prefix hijack,
but it will receive traffic only when the route to the valid prefix is withdrawn.
For example, if AS 110 announces 131.0.0.0/8, then an AS A would route traffic
destined to the valid prefix 131.179.0.0/16 to AS 110 only when the prefix
131.179.0.0/16 is withdrawn. Finally, an AS may announce an invalid prefix that does
not conflict with any used prefix space. For example, spammers are known to use
unused prefixes for sending spam. Figure 6.13 shows the classification explained above.
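The prefix-level part of this classification can be sketched with Python's standard `ipaddress` module. This is only an illustration under stated assumptions: the function names and the idea of passing the valid prefixes as a plain list are hypothetical, and the sketch does not look at the AS path, so it cannot tell a false origin from a false path.

```python
import ipaddress

def classify_announcement(announced, valid_prefixes):
    """Relate an announced prefix to the set of valid (already announced) prefixes."""
    net = ipaddress.ip_network(announced)
    valid = [ipaddress.ip_network(p) for p in valid_prefixes]
    if net in valid:
        return "valid prefix"      # false origin or false path, depending on the AS path
    if any(net.subnet_of(v) for v in valid):
        return "covered prefix"    # more specific than some valid prefix
    if any(v.subnet_of(net) for v in valid):
        return "covering prefix"   # less specific than some valid prefix
    return "unused prefix"

def longest_prefix_match(destination, routes):
    """Return the most specific route containing the destination, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(r) for r in routes if addr in ipaddress.ip_network(r)]
    return max(matches, key=lambda n: n.prefixlen, default=None)

valid = ["131.179.0.0/16"]
covered = classify_announcement("131.179.0.0/24", valid)
covering = classify_announcement("131.0.0.0/8", valid)
# With both the valid /16 and a covered /24 in the table, the /24 wins.
chosen = longest_prefix_match("131.179.0.5", ["131.179.0.0/16", "131.179.0.0/24"])
print(covered, covering, chosen)
```

The last call shows why a covered prefix attracts traffic immediately, whereas a covering prefix is only selected once the more specific valid route disappears.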
Prefix hijacks could also combine several of the types in Figure 6.13, e.g.,
AS 110 announcing 131.179.0.0/24 (invalid covered prefix) with the path {110, 52}
(invalid last hop). In Figure 6.13, the hijacks in bold (false origin, covered prefix,
false last hop) are the ones where the prefix owner knows what is legitimate and what
may not be, and protection against these attacks is the focus of PHAS. We now discuss
two other sets to deal with covered prefix hijacks and false last hop hijacks.
Figure 6.13: Types of prefix hijacks (Hijack splits into Valid Prefix: false origin, false path (false last hop, false n-th hop); and Invalid Prefix: covered prefix, covering prefix, unused prefix)
6.7.2 Sub-prefix Set
The idea of using a sub-prefix set is to provide the owner of an IP prefix with
information about whether anybody is announcing a more specific prefix under its
assigned space. This would catch a hijacking event where a prefix, say 131.179.96.0/26,
is announced by a hijacker AS 100, but the prefix is part of the address space covered
by 131.179.0.0/16, which is owned by AS 52.
For an IP prefix x, some or all of its assigned address space might get further
divided into a number of longer prefixes. Each of these prefixes is known as a covered
prefix of x. The set of all covered prefixes of x observed from the BGP monitors
is denoted as CP(x). For example, if UCLA announces 131.179.0.0/16 as well as
131.179.96.0/24 and 131.179.59.0/24, then CP(131.179.0.0/16) is
{131.179.96.0/24, 131.179.59.0/24}.
We define a sub-prefix set SPSET(x) to consist of all y ∈ CP(x) such that there
does not exist another prefix z ∈ CP(x) with y ∈ CP(z). In other words, the set
SPSET(x) contains only the first-level covered prefixes of prefix x.
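The definitions of CP(x) and SPSET(x) translate directly into a few lines of Python. The sketch below is illustrative only: the function names are made up for the example, the list of observed prefixes is hypothetical, and all prefixes are assumed to be IPv4.

```python
import ipaddress

def covered_prefixes(x, observed):
    """CP(x): observed prefixes that are strictly more specific than x."""
    x = ipaddress.ip_network(x)
    nets = {ipaddress.ip_network(p) for p in observed}
    return {p for p in nets if p != x and p.subnet_of(x)}

def sub_prefix_set(x, observed):
    """SPSET(x): members of CP(x) that are not covered by another member of CP(x)."""
    cp = covered_prefixes(x, observed)
    return {y for y in cp if not any(y != z and y.subnet_of(z) for z in cp)}

observed = [
    "131.179.0.0/16",
    "131.179.96.0/24",
    "131.179.59.0/24",
    "131.179.96.0/26",   # hypothetical more specific announcement under the /24
]
cp = covered_prefixes("131.179.0.0/16", observed)
spset = sub_prefix_set("131.179.0.0/16", observed)
print(sorted(str(p) for p in spset))
```

Here the /26 is in CP(131.179.0.0/16) but not in SPSET(131.179.0.0/16), because it is already covered by 131.179.96.0/24.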
As an example of how this SPSET could be useful, we present a case from Jan 22,
2006. The prefix 208.0.0.0/11, owned by Sprint, generated one origin event at 5:06 am
UTC indicating that the sub-prefix set had changed from {} to {208.28.1.0/24} with
origin {27506}. The prefix in question, 208.28.1.0/24 is not usually seen in the global
routing tables, but in this case AS 27506 announced this prefix, which covers a portion
of Sprint’s 208.0.0.0/11 prefix space, thus resulting in a hijack.
6.7.3 Last Hop Set
The last hop set is maintained with the objective of detecting false last hops in BGP
announcements. Once again, the owner of the prefix would know the legitimate next
hops based on peering agreements and reports of such changes would allow the owner
to detect false last hops in BGP paths.
We define a last hop set LHSET(A) as the set of last hops for all prefixes with AS
A as the origin. For example, if M1 observed a path (7018, 1239, 52) to prefix P1, M2
has a path (3356, 1239, 52) to P2, and M3’s path to P3 is (701, 852, 52), then the last
hop set of AS 52, or LHSET(52), is {1239, 852}. Note that the last hop set is defined
for an AS, and not for a prefix, since it reflects topological connectivity.
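As a small illustration of this definition, the last hop sets of all origins can be collected from a list of observed AS paths. The function name and the representation of a path as a tuple of AS numbers ending at the origin are assumptions made for this sketch.

```python
from collections import defaultdict

def last_hop_sets(as_paths):
    """Map each origin AS to the set of ASes seen immediately before it in any path."""
    lhset = defaultdict(set)
    for path in as_paths:
        if len(path) >= 2:              # a path of length one has no last hop
            origin, last_hop = path[-1], path[-2]
            lhset[origin].add(last_hop)
    return dict(lhset)

# The three paths from the example above, observed by monitors M1, M2 and M3.
paths = [(7018, 1239, 52), (3356, 1239, 52), (701, 852, 52)]
lhsets = last_hop_sets(paths)
print(lhsets[52])
```

Because the set is keyed by origin AS rather than by prefix, the paths to P1, P2 and P3 all contribute to the same LHSET(52).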
The main objective of using the sub-prefix set and the last hop set is to identify
potential hijacks involving more specific address space and last hop changes. However,
the sub-prefix set for large address blocks like 12.0.0.0/8 can be huge and may change
frequently. Similarly, the last hop sets of richly connected nodes (e.g., tier-1 ISPs)
can also be large and may fluctuate considerably. For future work, we plan to
understand the dynamics of these two sets, define how to use them, and include them
as a part of the PHAS system.
CHAPTER 7
Understanding Resiliency of the Internet against Prefix
Hijack
Prefix hijacking is a serious security threat in the Internet. Prefix hijacks can potentially
be launched from any part of the Internet and can target any prefix belonging to any
network. A hijack attack has a large impact if the majority of routers choose the
path leading to the false origin. Conversely, if the majority of routers choose the path
leading to the true origin, the network of the prefix owner is considered to be resilient
against prefix hijack attacks. Although there have been several results on preventing
prefix hijacks (e.g., [20][31]) and monitoring potential prefix hijack attempts (e.g.,
[24, 37]), there is a lack of a general understanding on the impact of a successful
prefix hijack and networks’ resiliency against such attacks. This lack of understanding
makes it difficult to assess the overall damage once an attack occurs, and to provide
guidance to network operators on how to improve their networks’ resilience.
In this chapter, we conduct a systematic study to gauge the impact of prefix hijacks
launched at different locations in the Internet topology, and identify topological char-
acteristics of those networks that are most resilient against hijacks of their prefixes.
Specifically, we deal with a type of prefix hijack referred to as false origin hijacks
where a network announces the exact prefix announced by another network. Using
simulations on an Internet scale topology and measurements from real data, we esti-
mate how many nodes in the Internet may believe the true origin and how many believe
the false origin during a hijack. Our results show that the Internet topology hierarchy
and routing policies play an essential role in determining the impact of a prefix hijack.
Our study shows that the high degree networks (e.g., tier-1 ISPs) are not necessarily
most resilient against prefix hijacks. Instead, small networks that are direct customers
to multiple tier-1 ISPs are seen to be most resilient. Conversely, attacks launched from
these multi-homed customer networks would also have the biggest impact. The
implications of our results are twofold. First, networks that desire high resilience
against prefix hijacks should connect to multiple providers and be as close as possible
to multiple tier-1 ISPs, while networks that cannot achieve such topological
connectivity should use reactive means to learn about their prefixes being hijacked.
Second, securing only the big ISP networks is neither adequate nor effective, since
high impact attacks come from well connected small networks.
7.1 Hijack Evaluation Metrics
For our simulations, we model the Internet topology as a graph, in which each node
represents an AS, and each link represents a logical relationship between two
neighboring AS nodes. Note that two neighboring nodes may have multiple physical links
between them. However, BGP paths are represented in the form of AS-AS links,
and hence we abstract connections between two AS nodes as a single logical link. For
simplicity, each node owns exactly one unique prefix, i.e., no two nodes announce the
same prefix except during a hijack. A prefix hijack at any given time involves only one
hijacker, and the hijacker can target only one node.
Figure 7.1: Hijack scenario. (Nodes AS-1 through AS-6; legend: provider-customer link, peer-peer link, tier-1, true origin, false origin)
Terminology
In the rest of this chapter, we use the term prefix hijacks to refer to false origin
prefix hijacks. We call the AS announcing a prefix it does not own the false origin,
and the AS whose prefix is being attacked the true origin. Upon receiving the routes
from both the false origin and the true origin, an AS that believes the false origin
is said to be deceived, while an AS that still routes to the true origin is said to be
unaffected.
To capture the interaction between the entities involved in a hijack, we introduce a
variable β(a, t, v), function of false origin a, true origin t and node v as follows:
$$
\beta(a, t, v) =
\begin{cases}
1 & \text{if node } v \text{ is deceived by false origin } a \text{ for true origin } t\text{'s prefix} \\
0 & \text{otherwise}
\end{cases}
\tag{7.1}
$$
Due to the rich connectivity in Internet topology, a node often has multiple equally
good paths to reach the same prefix. Figure 7.1 shows a case where AS-4 has three
equally good paths to reach the same prefix, two to the true origin AS-1 (through AS-2
and AS-3), and one to the false origin AS-6. In our model, we assume a node will
break the tie randomly. Therefore, we define the expected value of β as follows. Let
p(v, n) be the number of equally preferred paths (e.g. same policy, same path length)
from the node v to node n. E.g., in Figure 7.1, p(4, 1) = 2 since AS-4 has two paths
via AS-2 and AS-3 to reach AS-1, and p(4, 6) = 1 since AS-4 has only one route via
AS-5 to reach AS-6. If nodes use random tie-break to decide between multiple equally
good preferred paths, then the expected value for β is defined as:
$$
\bar{\beta}(a, t, v) = \frac{p(v, a)}{p(v, a) + p(v, t)}
\tag{7.2}
$$
yielding β̄(6, 1, 4) = 1/3 for the example in the figure. β̄ is the probability of a node v
being deceived by a given false origin a announcing a route belonging to true origin t.
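Equation 7.2 can be checked on the Figure 7.1 example with a short sketch. The function name is hypothetical, and raising an error when a node has no best path to either origin is an assumption of this sketch, since the text does not define β̄ for that case.

```python
from fractions import Fraction

def expected_beta(p_false, p_true):
    """Eq. 7.2: probability that a node is deceived under random tie-breaking.

    p_false = p(v, a), the number of equally preferred paths to the false origin;
    p_true  = p(v, t), the number of equally preferred paths to the true origin.
    """
    total = p_false + p_true
    if total == 0:
        raise ValueError("node has no equally preferred path to either origin")
    return Fraction(p_false, total)

# AS-4 in Figure 7.1: p(4, 6) = 1 path to the false origin, p(4, 1) = 2 to the true origin.
beta_4 = expected_beta(p_false=1, p_true=2)
print(beta_4)
```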
Impact
We use the term impact to measure the attacking power of a node launching prefix
hijacks. We define impact of a node a as the fraction of the nodes that believe the false
origin a during an attack on true origin t. More formally, the impact of a node a is
given by:
$$
I(a) = \frac{\sum_{t \in N} \sum_{v \in N} \bar{\beta}(a, t, v)}{(N - 1)(N - 2)}
\tag{7.3}
$$
Note that the outer sum is over N −1 true origins (we exclude the false origin) and the
inner sum is over N − 2 nodes (excluding both the false origin and true origin).
Resilience
We use the term resilience to measure the defensive power of a node against hijacks
launched against its prefix. We define the resilience of a node t as the fraction of nodes
that believe the true origin t given an arbitrary hijack against t. More formally, the
node resilience R(t) of a node t is given by:
$$
R(t) = \frac{\sum_{a \in N} \sum_{v \in N} \bar{\beta}(t, a, v)}{(N - 1)(N - 2)}
\tag{7.4}
$$
Note, higher R(t) values indicate better resilience against hijacks, and higher I(a)
values indicate higher impact as an attacker.
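The two metrics can be sketched as double sums over a small node set. Everything concrete below is an assumption for illustration: the four-node set, the table of path counts p(v, n), and the function names are hypothetical and are not taken from the simulated topology.

```python
from fractions import Fraction

def impact(a, nodes, beta_bar):
    """Eq. 7.3: average of beta_bar(a, t, v) over true origins t != a and nodes v not in {a, t}."""
    n = len(nodes)
    total = sum(beta_bar(a, t, v)
                for t in nodes if t != a
                for v in nodes if v != a and v != t)
    return total / ((n - 1) * (n - 2))

def resilience(t, nodes, beta_bar):
    """Eq. 7.4: the same double sum with t as the first argument of beta_bar."""
    n = len(nodes)
    total = sum(beta_bar(t, a, v)
                for a in nodes if a != t
                for v in nodes if v != t and v != a)
    return total / ((n - 1) * (n - 2))

# Hypothetical path counts p(v, n): one best path everywhere, except that
# node 4 has two equally preferred paths to node 1.
nodes = [1, 2, 3, 4]
paths = {(v, n): 1 for v in nodes for n in nodes if v != n}
paths[(4, 1)] = 2

def beta_bar(a, t, v):
    # Eq. 7.2 applied to the hypothetical table above.
    return Fraction(paths[(v, a)], paths[(v, a)] + paths[(v, t)])

print(impact(1, nodes, beta_bar), resilience(1, nodes, beta_bar))
```

With the argument order of Equation 7.4, the two functions compute the same double sum, which is one way to see that a node's resilience equals its impact in this model.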
Relation between Impact and Resilience
The true origin t and false origin a compete with each other to make nodes in the
Internet route to themselves. For example, in Figure 7.1, false origin AS-6 is hijacking
a prefix belonging to true origin AS-1. In this case, only AS-5 believes the false origin
and AS-4 has a 1/3 chance of being deceived. Therefore, the chance that a node
believes the false origin AS-6 when it hijacks AS-1 is (1 + 1/3)/4 = 1/3.
Now if AS-1 were to hijack a prefix belonging to AS-6, then AS-5 would still believe
AS-6 and AS-4 would believe it with a probability of 1/3. Thus, in this case, the chance
that a node believes the true origin AS-6 when it is hijacked by AS-1 is (1 + 1/3)/4 = 1/3.
We see that the resilience of the node as a true origin is equal to its impact as
a false origin. We note that in our model, when the roles of attacker and target are
switched, the impact of a node becomes its resilience. In the rest of the chapter, we
focus on resilience, while keeping in mind that a highly resilient node can also cause
high impact as a false origin.
7.2 Evaluating Hijacks
In this section, we aim to understand the topological resilience of nodes against prefix
hijacks by performing simulations on an Internet derived topology. We first explain
the simulation setup, followed by the main results of our simulation and the insight
behind the results.
7.2.1 Simulation Setup
For our simulations, we use an AS topology collected from BGP routing tables and
updates, representing a snapshot of the Internet as of Feb 15 2006 (available from
[53]). The details of how this topology was constructed are described in [54]. Our
topology consists of 22,467 AS nodes and 63,883 links. We assume each AS node
owns and announces a single prefix to its neighbors. We classify AS nodes into three
tiers: Tier-1 nodes, transit nodes, and stub nodes. To choose the set of Tier-1 nodes, we
started with a well known list, and added a few high degree nodes that form a clique
with the existing set. Nodes other than Tier-1s that provide transit service to other AS
nodes are classified as transit nodes, and the remaining nodes are classified as stub
nodes. This classification results in 8 Tier-1 nodes, 5,793 transit nodes, and 16,666 stub
nodes. We classify each link as either customer-provider or peer-peer using the PTE
algorithm [11] and use the no-valley prefer customer routing policy to infer routing
paths (also used in previous works such as [52]). We abstracted the router decision
process into the following priorities: (1) local policy based on relationship, (2) AS path
length, and (3) random tie-breaker.
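The three-step decision process can be sketched as a ranking function followed by a random choice among ties. This is a simplified illustration, not the simulator itself: the route representation (a relationship label plus an AS path) and the function name are assumptions made for the example.

```python
import random

# Lower rank means more preferred under the prefer customer policy.
RELATIONSHIP_RANK = {"customer": 0, "peer": 1, "provider": 2}

def select_best_route(routes, rng=random):
    """Pick a route by (1) relationship, (2) AS path length, (3) random tie-break.

    `routes` is a list of (relationship_of_next_hop, as_path) tuples
    (hypothetical format chosen for this sketch).
    """
    def key(route):
        relationship, as_path = route
        return (RELATIONSHIP_RANK[relationship], len(as_path))

    best_key = min(key(r) for r in routes)
    tied = [r for r in routes if key(r) == best_key]
    return rng.choice(tied)

routes = [
    ("peer", (4,)),          # shorter, but learned from a peer
    ("customer", (1, 7)),    # longer, but learned from a customer
    ("provider", (9,)),
]
best = select_best_route(routes, rng=random.Random(0))
print(best)
```

The customer route wins even though it is longer, because path length is only consulted among routes with the same relationship rank.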
Of the 22,467 AS nodes in our topology, we randomly picked 1,000 AS nodes
to represent false origins that would launch attacks on other AS nodes. We checked
the degree distribution of this set of 1,000 AS nodes, and found it to be similar to
the degree distribution of all the AS nodes. For each of the 22,467 AS nodes as a
true origin, we simulated a hijack with the 1,000 false origins. Thus we simulated
22,467 × 1,000 ≈ 22.5 million hijack scenarios in total.
Figure 7.2: Distribution of node resilience. (x-axis: node ID; y-axis: resiliency; curves: Avg., Avg. + Dev., Avg. - Dev.)
7.2.2 Characterizing Topological Resilience
Figure 7.2 shows the distribution of the resilience (average curve) for all the nodes
in our topology from our simulated hijacks. Since the resilience of each node results
from the average over 1,000 attackers, we also show the standard deviation range.
Note, higher values of resilience imply more resilience against hijacks.
This distribution shows that node resilience varies fairly linearly except at the two
extremes. Figure 7.2 also shows that the deviations at the two extremes are quite small
compared to the middle, indicating that some nodes (top left) are very resilient against
hijacks, while some others (bottom right) are easily attacked, regardless of the location
of the false origin.
As a first step in understanding how different nodes differ in their resilience, we
classify nodes into the three classes already described: tier-1, transit and stub and
plot the average resilience distribution (CDF) of each class of nodes in Figure 7.3. We
observe that the resilience distribution is very similar for transits and stubs, with transit
nodes being a little more resilient than stubs.
Figure 7.3: Resilience of nodes in different tiers. (x-axis: node resiliency; y-axis: frequency (CDF); curves: Tier-1s, Transits, Stubs)
In contrast, tier-1 nodes show a very different distribution from the stubs and tran-
sits. From Figure 7.3 we observe that all the tier-1 nodes have an average resilience
value between 0.4 and 0.5. In addition, we note that about 40% of stubs and 55%
of transit nodes are more resilient than all tier-1 nodes. With tier-1 nodes being the
ones with the highest degree, it is surprising to see that close to 50% of the nodes
in the Internet are more resilient than tier-1s. Next, we explain why tier-1 nodes are
more vulnerable to hijacks than a lot of other nodes and generalize this explanation to
understand the characteristics impacting resilience.
7.2.3 Factors Affecting Resilience
We first understand the resilience of tier-1 nodes with a simple hijack scenario in Fig-
ure 7.4. AS-2, AS-3, AS-4 and AS-5 represent 4 tier-1 nodes inter-connected through
a peer-peer relationship. AS-1 and AS-6 are small ISPs connected to tier-1 AS nodes
through a customer-provider relationship. Finally AS-7 is a multi-homed customer of
AS-1 and AS-6. In Figure 7.4, AS-7 represents the false origin that hijacks a prefix
belonging to a tier-1 node, AS-4.
Recall that in the no-valley prefer customer policy, a customer route is preferred over
a peer route, which in turn is preferred over a provider route. When AS-7 hijacks
AS-4’s prefix and announces the false route to AS-1 and AS-6, both AS-1 and AS-6 prefer
the hijacked route over the genuine route to AS-4 since it is a customer route. AS-1 in
turn announces the hijacked route to its tier-1 providers AS-2 and AS-3. These tier-1
AS nodes, AS-2 and AS-3, now have to choose between a customer route through AS-1
(hijacked route) and a peer route through AS-4 (genuine route). Again due to policy
preference, the tier-1 nodes will choose the customer route, which happens to be the
hijacked route. Similarly, AS-5 will also choose the hijacked route. Once big ISPs like
tier-1 nodes are deceived by the hijacker, their huge customer base (many of whom are
single-homed) is also deceived, thus causing a high impact. One can see from this
example that the main reason for the low resilience in the case of a hijack on a tier-1
node is that tier-1 nodes interconnect through peer-peer relationships, thus rendering a
genuine route less preferred by other tier-1 nodes than hijacked routes from customers.
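The scenario above can be reproduced with a small path-vector sketch of the no-valley prefer customer policy. Several things here are assumptions rather than facts from the text: the exact link set of Figure 7.4 (AS-1 is taken to be a customer of AS-2 and AS-3, and AS-6 a customer of AS-5, mirroring Figure 7.1), the function names, and the deterministic tie-break that replaces the random one so the output is reproducible.

```python
CUSTOMER, PEER, PROVIDER = 1, 2, 3  # rank of the neighbor a route is learned from (lower wins)

def build_topology(provider_customer, peer_peer):
    """rel[u][v] is the role of neighbor v as seen from u."""
    rel = {}
    for provider, customer in provider_customer:
        rel.setdefault(provider, {})[customer] = CUSTOMER
        rel.setdefault(customer, {})[provider] = PROVIDER
    for a, b in peer_peer:
        rel.setdefault(a, {})[b] = PEER
        rel.setdefault(b, {})[a] = PEER
    return rel

def simulate(rel, origins):
    """Iterate route selection until no node changes its best route."""
    # best[u] = (rank, AS path from u to an origin); origins hold rank 0 for their own prefix.
    best = {o: (0, (o,)) for o in origins}
    changed = True
    while changed:
        changed = False
        for u in rel:
            if u in origins:
                continue
            candidates = []
            for v, rank in rel[u].items():
                if v not in best:
                    continue
                v_rank, v_path = best[v]
                if u in v_path:
                    continue  # path-vector loop prevention
                # No-valley export: own and customer routes go to every neighbor,
                # peer and provider routes go to customers only.
                if v_rank <= CUSTOMER or rel[v][u] == CUSTOMER:
                    candidates.append((rank, len(v_path) + 1, (u,) + v_path))
            # Prefer by relationship, then path length, then lowest path (deterministic tie-break).
            choice = min(candidates, default=None)
            new = None if choice is None else (choice[0], choice[2])
            if new != best.get(u):
                changed = True
                if new is None:
                    best.pop(u, None)
                else:
                    best[u] = new
    return best

rel = build_topology(
    provider_customer=[(2, 1), (3, 1), (5, 6), (1, 7), (6, 7)],       # assumed links
    peer_peer=[(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)],      # tier-1 clique
)
best = simulate(rel, origins={4, 7})  # AS-4 is the true origin, AS-7 the false origin
deceived = sorted(u for u, (_, path) in best.items() if path[-1] == 7 and u != 7)
print(deceived)
```

Under these assumptions every node other than the true origin ends up routing to AS-7, including the tier-1 peers of AS-4, which matches the reasoning in the paragraph above.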
The key to high resilience is to make the tier-1 nodes and other big ISPs always
believe the true origin. The way to achieve this is to reach as many tier-1 nodes as
possible using a provider route. In addition, when a node has to choose between two
routes of the same preference, path length becomes the deciding factor, and thus the
fewer the hops needed to reach the tier-1 nodes, the better the resilience. From
our simulation results, we found that the most resilient nodes are
direct customers of many tier-1 nodes and other big ISPs. As an example, in our
simulations, the node with the highest resilience is a stub (AS-6432 DoubleClick) directly
connected to 6 tier-1 nodes, with a resilience value of 0.95. The nodes with the lowest
resilience were single-homed customers connected to poorly connected providers.
Figure 7.4: Understanding resilience of tier-1 nodes (Nodes AS-1 through AS-7; AS-4 is the true origin and AS-7 the false origin; legend: provider-customer link, peer-peer link, tier-1)
To better understand the influence of tier-1 nodes, we classified the nodes in the
Internet based on the number of direct tier-1 providers. Figure 7.5 shows the
distribution of resilience for nodes with different connectivity to Tier-1. Note that
the closer the curve is to the right-hand side of the figure (x=1), the better the
resilience of that set of nodes. There are about 21,888 nodes with fewer than 3
connections to Tier-1, and we observe in Figure 7.5 that these nodes are the least
resilient. A total of 379 nodes are directly connected to 3 Tier-1s and 104 nodes are
connected to 4 Tier-1s. Only 88 nodes are connected to more than 4 Tier-1s, and these
nodes prove to be the most resilient, highlighting the role of connecting to multiple
tier-1 nodes.
Summary: In this section, we used an Internet scale topology with no-valley prefer
customer policy routing to evaluate the resilience of nodes against random hijackers.
The key to achieve high resilience is to protect tier-1 nodes and other big ISPs from
being deceived by the hijacker. Our main result shows that the nodes that are direct
customers of multiple tier-1 nodes are the most resilient to hijacks. On the other hand,
the tier-1 nodes themselves in spite of being so well connected, are much less resilient
to hijack. The next question we seek to answer in Section 7.3 is whether there is
evidence of such behavior in reality, where the routing decision process is much more
complex.
Figure 7.5: Resilience of nodes with different number of Tier-1 providers. (x-axis: node resiliency; y-axis: frequency (CDF); curves: <3 Tier-1s, =3 Tier-1s, =4 Tier-1s, >4 Tier-1s)
7.3 Prefix Hijack Incidents in the Internet
In this section we examine two hijack events, one from January 2006 which affected
a few tens of prefixes, and the other from December 2004 when over 100,000 prefixes
were hijacked. To gauge the impact of the prefix hijacks, we analyzed the BGP routing
data collected by the Oregon collector of the RouteViews project. The Oregon collec-
tor receives BGP updates from over 40 routers. These 40 routers belong to 35 different
AS nodes (a few AS nodes have more than one BGP monitor) and we consider an AS
as deceived by a hijack if at least one BGP monitor from that AS believes the hijacker.
We call these 35 AS nodes monitors, as they provide BGP monitoring information
to the Oregon collector. The impact of a hijack is then gauged by the ratio of these
monitors that were deceived.
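The measurement rule above (an AS counts as deceived if at least one of its monitored routers believes the hijacker) can be sketched as follows. The function name, the input format, and the placeholder AS labels are hypothetical; the observations are made up for illustration and are not taken from the RouteViews data.

```python
def monitor_impact(observations, false_origin):
    """Fraction of monitor ASes with at least one router routing to the false origin.

    `observations` is a list of (monitor_as, origin_seen) pairs, one per BGP
    monitor router (hypothetical input format).
    """
    monitored = {asn for asn, _ in observations}
    deceived = {asn for asn, origin in observations if origin == false_origin}
    return len(deceived) / len(monitored)

# Three monitor ASes; AS-C has two routers, only one of which is deceived.
observations = [
    ("AS-A", "true"),
    ("AS-B", "false"),
    ("AS-C", "false"),
    ("AS-C", "true"),
]
measured = monitor_impact(observations, false_origin="false")
print(measured)
```

Applied to the counts reported in this section, the same ratio gives 4/35 (just over 0.11) for the first case and 21/35 = 0.6 for the second.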
7.3.1 Case I: Prefix Hijacks by AS-27506
On January 22, 2006, AS-27506 announced a number of prefixes that did not belong
to it. This hijack incident was believed to be due to operational errors, and most of
the hijacked prefixes were former customers of AS-27506. We observed a total of
40 prefixes being hijacked by AS-27506. These 40 prefixes belonged to 22 unique
ASes. We present two representative prefixes; for the first prefix the false origin could
only deceive a small number of monitors, while for the second prefix the false origin
deceived the majority of the monitors. We examine the topological connectivity of the
true origins as compared to that of the false origin and the relation to the true origin’s
resiliency.
7.3.1.1 High Resiliency against Hijack
We examine a hijacked prefix that belongs to the true origin AS-20282. The impact
of hijacking this prefix is just over 10%, that is 4 out of the 35 monitored ASes were
deceived by the hijack. Figure 7.6 depicts the connectivity of some of the entities
involved in this hijack incident. The nodes colored in gray are the nodes deceived by
the false origin AS-27506, and the white nodes persisted with the true origin. The
true origin AS-20282 is a direct customer of two tier-1 nodes, AS-701 and AS-3356.
Before the hijack incident, all the 35 monitors used routes containing one of these two
tier-1 ASes as the last hop in the AS path to reach the prefix. The hijacker AS-27506
is a customer of AS-2914, another tier-1 node. When AS-27506 hijacked the prefix,
AS-2914 chose the false customer route from AS-27506 over an existing peer route
through AS-701. The false route was further announced by AS-2914 to other tier-1
peers including AS-701 and AS-3356, however neither of them adopted the new route
because they chose the customer route announced by the true origin AS-20282. Other
tier-1 ASes, such as AS-1239 (not shown in the figure), did not adopt the false route
from AS-2914 either, most likely because the newly announced false route was 2 hops
in length, the same as that of their existing route through AS-701 or AS-3356, and
the recommended practice is to avoid unnecessary best path transitions between
equal external paths [9]. However, we note that AS-3130, which is a customer of both
a deceived and an unaffected tier-1 provider, also got deceived, possibly because the
new path {2914, 27506} is shorter than the original path, which contained 3 AS hops.
Figure 7.6: High resiliency against hijack (nodes shown: AS-701, AS-2914, AS-3356, AS-3130, AS-20282 (true origin), AS-27506 (false origin))
7.3.1.2 Low Resiliency against Hijack
Next, we examine another hijacked prefix which belonged to AS-23011. The average
impact of this hijacked prefix is 0.6, i.e. 21 out of the 35 monitors were deceived by
the hijack. Figure 7.7 shows the most relevant entities involved in this prefix hijack.
The true origin of this prefix was an indirect customer of 5 tier-1 ASes (not all of them
are shown in the figure) through its direct providers AS-12006 and AS-10910. The
connectivity of the hijacker is the same as before, and AS-2914 was deceived by the
hijack. The 5 tier-1 ASes on the provider path of the true origin stayed with the route
from the true origin AS-23011, however the rest of the tier-1 ASes were deceived this
time, possibly because the peer route to false origin through AS-2914 was shorter than
any other peer route to the true origin. AS-286 is a customer of the providers of both
the true and false origins, and it picked the false route through AS-2914 because it was
shorter. We note that, in this case, the true origin being an indirect customer of
multiple tier-1 ASes ensured that those tier-1 ASes themselves did not get deceived.
However, due to its longer distance to these tier-1 providers (compared to the true
origin in Figure 7.6), other tier-1 ASes and their customers chose the shorter route to
the false origin.
Figure 7.7: Low resiliency against hijack (nodes shown: AS-2914, AS-12006, AS-10910, AS-3130, AS-286, AS-23011 (true origin), AS-27506 (false origin))
One of the tier-1 providers that propagated the false route is known to verify the
origin of received routes with the Internet Routing Registries (IRR). However, it did not
block the hijack because the registry entries were outdated and still listed AS-27506
as an origin for the hijacked prefixes, and hence the hijack announcements passed the
registry check.
7.3.2 Case II: Prefix Hijacks by AS-9121
In this hijack incident, operational errors led AS-9121 to falsely announce routes to
over 100,000 prefixes on December 24, 2004. We use this case to evaluate the
resiliency of tier-1 ASes as compared to that of direct customers of multiple tier-1 ASes.
Due to the large number of prefixes being falsely announced, some BGP protection
mechanisms, such as prefix filters and the maximum prefix limit (where an AS sets an
upper limit on the number of routes a given neighbor may announce), were triggered and
affected the overall impact. Given that multiple factors were involved in such
a large scale hijack event, it is difficult to accurately model the impact on an AS as a
function of its topological connectivity. Our objective in examining this case is to find
supporting evidence for our observations made in Section 7.2, as opposed to a detailed
study over all the hijacked prefixes. Similar to Case I, we observed how many monitors
were deceived for each hijacked prefix and used this result to gauge the resiliency
of the true origin AS.
7.3.2.1 Hijacked Tier-1 AS Prefix
In order to understand how tier-1 ASes fared against AS-9121 hijack, we studied the
impact of those hijacked prefixes that belonged to AS-7018, a tier-1 AS. Note that
AS-7018 announced over 1500 prefixes, and the impacts of different prefixes varied
noticeably, with around 7 to 8 monitors being deceived for most prefixes. For our case
study, we examine one of the hijacked prefixes which deceived the majority of the
monitors. Figure 7.8 shows the entities involved in the hijack of this tier-1 prefix.
The hijacker AS-9121 was connected to 3 providers, one of which was AS-1239,
a tier-1 AS. The true origin of the prefix in question was AS-7018, another tier-1 AS.
The grey nodes in the figure indicate those deceived by the hijack. All the 3 providers
of AS-9121, namely AS-1239, AS-6762, and AS-1299 were deceived into believing
the false origin. AS-1299 also propagated the false route to its tier-1 AS providers.
From our observations, a total of 19 out of 35 monitors were deceived by this hijack.
Figure 7.8: Tier-1 prefix hijacked (nodes shown: AS-1239, AS-6762, AS-1299, AS-7018 (true origin), AS-9121 (false origin))
7.3.2.2 Hijacked Prefix belonging to Customer of Tier-1s
Next, we see how the AS-9121 hijack incident affected the prefixes belonging to an
AS that was a direct customer of multiple tier-1 ASes. We picked AS-6461 as an
example here because it connected to all the 8 tier-1 ASes. AS-6461 announced over
100 prefixes, 87 of which were hijacked by AS-9121. No more than 2 monitors were
deceived by the false origin for any of the hijacked prefixes. Figure 7.9 shows the
entities involved in the hijack of one of the prefixes belonging to AS-6461. As before,
AS-6762 believed the false origin and was one of the monitors deceived for all of the
hijacked prefixes of AS-6461. However, because all the tier-1 ASes were direct providers
of AS-6461, they stayed with the original one-hop customer route to the true origin; in
particular, note that AS-1239 was a provider for both the true origin and the hijacker,
and it stayed with the original correct route. As a result, the hijack of AS-6461’s
prefixes had a very low impact.
Figure 7.9: Multi-homed customer of tier-1s hijacked (nodes shown: AS-1239, AS-6762, AS-1299, AS-6461 (true origin), AS-9121 (false origin))
In addition to AS-6461, we also studied the impacts of prefixes belonging to a few
other transit ASes that were very well connected to tier-1 ASes, and found the impact
pattern for their prefixes to be very similar to the AS-6461 case. To summarize, this
real life hijack event showed strong evidence that direct multi-homing to all or most
tier-1 ASes can greatly increase an AS’s resiliency against prefix hijacks.
7.4 Discussion
It has been long recognized that prefix hijacking can be a serious security threat to the
Internet. Several hijack prevention solutions have been proposed, such as SBGP [20],
so-BGP [31], and more recently the effort in the IETF Secure Inter-Domain Routing
Working Group [2]. These proposed solutions use cryptographic-based origin authen-
tication mechanisms, which require coordinated efforts among a large number of or-
ganizations and thus will take time to get deployed. Meanwhile prefix hijack incidents
occur from time to time and our work provides an assessment of the potential im-
pacts of these incidents. Several hijack detection systems have also been developed,
for example MyASN[37] and PHAS[24]. However since these systems are reactive in
nature, it is still important for network customers to understand the relations between
their networks’ topological connectivity and the potential vulnerability in face of prefix
hijacks.
Our simulation and analysis show that AS nodes with large node degrees (e.g.,
tier-1 networks) are not the most resilient against hijacks of their own prefixes. An AS
can gain high resiliency against prefix hijacks by being a direct or indirect customer
of multiple tier-1 providers with the shortest possible AS paths. Conversely, such
customer AS nodes can also have the greatest impact on the entire Internet if they
inject false routes. This finding suggests that securing the routing
announcements from the major ISPs alone is not effective in curbing a high-impact
attack, and that it is even more important to watch the announcements from lower-tier
networks with good topological connectivity.
On the other hand, customer networks that are far away from their indirect tier-1
providers can be greatly affected if their prefixes get hijacked. These topologically
disadvantaged AS nodes are most in need of other means to protect
themselves. Subscribing to prefix hijack detection systems, such as MyASN and
PHAS, would be helpful. To reduce the transient impact during the detection delay,
one may also look into another proposed solution called PGBGP [19], which is briefly
described in Chapter 8.
Note that the topological connectivity required for resiliency against prefix hijacks
is different from that required for fast routing convergence [32]. Fast convergence
benefits from fewer alternative paths when the routes change, thus prefixes announced
by tier-1 providers meet the requirement well; while hijack resiliency benefits from
being a direct or indirect customer of a large number of tier-1 providers, thus prefixes
are better hosted by well connected non-tier-1 AS nodes.
We would like to end this discussion by stressing the importance of understanding
prefix hijack impacts, even when protection mechanisms are put in place. Our
evaluations on an Internet-scale topology in Section 7.2 used a no-valley, prefer-customer
routing policy and showed that tier-1 AS nodes are not very resilient to hijacks
of their own prefixes, since other tier-1 AS nodes prefer customer routes to the false origin.
However, in reality a tier-1 AS may use various mechanisms, such as Internet Routing
Registries (IRR), to check the origin of a prefix before forwarding the route. Such
mechanisms would probably boost the resiliency of tier-1 AS nodes against being hijacked.
On the other hand, these protection mechanisms can also fail or backfire, thus exposing
the vulnerability of a network. As we saw in case I of Section 7.3, most of the
hijacked prefixes belonged to former customers of the false origin AS and were still
recorded in the Internet Routing Registry (IRR), which had not been updated. Outdated
registries resulted in false routes being propagated to the rest of the Internet.
Another example of a protection mechanism is the maximum prefix filter in BGP,
which allows an AS to configure the maximum number of routes received from a neighbor.
By limiting the total number of routes received from a neighbor, an AS
can limit the damage if the neighbor announces false routes. In case II from
Section 7.3, AS-9121 announced over 100,000 false routes and one of its neighbors,
AS-1299, had its max-prefix limit set to a relatively low value. AS-1299 accepted only 1849
routes directly from AS-9121, but since the max-prefix limit is per neighbor, AS-1299
received hijacked routes from other neighbors as well. It learned a total of over
100,000 bad routes from all the neighbors combined, thus infecting a major portion
of its routing table [34]. These examples show how easily protection mechanisms can
fail due to human errors, underlining the need to understand the impact of hijacks in
the face of protection failures, and the need to protect networks by multiple means such as
PGBGP and PHAS.
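The per-neighbor nature of the max-prefix limit can be illustrated with a minimal Python
sketch. This is a toy model under stated assumptions, not router behavior: the function
accept_routes, the neighbor names, and the prefixes are all hypothetical, and a real
implementation may tear down the BGP session when the limit is hit instead of silently
dropping the excess routes.

```python
from collections import defaultdict


def accept_routes(announcements, max_prefix):
    """Apply a per-neighbor max-prefix limit (simplified model).

    announcements: iterable of (neighbor, prefix) pairs in arrival order.
    max_prefix: dict neighbor -> maximum number of routes accepted from it;
                neighbors missing from the dict are treated as unlimited.
    Returns a dict neighbor -> list of accepted prefixes.
    """
    accepted = defaultdict(list)
    for neighbor, prefix in announcements:
        limit = max_prefix.get(neighbor)
        if limit is not None and len(accepted[neighbor]) >= limit:
            continue  # limit reached: ignore further routes from this neighbor
        accepted[neighbor].append(prefix)
    return dict(accepted)


# Hypothetical scenario: ten false prefixes arrive directly from the
# misbehaving neighbor and also indirectly through two other neighbors.
bad = [f"10.{i}.0.0/16" for i in range(10)]
announcements = (
    [("hijacker", p) for p in bad]
    + [("peer_a", p) for p in bad[:6]]
    + [("peer_b", p) for p in bad[4:]]
)
accepted = accept_routes(announcements, {"hijacker": 2})
learned = set().union(*accepted.values())
print(len(accepted["hijacker"]), "direct;", len(learned), "false prefixes learned in total")
```

Although the direct session is capped at two routes, every false prefix still enters the
table through the other neighbors, which mirrors the AS-1299 observation above.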
CHAPTER 8
Related Works
8.1 Visualization of Route Dynamics
In the area of visual analysis of Internet routing, BGPlay [5] shows changes in routes
from different monitors to a particular prefix. BGPlay visualizes an update stream and
uses animation to highlight the change in routes. A tool closely related to BGPlay
is ELISHA [46, 47], which has a similar flavor and analyzes events
on a per-prefix basis. In this scheme, updates on a prefix are sequentially arranged
next to a time line and a line is drawn from the time line to the updates. This helps
in easily identifying the effect of updates clustered close together. One can then delve
into the details of a particular event by visualizing the path changes in the form of
an arc-based representation of links in the routing paths, with each AS being assigned
a unique X coordinate. This visualization can help in understanding the updates as
well as detecting routing anomalies. Both BGPlay and ELISHA complement Link-Rank.
Note that BGPlay and ELISHA capture events to a particular destination, while
Link-Rank visualizes aggregate routing changes affecting multiple prefixes. Thus, on
detecting routing problems to a specific prefix using BGPlay or ELISHA, one can use
Link-Rank to see if the problem is related to some link-level issues, and vice versa.
Another work closely related to Link-Rank detects prefix hijacks using visualization
[45]. The main difference is that [45] provides a visual technique
for detecting abnormality in prefix announcements but does not tell which ASes get
affected as shown in detail by Link-Rank. Interesting events exposed by visualization
in [45] could be investigated using Link-Rank to understand the event impact.
8.2 Understanding Routing Dynamics and Problem Inference
Previous work in root cause inference in Internet routing can be broadly sorted into
three categories: automated root cause analysis [21, 10, 7, 49, 6, 39, 51, 17, 43],
visualization and human interaction based approaches [5, 26], and theoretical schemes in
model settings [27].
In one of the seminal works on network instability, Labovitz et al. [23] identify
several causes of routing instabilities in the Internet, without however diagnosing
their topological origin. Later efforts [7, 6, 10, 49] analyze BGP updates and perform
aggregation along three dimensions: time, monitors and prefixes, achieving a final
output in the form of candidate sets of instability origins. Our Min-cut scheme does a
similar aggregation, except that we use link weight aggregates and focus on analyzing
the BGP logs from the viewpoint of a specific monitor, augmented by some views of
other monitors. [21] uses a different approach: it identifies layers of links with
shared risk and uses membership information to accurately isolate failures in the
optical hardware of a network backbone. Feldmann et al. [10] propose a root cause
inference system that aggregates BGP updates according to time, monitors and prefixes
(in that order) and uses a greedy heuristic to identify the origin of change. Another
class of work, such as [39, 51, 17], diagnoses routing changes using anomaly detection
techniques. Roughan et al. [39] use an EWMA (exponentially weighted moving
average) technique on BGP data, and decomposition/Holt-Winters methods on SNMP
data, showing that increasing the number of monitors decreases the number of false
alarms. [51] uses PCA (principal component analysis) techniques to correlate
different updates into clusters, each cluster being a set of prefixes or ASes which are
affected by the same event. Huang et al. [17] use the same PCA technique to detect
anomalies inside the Abilene network, using multiple Abilene BGP views and routers’
configurations as input. Teixeira et al. [43] describe a framework to detect the cause
of a routing change using a coordinated diagnostic mechanism among several ISPs,
requiring a special server in each ISP that replies to diagnostic queries from other domains.
In contrast, our scheme only requires the view from the ISP itself and publicly
available views from other ISPs that might be involved in the event.
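As a rough illustration of the EWMA-style anomaly detection mentioned above, the following
Python sketch flags time bins whose BGP update count far exceeds a smoothed forecast. It is
not the method of [39]: the function name ewma_alarms, the smoothing factor alpha, the
multiplicative threshold, and the sample counts are assumptions chosen only for the example.

```python
def ewma_alarms(counts, alpha=0.2, threshold=3.0):
    """Return indices of bins whose count exceeds threshold * EWMA forecast.

    counts: list of per-interval BGP update counts (hypothetical input).
    alpha: smoothing factor of the exponentially weighted moving average.
    Anomalous bins are not folded into the forecast, so a single burst
    does not raise the baseline for the bins that follow it.
    """
    if not counts:
        return []
    alarms = []
    forecast = counts[0]  # seed the forecast with the first observation
    for i, x in enumerate(counts[1:], start=1):
        if forecast > 0 and x > threshold * forecast:
            alarms.append(i)
        else:
            forecast = alpha * x + (1 - alpha) * forecast
    return alarms


# Hypothetical update counts with one burst in bin 4.
print(ewma_alarms([100, 110, 90, 105, 1000, 95, 100]))
```

Excluding flagged bins from the forecast is one possible design choice; updating the
forecast unconditionally would make the detector less sensitive right after a burst.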
8.3 Prefix Hijack
Various prefix hijack events have been reported to the NANOG mailing list [30] from time
to time. [55] and [29] studied the exact prefix hijack as part of the MOAS (Multiple
Origin AS) problem, in which one prefix has multiple origin ASes in the routing table.
These studies show that one prefix can be legitimately announced by multiple origin
ASes, but can also be hijacked due to misconfigurations.
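The MOAS condition described above can be checked with a few lines of Python. This is a
minimal sketch assuming a routing table given as (prefix, AS path) pairs with the origin AS
as the last element of the path; the function name find_moas and the sample routes are
hypothetical, and the check alone cannot tell a legitimate MOAS case from a hijack.

```python
from collections import defaultdict


def find_moas(routes):
    """Return prefixes that appear with more than one origin AS.

    routes: iterable of (prefix, as_path) pairs; as_path is a list of AS
            numbers and its last element is taken as the origin AS.
    """
    origins = defaultdict(set)
    for prefix, as_path in routes:
        if as_path:  # skip entries without a usable path
            origins[prefix].add(as_path[-1])
    return {p: o for p, o in origins.items() if len(o) > 1}


# Hypothetical table entries using documentation prefixes and private ASNs.
routes = [
    ("192.0.2.0/24", [701, 3356, 64500]),
    ("192.0.2.0/24", [1239, 64501]),
    ("198.51.100.0/24", [701, 64502]),
    ("198.51.100.0/24", [1239, 7018, 64502]),
]
print(find_moas(routes))
```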
Existing proposals to address the prefix hijack problem can be categorized into two
types: cryptography-based and non-crypto-based. Crypto-based solutions, such as
[42], [3], [16], [31], [20], [41], require BGP routers to sign and verify the origin AS
and the path, which has a significant impact on router performance. Furthermore, these
solutions are not easily deployable because they all need changes to router software,
and some require public key infrastructures.
Non-crypto proposals include [13], [48], [56], and [19]. The IRV approach in [13] lets
each AS designate a server that answers queries regarding BGP security. [48] lets
the router give preference to stable routes over transient ones, which can be the result of
prefix hijacks. Similarly, in PGBGP [19], a router detects prefix hijacks by monitoring
the origin ASes in BGP announcements for each prefix over time. A transient origin
AS of a prefix is considered anomalous, and the router avoids using the anomalous
routes whenever possible. PGBGP also detects covered prefix hijacks using a similar
approach. In [56], prefix owners attach additional information to the routing updates,
so that remote routers can detect prefix hijacks. All of the above non-crypto proposals
require changes to router software, router configurations, or the ways that operators
run their networks.
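The history-based idea attributed to PGBGP above (treat a recently appeared origin as
anomalous and avoid it whenever an alternative exists) can be sketched in Python as follows.
This is a simplified illustration, not the PGBGP algorithm of [19]: the class OriginHistory,
its methods, the abstract time units, and the length of the suspicious period are all
assumptions made for the example.

```python
class OriginHistory:
    """Track when each (prefix, origin AS) pair was first observed."""

    def __init__(self, suspicious_period):
        # An origin seen for less than this long is treated as transient.
        self.suspicious_period = suspicious_period
        self.first_seen = {}  # (prefix, origin) -> time of first observation

    def observe(self, prefix, origin, now):
        self.first_seen.setdefault((prefix, origin), now)

    def is_suspicious(self, prefix, origin, now):
        first = self.first_seen.get((prefix, origin))
        return first is None or now - first < self.suspicious_period

    def choose(self, prefix, candidate_origins, now):
        """Prefer established origins; fall back to all candidates if none."""
        trusted = [o for o in candidate_origins
                   if not self.is_suspicious(prefix, o, now)]
        return trusted or list(candidate_origins)


# Hypothetical timeline in hours: AS 64500 is the long-standing origin,
# AS 64666 starts announcing the same prefix at hour 100.
history = OriginHistory(suspicious_period=24)
history.observe("192.0.2.0/24", 64500, now=0)
history.observe("192.0.2.0/24", 64666, now=100)
print(history.choose("192.0.2.0/24", [64500, 64666], now=101))
```

The fallback in choose reflects the "whenever possible" qualifier: if only the new origin
is reachable, the route is still used.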
Compared to all of the above proposals, the biggest advantage of our system is
that it is fully deployable. PHAS can be up and running without requiring cooperation
from multiple ISPs, registry authorities, router vendors, or even end users. While
other approaches focus on detecting prefix hijacks at remote ASes, we simply notify
the prefix owner about the origin changes, thus allowing prefix owners to detect
prefix hijacks with high accuracy.
Three other related works [44, 22, 37] are also fully deployable. [44] utilizes the
data from RouteViews or RIPE and visualizes the origin AS changes of the prefixes
for visual detection of prefix hijacks. [22] proposes an alarming algorithm for
prefix hijacks and path hijacks, based on public BGP data and the geographic
information of each AS from the whois database. The key observation is that if
two edge ASes are connected to each other or legitimately originate the same prefixes,
they are geographically close. Violation of this observation will trigger alarms.
The RIPE MyASN project [37] is probably the most similar service to ours, but
its design is based on a fundamentally different philosophy. In the MyASN project,
a prefix owner registers the valid origin set for a prefix. MyASN then tracks roughly
the equivalent of our instantaneous origin set for this prefix. An alarm is triggered
when any invalid origin AS appears. Our approach reports the origin set changes to
the prefix owner, and any filtering or checking is done at the user site. This is a subtle
difference, but has important implications.
First, filtering at the user side provides the greatest degree of flexibility to the detection
algorithm. Users can apply any filtering criteria or detection algorithm to the
data. When the filtering is done at the service site, as in MyASN, it is limited to what the
service interface can provide. Obviously, for security reasons, the service site cannot
allow arbitrary filtering scripts to be uploaded. If prefix owners cannot achieve their
filtering goal at the service site, they have to deploy a local filter anyway.
Second, it is critical for server-based filtering to have the most up-to-date information
needed for prefix hijack detection. The valid origin set must be updated at
the MyASN server whenever the prefix has a different origin set. It is especially hard to
update it in the face of an ongoing prefix hijack. When a new hijack happens, the prefix
owner may want to change the filtering rule, but is unable to do so due to the attack.
Our approach does not suffer from this problem.
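The split between service-side reporting and user-side filtering discussed above can be
sketched in Python. This is an illustration of the design idea only, not the PHAS
implementation: the functions origin_set_changes and local_filter, the event tuple layout,
and the sample data are hypothetical.

```python
def origin_set_changes(updates):
    """Service side: report every change of a prefix's origin set.

    updates: iterable of (time, prefix, origins) observations in time order.
    Returns a list of (time, prefix, old_set, new_set) events.
    """
    current = {}
    events = []
    for time, prefix, origins in updates:
        new = frozenset(origins)
        old = current.get(prefix, frozenset())
        if new != old:
            events.append((time, prefix, old, new))
            current[prefix] = new
    return events


def local_filter(events, valid_origins):
    """User side: keep events whose new set contains an origin the owner
    does not recognize. The owner can change this rule at any time without
    contacting the service."""
    return [e for e in events if e[3] - valid_origins]


# Hypothetical observations for one prefix owned by AS 64500.
p = "192.0.2.0/24"
updates = [(1, p, {64500}), (2, p, {64500}), (3, p, {64500, 64666}), (4, p, {64500})]
events = origin_set_changes(updates)
suspicious = local_filter(events, valid_origins={64500})
print([e[0] for e in events], [e[0] for e in suspicious])
```

Because valid_origins lives only on the owner's side, an ongoing hijack cannot prevent the
owner from adjusting the filter.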
In terms of understanding hijacks, [4] discusses hijacking and a related concept
of interception, where an AS can transparently intercept hijacked traffic and forward
it to the owner. Their study shows that ASes higher in the hierarchy can intercept
a large amount of traffic. This does not directly contradict our result that tier-1 and
other big ISPs are not very resilient, since their estimates are based on which ASes
forward a large amount of traffic, while our estimates are based on which routes are more
preferred. Another related measurement study is [35], which shows how super prefix
hijacks are correlated with spam sent from the hijacked address space.
CHAPTER 9
Conclusions
Due to the sheer size of the global routing infrastructure, as well as its dense connectivity
and the resulting complex interactions among the large number of interconnected
networks, understanding global routing changes and inferring the origin of changes
present a great research challenge.
In this work we proposed to weigh links by the number of BGP routes they carry and
analyzed Internet routing from the perspective of link weight changes. This approach
is fundamentally different from prior work dealing with individual routes and provides
an aggregate metric to understand routing changes that are related. This enables us
to visualize large-scale routing changes by observing the links with a large number of
routing changes. We can also correlate link weight changes and understand the possible
cause of the routing changes. We also proposed a heuristic based on min-cut to
infer the most likely location of the changes and showed that this heuristic can achieve a
high degree of accuracy. We also characterized the expected weight on each link and used
this information to identify events on links, as well as to distinguish cases where links fail
from cases where links gain routes as backup. Our results show that the use of link
weights and their changes can be a promising direction towards routing problem diagnosis
in large-scale networks.
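The link weight metric summarized above can be illustrated with a short Python sketch. It
assumes that one monitor's routing table is given as a mapping from prefix to AS path, that
a link is a directed pair of adjacent ASes on a path, and that repeated ASes from path
prepending are skipped; the function names link_weights and weight_changes and the sample
tables are hypothetical.

```python
from collections import Counter


def link_weights(routes):
    """Weight of a link = number of routes (prefixes) whose path uses it.

    routes: dict prefix -> AS path (list of AS numbers, monitor first).
    """
    weights = Counter()
    for path in routes.values():
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore self-links created by AS path prepending
                weights[(a, b)] += 1
    return weights


def weight_changes(before, after):
    """Per-link weight difference between two snapshots (non-zero only)."""
    w0, w1 = link_weights(before), link_weights(after)
    return {link: w1[link] - w0[link]
            for link in set(w0) | set(w1) if w1[link] != w0[link]}


# Hypothetical snapshots: routes through AS 2 move to AS 5.
before = {"p1": [1, 2, 3], "p2": [1, 2, 4], "p3": [1, 5, 6]}
after = {"p1": [1, 5, 3], "p2": [1, 5, 4], "p3": [1, 5, 6]}
changes = weight_changes(before, after)
print(sorted(changes.items()))
```

In this toy example link (1, 2) loses two routes while link (1, 5) gains two, which is the
kind of correlated loss and gain that the aggregate view is meant to expose.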
Prefix hijacking is another important problem that the Internet faces. We showed that
a simple prefix hijack alert system can effectively reduce the time between initiation
of an attack and the knowledge of this attack. We also showed that the Internet routing
policy that was used to commercially benefit big tier-1 ISPs is one of the main factors
that reduces tier-1 ISPs’ resiliency against prefix hijacking. Our study shows that big
is not always better, and that high-impact attacks can be launched from corners of the
Internet. Thus it is not enough to secure a few big ISPs to achieve strong security in the
Internet.
9.1 Future Works
We now present some ongoing and future work in the area of Internet routing that can
build on the results from our work.
9.1.1 Understanding AS activity
The stability of Internet routing has been analyzed from the point of view of prefixes
and the AS originating the prefix. However, an AS providing transit for prefixes origi-
nated by others can also have unstable links or internal problems and hence contribute
to routing dynamics. In order to understand the overall stability of Internet routing,
we need to abstract this role of transit ASes in routing dynamics. For this purpose, we
propose to use a measure of AS rank-change, an extension of link weight change indicating
the amount of path changes an AS is involved in. With this metric, we obtain
information containing the AS rank-change for each AS as observed from all monitors
over time. Such a data set is multi-dimensional and we propose to use a dimension-
ality reduction technique like Principal Component Analysis (PCA) to understand the
data better. By this technique we expect to clearly separate outliers contributing more
routing updates from normal cases and understand the routing dynamics as a whole.
9.1.2 Prefix hijack
In Chapter 6 we discussed hijack detection and building an alert system. One of the
problems facing the Internet today is quick recovery from a hijack event. The typical
procedure for recovering from a prefix hijack involves operators from the affected AS
contacting the hijacker AS or its upstream providers and requesting that announcements
be stopped. However, this process can take some time, and in the meantime a
hijack can cause a severe impact. One possible direction of work we are investigating
involves building a reaction system wherein certain measures can be taken to quickly
flush out the hijacked routes from the Internet. Such a reaction system can reduce
the negative impact on data delivery while giving network operators enough time to
contact the source of the problem.
In Chapter 7 we showed how routing policies can negatively impact the resiliency
of tier-1 ISPs. These ISPs are well connected and can provide superior data-path performance,
and hence it is important to fix the weak resiliency due to policy decisions.
One line of work involves designing such a fix, so that the commercial interests are not
compromised, yet resiliency against prefix hijacking can increase.
9.1.3 BGP monitoring
A lot of recent research in Internet routing has benefitted greatly from BGP data col-
lected from monitoring projects like RouteViews and RIPE. However, with over 400
BGP monitors and with each monitor collecting close to 400,000 updates per day on
average, this translates to a lot of monitoring data. Researchers often pick a small num-
ber of monitors to evaluate or understand routing problems, but the choice of which
monitors to pick is often done randomly. An interesting line of work involves using
some aggregate metric like link weight changes to cluster monitors so that one can
clearly identify monitors that collect similar data. With an understanding of monitor
behavior, one may be able to systematically pick a smaller set of monitors that is more
representative of overall Internet behavior.
REFERENCES
[1] North American Network Operators Group (NANOG). [On-line] http://www.nanog.org.
[2] Secure Inter-Domain Routing (SIDR) Working Group. http://www1.ietf.org/html.charters/sidr-charter.html.
[3] W. Aiello, J. Ioannidis, and P. McDaniel. Origin Authentication in Interdomain Routing. In Proceedings of 10th ACM Conference on Computer and Communications Security, pages 165–178. ACM, October 2003. Washington, DC.
[4] Hitesh Ballani, Paul Francis, and Xinyang Zhang. A Study of Prefix Hijacking and Interception in the Internet. In Proceedings of ACM Sigcomm, 2007.
[5] Giuseppe Di Battista, Federico Mariani, Maurizio Patrignani, and Maurizio Pizzonia. BGPlay: A system for visualizing the interdomain routing evolution. In Graph Drawing, volume 2912 of Lecture Notes in Computer Science, pages 295–306, 2003.
[6] M. Caesar, L. Subramanian, and R. Katz. Root cause analysis of Internet routing dynamics. In U.C. Berkeley Technical Report UCB/CSD-04-1302, November 2003.
[7] D. Chang, R. Govindan, and J. Heidemann. The temporal and topological characteristics of BGP path changes. In ICNP, November 2003.
[8] E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiway cuts. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pages 241–251, 1992.
[9] E. Chen and S. Sangli. Avoid BGP Best Path Transitions from One External to Another. Internet Draft, IETF, June 2006. http://www.ietf.org/internet-drafts/draft-ietf-idr-avoid-transition-04.txt.
[10] A. Feldmann, Olaf Maennel, Z. Morley Mao, A. Berger, and B. Maggs. Locating Internet routing instabilities. In Proceedings of Sigcomm, September 2004.
[11] Lixin Gao. On inferring autonomous system relationships in the Internet. ACM/IEEE Transactions on Networking, 9(6):733–745, 2001.
[12] N. Garg, V. Vazirani, and M. Yannakakis. Multiway cuts in node weighted graphs. Journal of Algorithms, 50(1):49–61, 2004.
[13] G. Goodell, W. Aiello, T. Griffin, J. Ioannidis, P. McDaniel, and A. Rubin. Working around BGP: An incremental approach to improving security and accuracy of interdomain routing. In NDSS, 2003.
[14] Timothy Griffin and Gordon T. Wilfong. A safe path vector protocol. In INFOCOM (2), pages 490–499, 2000.
[15] Timothy G. Griffin and Gordon T. Wilfong. An analysis of BGP convergence properties. In Proceedings of SIGCOMM, pages 277–288, Cambridge, MA, August 1999.
[16] Y.-C. Hu, A. Perrig, and M. Sirbu. SPV: Secure path vector routing for securing BGP. In Proceedings of ACM Sigcomm, August 2004.
[17] Y. Huang, N. Feamster, A. Lakhina, and J. Xu. Detecting Network Disruptions with Network-Wide Analysis. In Proc. of ACM SIGMETRICS, 2007.
[18] Geoff Huston. Auto-detecting hijacked prefixes? A presentation at RIPE-50, May 2005. http://www.potaroo.net/presentations/index.html.
[19] J. Karlin, S. Forrest, and J. Rexford. Pretty good BGP: Protecting BGP by cautiously selecting routes. Technical Report TR-CS-2005-37, University of New Mexico, October 2005.
[20] S. Kent, C. Lynn, and K. Seo. Secure Border Gateway Protocol. IEEE Journal of Selected Areas in Communications, 18(4), April 2000.
[21] Ramana Rao Kompella, Jennifer Yates, Albert Greenberg, and Alex Snoeren. IP fault localization via risk modeling. In Proceedings of Second ACM/USENIX Symposium on Networked Systems Design and Implementation, 2005.
[22] Christopher Kruegel, Darren Mutz, William Robertson, and Fredrik Valeur. Topology-based detection of anomalous BGP messages. In 6th Symposium on Recent Advances in Intrusion Detection (RAID), 2003.
[23] C. Labovitz, G. R. Malan, and F. Jahanian. Origins of Internet routing instability. In Proceedings of the IEEE INFOCOM ’99, pages 218–26, New York, NY, March 1999.
[24] Mohit Lad, Dan Massey, Dan Pei, Yiguo Wu, Beichuan Zhang, and Lixia Zhang. PHAS: A prefix hijack alert system. In 15th USENIX Security Symposium, 2006.
[25] Mohit Lad, Dan Massey, and Lixia Zhang. Link-Rank: A Graphical Tool for capturing BGP Routing Dynamics. In IEEE/IFIP NOMS, April 2004.
[26] Mohit Lad, Daniel Massey, and Lixia Zhang. Visualizing Internet routing changes. In IEEE Transactions on Visualization and Computer Graphics, special issue on visual analytics, to appear, 2006.
[27] Mohit Lad, Akash Nanavati, Dan Massey, and Lixia Zhang. An Algorithmic Approach to Identifying Link Failures. In PRDC, March 2004.
[28] Joshua O'Madadhain, Danyel Fisher, Padhraic Smyth, Scott White, and Yan-Biao Boey. Analysis and visualization of network data using JUNG. In Journal of Statistical Software, to appear.
[29] R. Mahajan, D. Wetherall, and T. Anderson. Understanding BGP misconfiguration. In Proceedings of ACM Sigcomm, August 2002.
[30] The NANOG Mailing List. http://www.merit.edu/mail.archives/nanog/.
[31] J. Ng. Extensions to BGP to Support Secure Origin BGP. ftp://ftp-eng.cisco.com/sobgp/drafts/draft-ng-sobgp-bgp-extensions-02.txt, April 2004.
[32] Ricardo Oliveira, Beichuan Zhang, Dan Pei, Rafit Izhak-Ratzin, and Lixia Zhang. Quantifying path exploration in the Internet. In Proceedings of Internet Measurement Conference, to appear, 2006.
[33] D. Pei, D. Massey, and L. Zhang. A Framework for Resilient Internet Routing Protocols. IEEE Network Special Issue on Protection, Restoration, and Disaster Recovery, 2004.
[34] Alin C. Popescu, Brian J. Premore, and Todd Underwood. Anatomy of a leak: AS9121. http://www.nanog.org/mtg-0505/underwood.html.
[35] Anirudh Ramachandran and Nick Feamster. Understanding the network-level behavior of spammers. In Proceedings of ACM SIGCOMM, 2006.
[36] Y. Rekhter and T. Li. A border gateway protocol (BGP-4). Request for Comment (RFC): 4271, 2006.
[37] RIPE. Routing information service: myASn System. http://www.ris.ripe.net/myasn.html.
[38] RIPE. Routing Information Service Project. http://www.ripe.net/ripencc/pub-services/np/ris-index.html.
[39] Matthew Roughan, Timothy G. Griffin, Z. Morley Mao, Albert Greenberg, and Brian Freeman. Forwarding anomalies and improving their detection using multiple data sources. In SIGCOMM Workshop: Network Troubleshooting, 2004.
[40] routeviews.org. RouteViews Routing Table Archive. http://www.routeviews.org/.
[41] B. R. Smith and J. J. Garcia-Luna-Aceves. Securing the border gateway routing protocol. In Global Internet’96, November 1996.
[42] L. Subramanian, V. Roth, I. Stoica, S. Shenker, and R. H. Katz. Listen and whisper: Security mechanisms for BGP. In Proceedings of ACM NSDI 2004, March 2004.
[43] Renata Teixeira and Jennifer Rexford. A measurement framework for pin-pointing routing changes. In Proceedings of the ACM SIGCOMM workshop on Network troubleshooting, 2004.
[44] S. T. Teoh, K.-L. Ma, S. F. Wu, D. Massey, X. Zhao, D. Pei, L. Wang, L. Zhang, and R. Bush. Visual-based anomaly detection for BGP origin AS change (OASC) events. In IFIP/IEEE Distributed Systems: Operations and Management (DSOM), pages 155–168, October 2003.
[45] Soon Tee Teoh and Kwan-Liu Ma. Case study: Interactive visualization for Internet security. In Proceedings of IEEE Visualization, 2002.
[46] Soon Tee Teoh, Kwan-Liu Ma, and S. Felix Wu. A Visual Exploration Process for the Analysis of Internet Routing Data. In Proc. IEEE Visualization, 2003.
[47] Soon Tee Teoh, Ke Zhang, Shih-Ming Tseng, Kwan-Liu Ma, and S. Felix Wu. Combining visual and automated data mining for near-real-time anomaly detection and analysis in BGP. In VizSEC/DMSEC ’04: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security, pages 35–44, 2004.
[48] L. Wang, X. Zhao, D. Pei, R. Bush, D. Massey, A. Mankin, S. Wu, and L. Zhang. Protecting BGP Routes to Top Level DNS Servers. In Proceedings of the ICDCS 2003, 2003.
[49] Jian Wu, Z. Morley Mao, and Jennifer Rexford. Finding a needle in a haystack: Pinpointing significant BGP routing changes in an IP network. In Proceedings of 2nd Symposium on Networked Systems Design and Implementation (NSDI), 2005.
[50] Jianhong Xia and Lixin Gao. On the evaluation of AS relationship inferences. In Proc. of IEEE GLOBECOM, December 2004.
[51] Kuai Xu, Jaideep Chandrashekar, and Zhi-Li Zhang. A First Step Towards Understanding Inter-domain Routing. In Proc. of ACM SIGCOMM Workshop on Mining Network Data, 2005.
[52] Wen Xu and Jennifer Rexford. MIRO: multi-path interdomain routing. In SIGCOMM, pages 171–182, 2006.
[53] Beichuan Zhang, Raymond Liu, Dan Massey, and Lixia Zhang. Internet Topology Project. http://irl.cs.ucla.edu/topology/.
[54] Beichuan Zhang, Raymond Liu, Daniel Massey, and Lixia Zhang. Collecting the Internet AS-level topology. ACM SIGCOMM Computer Communications Review (CCR), 35(1):53–62, January 2005.
[55] X. Zhao, D. Pei, L. Wang, D. Massey, A. Mankin, S. Wu, and L. Zhang. An Analysis of BGP Multiple Origin AS (MOAS) Conflicts. In Proceedings of the ACM IMW 2001, October 2001.
[56] X. Zhao, D. Pei, L. Wang, D. Massey, A. Mankin, S. Wu, and L. Zhang. Detection of Invalid Routing Announcement in the Internet. In Proceedings of the IEEE DSN 2002, June 2002.