
UNIVERSITY OF CALIFORNIA

Los Angeles

Understanding and Diagnosing Routing Dynamics in Global Internet

A dissertation submitted in partial satisfaction

of the requirements for the degree

Doctor of Philosophy in Computer Science

by

Mohit Vijay Lad

2007

The dissertation of Mohit Vijay Lad is approved.

Mark Hansen

Dan Massey

Adam Meyerson

Songwu Lu

Rafail Ostrovsky

Lixia Zhang, Committee Chair

University of California, Los Angeles

2007


To my Parents . . .

who gave meaning to my life and

among so many other things,

taught me to be patient


TABLE OF CONTENTS

1 Introduction

1.1 Internet Routing and Border Gateway Protocol

1.2 BGP monitoring

1.3 Problems and Scope of Thesis

1.3.1 Inferring origin of routing changes

1.3.2 Understanding routing changes in the Internet

1.3.3 Security Problems

2 Inferring Failures in Path Vector Routing

2.1 Model and Problem Definition

2.1.1 Network and Routing Model

2.1.2 Failure Model

2.1.3 Input Requirements

2.2 Minimum e-set for tree inputs

2.3 Minimum e-set for general graphs

2.3.1 Problem Definition: Graph Version

2.3.2 The case of no purely transit nodes, V_N = ∅ or V = V_D

2.3.3 The case of purely transit nodes, V_N ≠ ∅

3 Identifying Routing Problems in the Internet

3.1 Challenges in diagnosing Internet routing problems

3.2 Capturing aggregate behavior: Notion of Link-weight

4 Visualizing Internet Routing Dynamics using Link-Rank

4.1 Components of Link-Rank

4.2 Features of Link-Rank

4.2.1 Nodes, edges and color coding

4.2.2 Activity plots: summarizing weight changes

4.2.3 Time Windows and Drilling Down

4.2.4 Pruning Rank-change graphs

4.2.5 Assembled View: Merging Rank-change graphs from multiple observation points

4.3 Discovery and analysis using Link-Rank

4.3.1 Methodology

4.3.2 Case I: Capturing Link Instabilities

4.3.3 Case II: Root-cause identification

5 Inferring Origin of Internet Routing Problems

5.1 Characterizing links and identifying routing events

5.1.1 Link Events

5.2 The Inference Scheme

5.2.1 Overview of approach

5.2.2 Fault graph

5.2.3 Augmenting fault graph with views from additional observation points

5.2.4 Candidate set reduction

5.2.5 Identifying node problems

5.3 Evaluation

5.3.1 Validation using Abilene Data

5.3.2 Validation using Origin-adjacent events

5.3.3 Application to BGP data

5.4 Discussion

6 Detecting and Alerting about Prefix Hijacks

6.1 Prefix Hijack

6.2 System Design

6.3 Origin Change Detection

6.3.1 Instantaneous Origin Changes

6.3.2 Windowed Origin Changes

6.3.3 Adaptive Window Size

6.4 Notification Delivery

6.5 Local Notification Filter

6.5.1 Constructing filtering rules

6.5.2 Case Study

6.6 Evaluation

6.6.1 Notification Messages

6.6.2 Detecting Known Events

6.6.3 Notification Delivery

6.7 Extensions to basic system

6.7.1 Classification of Prefix hijack

6.7.2 Sub-prefix Set

6.7.3 Last Hop Set

7 Understanding Resiliency of the Internet against Prefix Hijack

7.1 Hijack Evaluation Metrics

7.2 Evaluating Hijacks

7.2.1 Simulation Setup

7.2.2 Characterizing Topological Resilience

7.2.3 Factors Affecting Resilience

7.3 Prefix Hijack Incidents in the Internet

7.3.1 Case I: Prefix Hijacks by AS-27506

7.3.2 Case II: Prefix Hijacks by AS-9121

7.4 Discussion

8 Related Works

8.1 Visualization of Route Dynamics

8.2 Understanding Routing Dynamics and Problem Inference

8.3 Prefix Hijack

9 Conclusions

9.1 Future Works

9.1.1 Understanding AS activity

9.1.2 Prefix hijack

9.1.3 BGP monitoring

References

LIST OF FIGURES

1.1 Internet routing and BGP monitoring

2.1 Possible e-sets are {(1, 2)} and {(2, 4), (2, 5), (2, 6)}

2.2 Example for nearest descendent and minimum e-set from Algorithm 2

2.3 greedy Fault() with each node as a destination

2.4 Example showing greedy not optimal in general case

3.1 The notion of link weight

3.2 Rank-change graph for change in Figure 3.1

4.1 Components of Link-Rank

4.2 Sample Rank-change graph

4.3 Plotting an activity bar

4.4 Use of time window to control time of change

4.5 Drilling down to increase level of detail in activity

4.6 Assembling views from AS 11608 and AS 3561

4.7 Activity plots from March 8, 2005 to March 14, 2005

4.8 One hour of activity plot from 12.0.1.63 on March 9, 2005

4.9 Case I: Continuous switching of routes between two links

4.10 Activity plots from October 18, 2005 to October 24, 2005

4.11 Case II: Instability observed at AS 6453

4.12 Case II: Combined view from AS 1239, AS 6453 and AS 3257

5.1 Frequency distribution of link weight values of 3 links from a single observation point

5.2 Percentage samples covered by the top 5 most common values seen in the 4500 link weight samples for each link

5.3 Example showing choice of α = 0.1

5.4 Events in January 2007

5.5 Main steps in inference

5.6 A fault graph from single observation point

5.7 Augmenting a fault graph with additional information

5.8 State transition diagram

5.9 BGP peer TENET (AS 2018) of Abilene (AS 11537) was unreachable. Event observed from primary view of AS 11686.

5.10 Number of origin-adjacent events affecting each observation point

5.11 Accuracy of F_single and F_mult for origin events involving more than 50 prefixes

5.12 Links involved in events per observation point

5.13 Cumulative Distribution of instances per link from AS 2914

5.14 Repeated instability involving AS 2072 as viewed from AS 2914.

5.15 Case study: Routing changes seen from AS 2914

6.1 Example of prefix hijack

6.2 Components of PHAS

6.3 Origin events per prefix - December 2005

6.4 Inter-arrival time between origin events for a prefix for December 2005

6.5 Distribution of origin events per prefix using adaptive window

6.6 Comparison of origin events per day using instantaneous and adaptive window

6.7 Notification setup

6.8 Origin events per day from June 1, 2005 to August 31, 2005

6.9 Origin events per day from September 1, 2005 to November 30, 2005

6.10 Distribution of events per AS for December 2005

6.11 Delivery Ratio

6.12 Delivery Number

6.13 Types of prefix hijacks

7.1 Hijack scenario.

7.2 Distribution of node resilience.

7.3 Resilience of nodes in different tiers.

7.4 Understanding resilience of tier-1 nodes

7.5 Resilience of nodes with different number of Tier-1 providers.

7.6 High resiliency against hijack

7.7 Low resiliency against hijack

7.8 Tier-1 prefix hijacked

7.9 Multi-homed customer of tier-1s hijacked

ACKNOWLEDGMENTS

First and foremost, I would like to acknowledge my dissertation advisor, Dr. Lixia Zhang, who has constantly guided me through my dissertation. I would also like to acknowledge Dr. Dan Massey for his guidance and support during my dissertation, and Dr. Songwu Lu for his feedback on my work from time to time. I am also grateful to Dr. Adam Meyerson for contributing towards the work on minimum failures in routing. I would like to extend a special note of thanks to Verra Morgan for her time and support during my Ph.D. Finally, various friends and colleagues have played an important role during my Ph.D.; notable among them are Dr. Akash Nanavati, Dr. Dan Pei, Dr. Beichuan Zhang, Dr. Vasilis Pappas, Ricardo Oliveira, Eric Osterweil, Yang Yi, Soshant Bali, and Yuan-Chin Amy Lee.

My research has been complemented with deployed systems, and this would not have been possible without the work of fellow students. I would like to thank all the students who have contributed to the Link-Rank visualization tool. Tim Ma worked on the very first version of Link-Rank and was also responsible for the web-based services. Yiguo Wu carried on the work and added more features to Link-Rank. Jeffrey Chiang and Jonathan Salehpour helped in redesigning the tool and were instrumental in the release of version 1.0. I would also like to thank Yan Chen for converting a prototype of PHAS into a full-fledged deployed system.

Finally, I would like to acknowledge NSF and DARPA for the research grants under which I was supported.

VITA

1978 Born, Mumbai, India.

2000 B.E. (Computer Engineering), Mumbai University, India.

2000–2001 Research Assistant Programmer, Computer Science and Engineering Department, Indian Institute of Technology, Mumbai.

2001–2003 Teaching Assistant, Computer Science Department, UCLA. Taught sections of upper division undergraduate courses in Software Engineering, Database Systems and Computer Networks.

2002, 2003 Visiting Research Assistant, USC Information Sciences Institute, Arlington, VA, USA.

2003 M.S. (Computer Science), UCLA.

2003–present Research Assistant, Computer Science Department, UCLA.

PUBLICATIONS

Mohit Lad, Xiaoliang Zhao, Beichuan Zhang, Dan Massey and Lixia Zhang, An Analysis of BGP Update Burst during Slammer Attack, in Proceedings of the 5th International Workshop on Distributed Computing (IWDC), Dec 2003.

Mohit Lad, Akash Nanavati, Dan Massey and Lixia Zhang, An Algorithmic Approach to Identifying Link Failures, in Proceedings of the 10th Pacific Rim International Symposium on Dependable Computing (PRDC), March 2004.

Mohit Lad, Dan Massey, Adam Meyerson, Akash Nanavati and Lixia Zhang, Minimum failure explanations for path vector routing, in Journal of Combinatorial Optimization, special issue on Communication Networks and Internet Applications, September 2006.

Mohit Lad, Dan Massey, Dan Pei, Yiguo Wu, Beichuan Zhang and Lixia Zhang, PHAS: A prefix hijack alert system, in Proceedings of 15th USENIX Security Symposium (USENIX Security 2006).

Mohit Lad, Dan Massey and Lixia Zhang, Visualizing Internet Routing Dynamics, IEEE Transactions on Visualization and Computer Graphics, Nov/Dec 2006.

Mohit Lad, Ricardo Oliveira, Dan Massey and Lixia Zhang, A link weight based approach for root cause inference in Internet Routing, UCLA Technical Report 060028, January 2007.

Mohit Lad, Ricardo Oliveira, Beichuan Zhang and Lixia Zhang, Understanding Resiliency of Internet Topology Against Prefix Hijack Attacks, to appear in IEEE/IFIP DSN 2007.

ABSTRACT OF THE DISSERTATION

Understanding and Diagnosing Routing Dynamics in Global Internet

by

Mohit Vijay Lad

Doctor of Philosophy in Computer Science

University of California, Los Angeles, 2007

Professor Lixia Zhang, Chair

The global routing system is a critical component of the Internet infrastructure. It delivers data to over 210,000 destination networks throughout the Internet. Large-scale events such as fiber cuts, power failures, or major changes in network connectivity often lead to large-scale routing changes, which in turn can cause widespread disruption in data delivery service. Hence it is critically important to be able to detect all significant routing changes and to identify their origins. Prior work focused on examining route changes to individual destinations in an attempt to understand the behavior of the global routing system. Unfortunately, the Internet’s distributed nature and sheer size make such approaches infeasible and ineffective.

In this work, we develop a new concept for measuring routing distribution and dynamics. Instead of measuring changes to individual routes, we measure the total number of routes carried over each link, dubbed link weight. By examining the changes to link weight, dubbed rank-change, we easily capture aggregate route changes. We use link weights and rank-changes to visually capture large-scale routing events from hundreds of megabytes of routing data collected from operational routers. In addition to enabling visual analysis of routing problems, the link weight metric also forms the basis for automated inference to locate the origins of routing changes. We correlate link weight changes across adjacent links and across observations from different vantage points to construct an s-t graph, called a fault graph, which contains a virtual source and sink node. The min-cut, which disconnects the source from the sink by cutting the fewest edges, is the most likely location where the problem originated. Our evaluations show that this min-cut heuristic can identify the problem edges with high accuracy.

Another problem facing the Internet today is prefix hijacking, in which a destination wrongly announces IP address space it does not own, causing routers to send traffic to it instead of to the genuine destination. Prefix hijacking is a major threat to Internet security, and detecting a hijack early is important to reduce the damage done. To this end, we designed a lightweight and easily deployable hijack alert system. We also carried out a systematic evaluation of how the impact of a hijack varies with the locations of the attacker and the target, and found that the tier-1 ISPs, which have the highest degree, are much more vulnerable than some of their multi-homed customers.

In summary, our definitions of link weight and rank-change contribute a new abstraction for measuring Internet routing distribution and dynamics. This new abstraction not only enables a comprehensive visualization of the global routing system and automatic diagnosis, but also opens up new avenues for more advanced modeling and analysis of routing dynamics. Our work on prefix hijacking presents a quick and easy-to-deploy first step towards detecting attacks and also offers new insights into who is more vulnerable to hijacks in the Internet.

CHAPTER 1

Introduction

Today’s Internet provides a global data delivery service to millions of end users. Network routing protocols play a critical role in this delivery service by steering data traffic towards its destination. Routing problems can lead to long data delivery latencies or, in some cases, loss of data packets, resulting in noticeable performance degradation for end users. In the past, large-scale events such as fiber cuts, power failures, or major changes in connectivity have severely impacted the data delivery service. Besides performance degradation, data packets can also be wrongly routed to malicious entities, resulting in serious security and privacy breaches. This hijacking of traffic often occurs due to problems at the routing level. Thus detecting significant routing problems in a timely manner is critical to the smooth operation of the Internet, and is the focus of this dissertation. In this chapter, we introduce Internet routing in detail and elaborate on the routing problems we are interested in detecting and understanding. We explain why these problems are challenging and discuss at a high level how they are addressed in the remainder of the dissertation.

1.1 Internet Routing and Border Gateway Protocol

The Internet consists of a large number of networks called autonomous systems (ASes). Each AS is assigned an AS number and contains one or more destination networks. Each destination network is represented by an IP address prefix. For example, the prefix 131.179.96.0/24 represents a network at UCLA and is part of AS 52 (UCLA’s AS number). As of March 2007, the Internet consists of over 20,000 autonomous systems and over 220,000 prefixes.

A routing protocol propagates information about how to reach all destinations throughout the network. A path vector protocol called the Border Gateway Protocol (BGP) [36] is the de facto routing protocol used between autonomous systems in the Internet today. Routing information in BGP is propagated by the exchange of BGP update messages. A BGP update message contains the destination prefix and the AS path used to reach that prefix. We represent a BGP update in the form {⟨prefix⟩ : ⟨AS path⟩}. Figure 1.1 shows how BGP updates propagate routing information in the Internet. In this figure, AS 22 owns a prefix P1 and sends a BGP update message {P1 : 22} to its neighbor AS 33. AS 22 is said to be the origin AS for prefix P1. On receiving this update, AS 33 prepends its own AS number to the received path and sends the BGP update {P1 : 33, 22} to its neighbors, AS 44 and AS 55.¹ AS 55 in turn sends the BGP update {P1 : 55, 33, 22} to its neighbor AS 44. Note that AS 44 receives two paths to reach P1. When an AS receives more than one path to reach a prefix, it chooses one of them as the primary path. In Figure 1.1, we assume AS 44 picks the path {P1 : 33, 22} because it is shorter. Generally speaking, the decision on which path to pick is based on the routing policy of each individual AS. An AS’s routing policy also determines whether to send a particular path to a neighbor. Besides initial route propagation, physical events like link failures can also trigger BGP updates. For example, assume the link (44, 33) goes down. As a result, AS 44 switches to the backup path {55, 33, 22} that it had learned earlier and, after prepending its own AS number, sends the BGP update {P1 : 44, 55, 33, 22} to its neighbors.
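The propagation walk-through above can be illustrated with a minimal Python sketch. It is not BGP itself: the adjacency map `NEIGHBORS`, the functions `propagate` and `remove_link`, and the "shortest AS path wins" policy are assumptions made for illustration, whereas real path selection depends on each AS's routing policy.

```python
from collections import deque

# Hypothetical adjacency for the four ASes of Figure 1.1.
NEIGHBORS = {22: [33], 33: [22, 44, 55], 44: [33, 55], 55: [33, 44]}


def propagate(origin, neighbors):
    """Return, for every AS, the AS path it selects towards `origin`.

    Assumed policy: prefer the shortest AS path, keep the first one
    learned on ties, and re-announce the selected path to all neighbors.
    """
    best = {origin: []}          # path as received, without the AS itself
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        update = [sender] + best[sender]   # sender prepends its AS number
        for receiver in neighbors[sender]:
            if receiver in update:
                continue                   # loop prevention
            if receiver not in best or len(update) < len(best[receiver]):
                best[receiver] = update
                queue.append(receiver)
    return best


def remove_link(neighbors, a, b):
    """Return a copy of the adjacency map without the link (a, b)."""
    return {
        node: [n for n in adj if {node, n} != {a, b}]
        for node, adj in neighbors.items()
    }


print(propagate(22, NEIGHBORS)[44])                       # [33, 22]
print(propagate(22, remove_link(NEIGHBORS, 44, 33))[44])  # [55, 33, 22]
```

With the link (44, 33) removed, AS 44 ends up on the longer path through AS 55, matching the failure example in the text.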

¹ An AS may contain more than one BGP router, as shown in Figure 1.1 (e.g., AS 33 contains 3 BGP routers), and routing information inside an AS is propagated using an intra-domain routing protocol. In this work, we focus on inter-domain routing dynamics and hence do not go into the details of intra-domain routing.

1.2 BGP monitoring

Internet routing is very dynamic, with routing changes occurring continuously. For example, a typical BGP router can see as many as 400,000 BGP updates per day on average.² Since BGP updates propagate routing information in the Internet, capturing BGP updates at various parts of the Internet gives useful insight into the state of the Internet and the amount of routing change under way. In Figure 1.1, AS 44 is connected to a routing update collection box that receives BGP updates from AS 44. This collection box represents the data collectors of BGP monitoring projects such as RouteViews [40] and RIPE [38]. We call an AS connected to such a collection box an observation point. These monitoring projects collect BGP updates from various observation points (operational routers in autonomous systems) around the globe and make the data available to the public. Although individual networks have access to their own BGP updates, network operators can use this public data for problem diagnosis by relating what they see to what others in the Internet see. Further, this data is an invaluable source of information for researchers on the operation of BGP, since it contains actual BGP update messages collected at different parts of the Internet. Research in Internet routing has studied this data to identify unknown problems as well as to understand known problems better. In the rest of this work, we repeatedly use this data source not only to understand and diagnose problems in the Internet but also to carry out realistic evaluations of the schemes we design.

1.3 Problems and Scope of Thesis

We now present a high-level overview of the problems we tackle in this work and explain the organization of the thesis.

² Observed over a period in March 2007.

[Figure 1.1 diagram. Node labels: AS 22, AS 33, AS 44, AS 55, Monitored Router, Collection box. Update labels: P1: 22; P1: 33 22; P1: 55 33 22; P1: 44 33 22.]

Figure 1.1: Internet routing and BGP monitoring

1.3.1 Inferring origin of routing changes

Routing dynamics can degrade data delivery. In the event of large-scale routing dynamics, knowing the identity of the problem AS can help operators change the routing policies of their own AS to bypass the problem AS as much as possible. In addition, an operator can contact the AS in question and inform it of the observed problems. BGP is a path vector protocol, and the paths included in routing updates contain complete AS hop information. Thus, by looking at the previous and new paths, one may hope to infer where the problem originated. In Chapter 2, we discuss the problem of inferring the location of a change given the set of paths before and after an event. We consider a simplistic network model and see that exact inference is not always possible: often more than one explanation can be found for an observed change. We further show that even inferring the minimum-size explanation set is NP-hard in the general case. We also prove that it can be solved optimally under certain conditions. Our results in Chapter 2 show that problem inference is very difficult even in an ideal network setting. Our ultimate goal is to infer the origin of routing changes observed in the much more complex setting of the Internet.
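To make the ambiguity concrete, the sketch below lists the links that could explain a single path change. It is a simplified illustration rather than the algorithm of Chapter 2: the function names are hypothetical, and it assumes the change was caused by a link failure, so that a failed link must lie on the old path and cannot lie on the new one.

```python
def path_links(path):
    """Return the set of directed links (u, v) along a path of node IDs."""
    return set(zip(path, path[1:]))


def candidate_failures(old_path, new_path):
    """Links used before the change that are absent afterwards.

    Assumption: a link failure caused the change, so only links that
    disappeared from the path can be responsible.
    """
    return path_links(old_path) - path_links(new_path)


# AS 44 in Figure 1.1 switching to its backup path: a single candidate.
print(candidate_failures([44, 33, 22], [44, 55, 33, 22]))

# A path change 1 -> 2 -> 4 to 1 -> 3 -> 4: two candidates remain,
# so the observation alone cannot pinpoint the failed link.
print(candidate_failures([1, 2, 4], [1, 3, 4]))
```

When the candidate set contains more than one link, additional observations are needed to narrow it down, which is the situation the minimum explanation set problem addresses.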


1.3.2 Understanding routing changes in the Internet

With over 210,000 destinations, Internet routing is very dynamic, and BGP routers receive a large volume of BGP updates. At this scale, routing dynamics are expected, but separating routine routing changes from more serious ones is a challenging task. Further, in the aftermath of a known big event such as a fiber cut or a major link or AS problem, network operators would like to know how badly their own routing was affected. As an example, an undersea fiber cut resulting from an earthquake off the coast of Taiwan caused widespread routing dynamics. Network operators would be interested in understanding the impact of this event on their own routing as well as on global Internet routing. Researchers would also benefit from understanding how BGP performs under normal operation as well as under stress events like the post-earthquake period. However, studies on BGP dynamics have focused on end-to-end routes to individual destinations, and this technique simply does not scale when trying to understand large-scale routing events and their impact. Multiple routes may share common links, and what is needed is an intrinsic metric that reflects the aggregate behavior of groups of routes that change in a similar fashion. Chapter 3 introduces our notion of capturing aggregate behavior by assigning a weight to each link based on the number of routes carried over it. This notion of link weight is an important concept that allows one to analyze Internet routing from the new perspective of link weight changes. In Chapter 4, we show how this metric can be used to construct visual representations of large-scale routing changes, enabling one to understand the impact of the routing changes seen as well as to identify the origins of observed changes. Our main challenge in visualizing routing dynamics is scaling the visualization to the size of the topology, with over 20,000 autonomous systems and more than 80,000 AS links. Using link weights and focusing on the heavy link weight changes provides a natural way to scale in this regard. The resulting visual representations can greatly help network operators who struggle to mine meaningful data from gigabytes of routing updates. With these visual graphs showing the major link weight changes, network operators can better understand the aggregate changes. Using case studies, we show how our visualization can provide interesting insight into known routing events.
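As a rough illustration of the link weight idea, the following Python sketch counts the routes carried over each AS link in a routing table snapshot and reports the per-link differences between two snapshots. The function names, the prefix labels `P1` and `P2`, and the convention that each AS path starts at the observation point are assumptions for this example; the precise definition is given in Chapter 3.

```python
from collections import Counter


def link_weights(routes):
    """Count, for each AS link, the number of routes carried over it.

    `routes` maps a prefix to its AS path, listed from the observation
    point to the origin AS (an assumed convention).
    """
    weights = Counter()
    for as_path in routes.values():
        for link in zip(as_path, as_path[1:]):
            weights[link] += 1
    return weights


def weight_changes(before, after):
    """Per-link weight difference between two snapshots (unchanged links omitted)."""
    links = set(before) | set(after)
    return {l: after[l] - before[l] for l in links if after[l] != before[l]}


# Two hypothetical prefixes seen from AS 44, before and after link (44, 33) fails.
before = link_weights({"P1": [44, 33, 22], "P2": [44, 33, 22]})
after = link_weights({"P1": [44, 55, 33, 22], "P2": [44, 55, 33, 22]})
print(weight_changes(before, after))
```

Both routes move together, so one entry per affected link summarizes what would otherwise be a separate path change for every prefix.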

While the aim of Chapter 4 is to visually summarize large-scale routing changes to aid visual analysis, the aim of Chapter 5 is to automatically infer the origin of large-scale routing changes. We first observe that the link weights of most links are relatively stable under normal operation. Hence, by continuously monitoring link weights, we can identify a routing event when the link weight of any link deviates significantly from its typical value. Once a routing event is identified, we record the changes in link weights and develop a heuristic to relate the link weight changes to each other in the form of an s-t graph containing a source node s and a sink node t. The graph is constructed so that any cut disconnecting s from t represents a set of links whose failure can explain the observed changes. By adding views from other monitoring points as well, we show that the minimum s-t cut is the most likely set of edges responsible for the routing changes. Using known case studies and evaluation, we show the effectiveness of the inference heuristic in identifying the origin of the routing changes. Overall, we present a scheme that provides high accuracy from a small set of observation points and is much faster, and hence more practical, than previous schemes based on prefix-level clustering.
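The minimum s-t cut step can be sketched independently of how the fault graph is built, which is the subject of Chapter 5. The Python sketch below is a generic unit-capacity minimum cut computed with BFS augmenting paths (Edmonds-Karp); the function `min_cut`, the directed-edge representation, and the toy graph are illustrative assumptions, not the thesis implementation.

```python
from collections import defaultdict, deque


def min_cut(edges, s, t):
    """Return a minimum set of directed edges whose removal disconnects s from t.

    Every edge in `edges` has capacity 1 (an assumption for this sketch).
    """
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)            # reverse direction for the residual graph

    def residual_bfs():
        """Return the BFS parent map of nodes reachable from s with spare capacity."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = residual_bfs()
        if t not in parent:
            break                # no augmenting path left: flow is maximum
        v = t
        while parent[v] is not None:   # push one unit of flow along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    # Cut edges lead from the side still reachable from s to the other side.
    return sorted((u, v) for u, v in set(edges) if u in parent and v not in parent)


# Toy graph: two branches from s merge at C, which alone connects to t.
toy = [("s", "A"), ("s", "B"), ("A", "C"), ("B", "C"), ("C", "t")]
print(min_cut(toy, "s", "t"))   # [('C', 't')]
```

In the toy graph, cutting the single edge (C, t) explains the disconnection with fewer edges than cutting both branches, which mirrors the preference for the smallest explanation.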

1.3.3 Security Problems

Besides diagnosing routing problems that can impact data delivery performance, another aspect of this work is to identify and diagnose routing problems that can lead to security and privacy breaches. Prefix hijacking is one such routing problem, in which an AS announces a destination IP prefix it does not own and thereby deceives other routers in the Internet into routing packets to it. For example, suppose a malicious AS announces an IP prefix belonging to a bank, and assume the BGP routers in AS X find the route to the malicious AS more attractive than the route to the genuine AS and hence wrongly route packets to the malicious AS. One effect could be that users in AS X cannot access sites belonging to the bank, but a more severe effect could result if the malicious AS hosts a web server on the exact same IP address as the bank. In this case, users in AS X would actually be sending private account information to the malicious AS, thus compromising the security of their accounts. Clearly, prefix hijacking is a serious threat to the Internet and needs to be detected as soon as possible. However, a BGP router cannot easily detect prefix hijacks, since BGP allows multiple autonomous systems to announce the same prefix and this is often done for legitimate reasons. Chapter 6 discusses the challenges involved in designing a hijack detection system and presents a scalable and easily deployable system design to carry out hijack detection in near real-time. Our design is based on the idea that, on seeing a new origin for a prefix, the prefix owner is in the best position to decide whether its prefix is being hijacked. Hence, we design an alert system that quickly alerts prefix owners to potential hijacks. Our system pushes the complexity to the end users, and hence does not suffer from the problem of outdated data submitted by prefix owners.

Besides detecting and reacting to hijacks, it is important to understand the re-

siliency of the Internet topology against prefix hijacking. In particular, it is impor-

tant to understand which parts of the Internet are more robust against hijacks directed

towards them, and what factors drive the amount of damage a hijack can cause. Knowing

this can enable customers to better decide which Internet Service Providers (ISPs)

to connect with for better security against hijacks. An understanding of the factors

influencing hijack can also enable us to understand the vulnerability better and attempt

to fix it. Chapter 7 presents a detailed study on these lines and finds that the ISPs with

the highest degrees (tier-1 ISPs) are often worse impacted than some smaller ISPs in


the event of a hijack. The implication of this study is multi-fold. First, in terms of hijack

resilience, bigger is not necessarily better. Second, in order to secure the Internet,

it is not enough to secure the big ISPs since high impact attacks can be launched from

the edge as well.


CHAPTER 2

Inferring Failures in Path Vector Routing

Path vector routing protocols convey complete path information to reach a destination

node. These protocols can adapt dynamically to topological changes like link failures

but usually do not convey any explicit notification of which links failed. Without

any explicit failure notification, one can only hope to infer failed links based on the

complete path information before and after failures. For example, assume a node 1

can reach a destination node 4 using the path 1 → 2 → 4. Now due to some link

failure, this path changes to 1 → 2 → 3 → 4. One can see that link (2, 4) must have

failed to cause node 1 to switch to path 1 → 2 → 3 → 4. However, if the path to 4 changes

from 1 → 2 → 4 to 1 → 3 → 4, one cannot tell whether the link that failed was

(1, 2) or (2, 4). In cases where more than one failure scenario can explain the observed

routing changes, simply analyzing the path to a single destination is not enough.
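The single-destination reasoning above can be sketched in a few lines of Python. This is a minimal illustration under the failure-only assumption used in this chapter: every link of the old path that is absent from the new path is a candidate, and at least one candidate must have failed. The function name `candidate_failed_links` is hypothetical and not part of the original work.

```python
def candidate_failed_links(old_path, new_path):
    """Return the links of old_path that do not appear in new_path.

    Assumption: routes change only because links fail, so at least one
    of the returned links must have failed.
    """
    old_links = list(zip(old_path, old_path[1:]))
    new_links = set(zip(new_path, new_path[1:]))
    return [link for link in old_links if link not in new_links]


# Path 1 -> 2 -> 4 changes to 1 -> 2 -> 3 -> 4: a single candidate.
print(candidate_failed_links([1, 2, 4], [1, 2, 3, 4]))  # [(2, 4)]
# Path 1 -> 2 -> 4 changes to 1 -> 3 -> 4: two candidates, ambiguous.
print(candidate_failed_links([1, 2, 4], [1, 3, 4]))     # [(1, 2), (2, 4)]
```

A single returned link identifies the failure; more than one means a single destination is not enough to decide.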

Even after looking at how paths to all destinations are affected, one might face

multiple failure scenarios. For example, assume that the path to destination 5 also

changes from 1 → 2 → 5 to 1 → 3 → 5 at the same time as the path to destination

4 changes from 1 → 2 → 4 to 1 → 3 → 4 as shown in Figure 2.1. Given this

information of path changes to destinations 4 and 5, three candidate failure scenarios

are a) failure of link (1, 2), b) failures of links (2, 4) and (2, 5) and c) failures of links

(1, 2), (2, 4) and (2, 5). We call each of these scenarios an explanation set, or e-

set. Even in this simple case with one source and two destinations, we have three

e-sets and cannot say for sure which one is the cause. In situations where multiple

[Figure 2.1 panels: a. Initial tree T0 from node 1; b. Final tree T1 from node 1]

Figure 2.1: Possible e-sets are {(1, 2)} and {(2, 4), (2, 5), (2, 6)}

failure scenarios are possible, the minimum failure e-set problem involves identifying

the minimum number of links whose failure can explain the observed route changes.

Identifying the minimum number of failed edges can give us a lower bound on the number

of failures causing route changes. For example, in Figure 2.1, we can say for sure

that at least one edge must have failed. In addition, if all links have the same failure

probability, then one can see that the failure of link (1, 2), i.e. the minimum e-set, is

also the most likely solution.

In this work, we first look at how to infer single failures from remote observations.

We then formalize the minimum e-set problem and show the problem is NP-complete

in the general case. We discuss conditions under which the minimum e-set can be

optimally found and present simple algorithms for these cases.

2.1 Model and Problem Definition

We now provide details about the network and routing model used in this chapter. We

also formally define the problem of minimum e-set in this section.


2.1.1 Network and Routing Model

We model the network as a simple directed connected graph G = (V,E), where V =

VD ∪ VN and E = ED ∪ EN . VD represents the set of destinations and the nodes

in VN are transit nodes. We are only interested in routes to VD. Nodes in VN are

connected by links in EN and each edge in EN has the form (a, b) where a, b ∈ VN .

The destinations are attached to the nodes in VN through edges in ED and each edge

in ED has the form [d, n] where d ∈ VD and n ∈ VN . In all figures, nodes in VD

(destinations) are represented by the presence of small solid rectangles while nodes

in VN (transit nodes) do not have these solid rectangles. For example, in Figure 2.1,

nodes 4, 5 and 3 belong to VD, while nodes 1, 2 belong to VN .

We use a Simple Path Vector Protocol (SPVP) [14]. In SPVP each node advertises

only its best path to reach the destinations. A path from node v to destination d is a

sequence of nodes $\mathrm{path}_v(d) = (v_k\, v_{k-1} \ldots v_0\, d)$ where $v_k = v$, $(v_i, v_{i-1}) \in E_N$ for

all $1 \le i \le k$, and $(v_0, d) \in E_D$. We define $\mathrm{pathLength}(\mathrm{path}_v(d)) = k + 1$. After

receiving and storing a route learned from its neighbors, node v selects its best path to

destination d according to some routing policy.

The routing policy could be any policy that maintains the tree property. In other

words, the routing table containing routes to all destinations at any stage can be graph-

ically represented in the form of a tree. Thus, at any node u, there has to be only one

path to any other node v, even if v is used as a transit node to reach some other

destination. In the remainder of the chapter, when we refer to ‘best’ path, we mean a

path that is deemed best given the routing policy. We abstract out the notion of ‘best’

path and separate it from its physical interpretation. For illustration purposes, we use

the shortest path routing policy where a node picks the path with the lowest number of

hops as its best path. In case of a tie among multiple paths with the lowest number of

hops, the node picks the path from the neighbor with the lowest numeric ID. While the


shortest path policy is used for illustration purposes, our algorithms and proofs work

with any policy that can maintain the tree property.
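The illustrative policy (fewest hops, ties broken by the lowest neighbor ID) can be sketched as follows. This is only a sketch under stated assumptions: links are treated as bidirectional, the whole topology is handed to one function instead of being learned through path vector advertisements, and the name `route_compute` and its signature are chosen for this example.

```python
from collections import deque


def route_compute(root, links, destinations):
    """Return {destination: path} as selected at `root`.

    Assumptions of this sketch: links are bidirectional, and each node
    prefers the fewest hops, then the neighbor with the lowest ID.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    routes = {}
    for dest in destinations:
        # Hop count from every reachable node to dest (breadth-first search).
        hops = {dest: 0}
        queue = deque([dest])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
        if root not in hops:
            continue  # destination unreachable from root
        # Walk towards dest, taking the lowest-ID neighbor that is one hop closer.
        path = [root]
        while path[-1] != dest:
            here = path[-1]
            path.append(min(n for n in neighbors[here] if hops[n] == hops[here] - 1))
        routes[dest] = path
    return routes


links = [(1, 2), (1, 3), (2, 4), (3, 4), (2, 5), (3, 5)]
print(route_compute(1, links, [3, 4, 5]))
# {3: [1, 3], 4: [1, 2, 4], 5: [1, 2, 5]}
```

In the example, destinations 4 and 5 are two hops away through either node 2 or node 3, and the tie-break selects node 2.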

2.1.2 Failure Model

We allow any number of links to fail in the network. All links in the network have the

same failure probability. If a link fails, the nodes adjacent to the link detect the failure

and all nodes using the link must switch to an alternate path (or declare the destination

unreachable). The link failures are not directly reported to any central database or

monitoring site. However, a node whose path is affected by this failure will see a new

path implying that something went wrong on the initial tree. Different link failures may

impact different observation points and hence we are concerned with the minimum

failures as seen from a particular observation point only. Link failures are atomic,

which means if a link fails, no node can use it. We look at route changes observed

from a node to infer failed links. The route changes seen must have happened only due

to link failures and cannot happen due to the addition of new links.

2.1.3 Input Requirements

We do not make any assumption about the topology of the entire graph being known

to any single node in the graph. We define an observation point as a node for which we

can see the complete routing table at any time. The input consists of two

routing table snapshots. We assume that the initial as well as final routing tables reflect

the steady state tables after the routes for all the nodes in the network have converged.

In other words, there are no more routing updates propagating in the network at the

time of the two routing table snapshots.


2.2 Minimum e-set for tree inputs

In this section, we present a heuristic to solve the tree version of the minimum e-set

problem optimally. The results in this section apply to any routing policy that always

produces a tree.

This heuristic uses the notion of nearest-descendant defined below. For any node

u in T0 ∩ T1, where M is the root of the trees, call node v a nearest-descendant of u if

1. v ∈ T0 and u is closer to M than v in T0

2. v is also in T1 or v ∈ VD, where VD is set of destinations and

3. for any x in T0 on the path from u to v, x ∉ T1

To find the nearest-descendant pairs, we use Algorithm 1. The nearest-descendant

algorithm uses depth first search (DFS) over T0 to find the nearest descendants of a

node u ∈ T0. Along any DFS path from u, if a node v is present in T1, then no descen-

dant of v needs to be explored, and v is the nearest descendant of u. Figure 2.2 shows

trees T0 and T1 observed from node 1. Node 1 has three children 2, 3 and 6. Node 2

is neither present in T1 nor a destination, hence 2 cannot be a nearest descendant. Recursing

from node 2, we can find 4 and 5 to be in T1, and thus are nearest descendants of 2 and

hence of 1. Similarly, 3 is also a nearest descendant of 1. Finally, notice that 8 is not

in T1. This is possible if there is no valid path to reach 8 in T1. But, since 8 is a des-

tination, it is also a nearest-descendant of node 1. Summarizing, node 1 has 4 nearest

descendants: 3, 4, 5 and 8. Among the other nodes, 2 has nearest descendants 4 and

5, while nodes 3, 4 and 5 do not have any nearest descendants.
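The depth-first search described above can be sketched in Python. The tree encoding below is an assumed reading of Figure 2.2 (T0 as a parent-to-children map, T1 as a node set, destinations {3, 4, 5, 8}), and the function name `nearest_descendant_paths` is hypothetical. Because T0 is a tree, the sketch does not need explicit visited marks.

```python
def nearest_descendant_paths(u, children_t0, nodes_t1, destinations):
    """Return the T0 paths from u to each nearest-descendant of u."""
    paths = []

    def dfs(node, path):
        for child in children_t0.get(node, []):
            path.append(child)
            if child in nodes_t1 or child in destinations:
                paths.append(list(path))  # stop here: child is a nearest-descendant
            else:
                dfs(child, path)          # child vanished from T1: keep descending
            path.pop()

    dfs(u, [u])
    return paths


# Assumed encoding of Figure 2.2.
children_t0 = {1: [2, 3, 6], 2: [4, 5], 6: [8]}
nodes_t1 = {1, 3, 4, 5, 7}
destinations = {3, 4, 5, 8}

print(nearest_descendant_paths(1, children_t0, nodes_t1, destinations))
# [[1, 2, 4], [1, 2, 5], [1, 3], [1, 6, 8]]
print(nearest_descendant_paths(2, children_t0, nodes_t1, destinations))
# [[2, 4], [2, 5]]
```

The output for node 1 matches the four nearest descendants named in the text (3, 4, 5 and 8).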

Once we find the set of all (u, v) pairs, such that v is a nearest-descendant of u,

we use Algorithm 2 to compute the minimum e-set. For each (u, v) pair, we check if

T1 would still be the computed tree if edges on the path (u, v) are added to T1. For


Algorithm 1: nearest descendant(Node u, Path ρ)
Input: Node u in T0, Path ρ
Output: P : Set of paths to nearest-descendants of u
while there is some v, such that (u, v) ∈ T0 AND v is not visited do
    Mark v as visited;
    Add (u, v) to path ρ;
    if v ∈ T1 or v ∈ VD then
        Add path ρ to P;
    else
        nearest descendant(v, ρ);
    Remove (u, v) from path ρ;

[Figure 2.2 panels: a. Initial tree T0 from node 1; b. Final tree T1 from node 1]

Figure 2.2: Example for nearest descendent and minimum e-set from Algorithm 2

example, consider the pair (1, 8) with path 1→ 6→ 8, we have

RouteCompute(1, T1 ∪ path(1, 8)) ≠ T1

Thus, we fail the first link on the path 1 → 6 → 8. Similarly, edge (1, 2) is marked

failed for path between (1, 4) as well as (1, 5). On the other hand consider nearest

descendent pair (2, 4). For this pair we have,

RouteCompute(1, T1 ∪ path(2, 4)) = T1

and hence we do not fail any edge on this nearest descendant pair. After considering

all the (u, v) pairs, the minimum e-set in this case is {(1, 2), (1, 6)}. We now prove

correctness and optimality for Algorithm 2.

Algorithm 2: FindFault()
Input: T0 and T1, the routing trees from vantage point M
Output: F : set of edges marked failed
for each nearest-descendant pair (u, v) with path P do
    if RouteCompute(M, T1 ∪ P) ≠ T1 then
        /* One of the edges on the (u, v) path P must have failed. */
        Mark the first edge on the (u, v) path in T0 as failed.
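A runnable sketch of Algorithm 2 on the Figure 2.2 example is given here. Several things are assumptions of the sketch and not of the original text: links are bidirectional, `route_tree` stands in for RouteCompute under the shortest path policy with lowest-ID tie-breaking, the edge lists are one reading of the figure (node 7 is left out), and nearest-descendant pairs are generated from every node of T0, as in the worked example.

```python
from collections import deque


def route_tree(root, links, destinations):
    """Directed edges of root's routes: fewest hops, lowest-ID next hop on ties."""
    nbrs = {}
    for a, b in links:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    tree = set()
    for dest in destinations:
        hops, queue = {dest: 0}, deque([dest])
        while queue:  # hop count from every reachable node to dest
            node = queue.popleft()
            for n in nbrs.get(node, ()):
                if n not in hops:
                    hops[n] = hops[node] + 1
                    queue.append(n)
        here = root
        while here in hops and here != dest:
            nxt = min(n for n in nbrs[here] if hops[n] == hops[here] - 1)
            tree.add((here, nxt))
            here = nxt
    return tree


def nearest_descendant_paths(u, children, nodes_t1, destinations):
    """T0 paths from u to its nearest-descendants (DFS as in Algorithm 1)."""
    paths = []

    def dfs(node, path):
        for child in children.get(node, []):
            path.append(child)
            if child in nodes_t1 or child in destinations:
                paths.append(list(path))
            else:
                dfs(child, path)
            path.pop()

    dfs(u, [u])
    return paths


def find_fault(root, t0_links, t1_links, destinations):
    """Mark the first edge of every nearest-descendant path that would alter T1."""
    children = {}
    for a, b in t0_links:
        children.setdefault(a, []).append(b)
    nodes_t0 = {n for link in t0_links for n in link}
    nodes_t1 = {n for link in t1_links for n in link}
    t1 = set(t1_links)
    failed = set()
    for u in sorted(nodes_t0):
        for path in nearest_descendant_paths(u, children, nodes_t1, destinations):
            path_links = list(zip(path, path[1:]))
            if route_tree(root, t1 | set(path_links), destinations) != t1:
                failed.add(path_links[0])
    return failed


# Assumed reading of Figure 2.2; edges are written parent -> child.
t0 = [(1, 2), (1, 3), (1, 6), (2, 4), (2, 5), (6, 8)]
t1 = [(1, 3), (3, 4), (3, 5)]
print(sorted(find_fault(1, t0, t1, {3, 4, 5, 8})))  # [(1, 2), (1, 6)]
```

The result agrees with the minimum e-set {(1, 2), (1, 6)} stated in the text, and the pair (2, 4) triggers no failure because adding its path to T1 leaves the routes from node 1 unchanged.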

Lemma 0.1. The set of edges F thus determined comprises an e-set.

Proof. We prove this lemma using shortest path routing, but the proof can easily be

adapted to other tree-based routing policies. Assume the contrary; in other words,

when these edges fail the resulting tree Tf on (T0 ∪ T1)− F is not the final tree T1. It

follows that there’s some (a, b) in T1 such that a shorter path exists in Tf . Since we do

not fail edges of T1, the path (a, b) can only be shorter. Let (a, b) be such a pair whose

real shortest path in T0 + T1 − F is as short as possible. Suppose the shortest path

between (a, b) includes some node x in T1. Then we must have either (a, x) or (x, b)


closer in T0 + T1 − F than they are in T1, thus contradicting the definition of (a, b).

It follows that the path between (a, b) does not include any other vertices of T1, and

therefore travels entirely through T0. In particular, (a, b) are nodes of T0 (and T1) and

the path between them does not contain any nodes of T1, so one is a nearest-descendant

of the other. It follows that we would have included an edge on the (a, b) path in F .

Theorem 1. The e-set F thus determined is minimum.

Proof. Consider all pairs (u, v) where v is a nearest-descendant of u. We know that

we must delete some edge on the path (u, v) in T0, otherwise T1 could not be a shortest

path tree. Note that if v is a nearest-descendant of u, then v cannot also be a nearest-

descendant of u′ for any u′ not equal to u. We can therefore imagine chopping the

trees into edge-disjoint subtrees, where each subtree consists of a node in both T0 and

T1 along with its nearest-descendants and the paths to them. For each subtree, we

consider breaking it up into further edge-disjoint subtrees via the edges from the root

(u). For each edge outgoing from u, if we cut that edge then any algorithm must cut

one of the edges in the relevant subtree. Since all subtrees are disjoint, it follows that

any algorithm must cut at least as many edges as ours cuts.

2.3 Minimum e-set for general graphs

So far, we discussed how to optimally solve the minimum e-set problem with the

nearest-descendant approach. In Section 2.2, we restricted the input at a node to be

the routing trees from that node before and after the failures. However, a node might

have additional information about the underlying topology besides its routing tree.

One source of information is the routes received from other nodes. For example, let's


assume a node A receives a route B → C → D to reach the destination D from

neighbor B. Node A also receives the route F → D to reach destination D from

neighbor F. Assume node A selects route F → D via neighbor F (assuming shortest

path), and does not use B to reach any other destination. Even though links B → C

and C → D may not be in A’s routing table, it knows these two links are up, else B

would have sent a new route to reach D. To account for this extra information that a

node might have, we generalize the problem to find minimum e-set when the inputs

before and after failures can be general graphs instead of trees.

2.3.1 Problem Definition: Graph Version

At an observation point M , let T0 and T1 indicate the routing trees used by node M at

two different times t0 and t1 respectively. In addition, let G0 and G1 indicate the set

of links known to be up at t0 and t1 respectively. The set G0 could be just T0, but can

contain additional arbitrary edge information, and T0 ⊆ G0. Similarly, T1 ⊆ G1. We

have,

RouteCompute(M,G0 ∪G1) = T0 (2.1)

A set of links F is called an explanation set (e-set) iff:

RouteCompute(M, (G0 ∪G1)− F ) = T1 (2.2)

Minimum e-set problem: Given two graphs G0 and G1 from a node M containing

routing table links as well as other links, and a known routing policy, find the minimal

set of edges F , such that RouteCompute(M, (G0∪G1)−F ) = T1, where T1 represents

the routing table RouteCompute(M,G1).
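To make the problem statement concrete, the following sketch searches exhaustively for a minimum e-set on the Figure 2.1 example. It is a brute-force illustration and not an algorithm from this chapter: the search is exponential in the number of candidate links. The sketch assumes bidirectional links, the shortest path policy with lowest-ID tie-breaking as RouteCompute, and candidate failures restricted to links in G0 that are absent from G1 (links in G1 are known to be up at t1). All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def route_tree(root, links, destinations):
    """Directed edges of root's routes: fewest hops, lowest-ID next hop on ties."""
    nbrs = {}
    for a, b in links:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    tree = set()
    for dest in destinations:
        hops, queue = {dest: 0}, deque([dest])
        while queue:
            node = queue.popleft()
            for n in nbrs.get(node, ()):
                if n not in hops:
                    hops[n] = hops[node] + 1
                    queue.append(n)
        here = root
        while here in hops and here != dest:
            nxt = min(n for n in nbrs[here] if hops[n] == hops[here] - 1)
            tree.add((here, nxt))
            here = nxt
    return tree


def minimum_eset(root, g0, g1, destinations):
    """Smallest F with RouteCompute(M, (G0 u G1) - F) = T1, by exhaustive search."""
    union = set(g0) | set(g1)
    target = route_tree(root, g1, destinations)   # T1
    candidates = sorted(set(g0) - set(g1))        # links not known to be up at t1
    for size in range(len(candidates) + 1):
        for removed in combinations(candidates, size):
            if route_tree(root, union - set(removed), destinations) == target:
                return set(removed)
    return None


# Figure 2.1, with G0 = T0 and G1 = T1 (links written with the smaller ID first).
g0 = [(1, 2), (1, 3), (2, 4), (2, 5)]
g1 = [(1, 3), (3, 4), (3, 5)]
print(minimum_eset(1, g0, g1, [3, 4, 5]))  # {(1, 2)}
```

The search tries subsets in order of increasing size, so the first match is a minimum e-set; here it is the single link (1, 2).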

While we show that the general case of this problem is NP-complete, a special case of

this problem with graph inputs can be solved optimally if each node is a destination.


2.3.2 The case of no purely transit nodes, VN = φ or V = VD

We present a greedy algorithm, Algorithm 3, called greedy Fault(), to find an optimal solution for the

case where each node is a destination, or equivalently there are no transit nodes. This

algorithm removes one edge at a time with edges ordered in breadth first order.

Algorithm 3: greedy Fault(M, G0, G1)
Input: M : the vantage point;
       G0: up edges at time t0;
       G1: up edges at time t1;
Output: e-set of candidate failures F
1. Initialize Gf = G0 ∪ G1;
2. Compute Tf = RouteCompute(M, Gf);
3. for edges (X, Y) of Tf in BFS order do
       if (X, Y) ∉ G1 then
           F = F ∪ {(X, Y)};
           remove (X, Y) from Gf;
           goto step 2;
4. return F;

Figure 2.3 provides an example showing the output of the greedy algorithm on

a simple graph. For this example, we assume the shortest path routing policy with

lowest ID tie-breaking. In this example, each node is a destination. Figure 2.3a and

Figure 2.3b represent the initial and final graphs. Note, the link (2, 3) is not in T0 but

is present as additional information in the initial graph G0. In contrast the final graph

G1 does not contain any new information and hence is the same as T1. At each stage,

we superimpose the tree edges shown by directed edges over edges in the graph. After

removal of link (1, 2) from G, the link (3, 2) is now on the tree from node 1 as shown

in Figure 2.3d. It can be seen that link (3, 2) is not in the final tree and without its


failure, the path cannot change to use link (9, 2). Hence, (3, 2) is failed next to result

in the final tree T1.
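The walk-through above can be reproduced with a short Python sketch of the greedy procedure. The link lists are an assumed reading of Figure 2.3 (T0 consists of links from node 1 to nodes 2, 3 and 9 and from node 2 to nodes 4, 5 and 6), links are treated as bidirectional and stored with the smaller ID first, and `route_tree` is a stand-in for RouteCompute under the shortest path policy with lowest-ID tie-breaking.

```python
from collections import deque


def norm(link):
    """Store each bidirectional link with the smaller ID first."""
    a, b = link
    return (a, b) if a <= b else (b, a)


def route_tree(root, links, destinations):
    """Directed edges of root's routes: fewest hops, lowest-ID next hop on ties."""
    nbrs = {}
    for a, b in links:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    tree = set()
    for dest in destinations:
        hops, queue = {dest: 0}, deque([dest])
        while queue:
            node = queue.popleft()
            for n in nbrs.get(node, ()):
                if n not in hops:
                    hops[n] = hops[node] + 1
                    queue.append(n)
        here = root
        while here in hops and here != dest:
            nxt = min(n for n in nbrs[here] if hops[n] == hops[here] - 1)
            tree.add((here, nxt))
            here = nxt
    return tree


def bfs_edges(root, tree):
    """Yield the edges of the routing tree in breadth-first order from root."""
    children = {}
    for a, b in tree:
        children.setdefault(a, []).append(b)
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for child in sorted(children.get(node, [])):
            yield (node, child)
            if child not in seen:
                seen.add(child)
                queue.append(child)


def greedy_fault(root, g0, g1, destinations):
    """Repeatedly fail the first tree edge (in BFS order) that is not up at t1."""
    up_at_t1 = {norm(l) for l in g1}
    gf = {norm(l) for l in g0} | up_at_t1
    failed = []
    while True:
        tree = route_tree(root, gf, destinations)
        bad = next((norm(e) for e in bfs_edges(root, tree)
                    if norm(e) not in up_at_t1), None)
        if bad is None:
            return failed
        failed.append(bad)
        gf.remove(bad)


# Assumed reading of Figure 2.3; every node is a destination.
g0 = [(1, 2), (1, 3), (1, 9), (2, 3), (2, 4), (2, 5), (2, 6)]
g1 = [(1, 3), (1, 9), (9, 2), (2, 4), (2, 5), (2, 6)]
print(greedy_fault(1, g0, g1, [2, 3, 4, 5, 6, 9]))  # [(1, 2), (2, 3)]
```

The two links are failed in the same order as in the text: first (1, 2), then the link between nodes 2 and 3 once it appears on the recomputed tree.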

We define ‘sufficient’ failure set as the links whose failure is enough to change the

tree from T0 to T1, and ‘necessary’ failure set as the links which must fail in order

to change the tree from T0 to T1. With each node as a destination our algorithm will

always output sufficient as well as necessary failures.

Theorem 2. Let F be the set of edges returned by greedy Fault(M,G0, G1). Then

F is both a necessary and a sufficient set of edges to explain the change from T0 to T1.

Proof. To show that F is sufficient, we will show that Tf = T1 when the algorithm

terminates. In the for loop, we remove edges only from G0 − G1. Thus at the end,

Tf = RouteCompute(M, G′) where G1 ⊆ G′. Also, the for loop removes edges from

Tf that are in G0 − G1. Thus at the end, Tf does not include any edges from G0 − G1.

Thus Tf = RouteCompute(M, G′) = RouteCompute(M, G1) = T1.

Now to show that F is necessary, let F′ be any set of edges such that RouteCompute(M,

(G0 ∪ G1) − F′) = T1. We will show that F ⊆ F′, by induction on the iterations of the

for loop. Suppose by induction, at the end of the i-th iteration, e1, . . . , ei are the edges

selected by the algorithm, and by induction, F ′ includes these edges. We will show

that F ′ must include ei+1 = (u, v) selected by the algorithm in the (i+ 1)-st iteration.

Note that the algorithm chooses the first edge in the BFS order of Tf from M that is

not in G1. Consider the path P from M to u in Tf . All edges of P are available at time

t1 (they are in G1, so the algorithm did not select them). Hence if (u, v) did not fail

during [t0, t1], and since v ∈ VD, T1 would include (u, v). It follows that any e-set F ′

must also include (u, v) in order to be a valid explanation-set.

[Figure 2.3 panels: a. G0 = T0 + edge (2,3); b. G1 = Final tree T1; c. G = G0 + G1; d. Remove link (1,2); e. Remove link (2,3) to give T1. Legend: link in tree from node 1; link removed by greedy (marked F); edge in the graph.]

Figure 2.3: greedy Fault() with each node as a destination

[Figure 2.4 panels: a. G0 = Initial tree T0 + {(2,3),(3,5)}; b. G1 = Final tree T1; e. Mark (1,2) and (3,4) as failed]

Figure 2.4: Example showing greedy not optimal in general case

2.3.3 The case of purely transit nodes, VN ≠ φ

Algorithm 3 from the previous section does not produce an optimal solution if

the network contains purely transit nodes. Consider Figure 2.4 as an example where

we see routes from node 1 to a single destination node 8, and nodes 2, 3, 4, 5 are just

transit nodes. Parts 2.4a and 2.4b show the initial and final graphs. The edges with

arrows represent the routing table, while the others represent information about edges

through other means. We assume the network uses shortest path routing with lowest

ID tie breaking for this example. The greedy Algorithm 3 would first mark edge (1, 2)

failed, which would shift the route to 1 → 3 → 4 → 8, since 4 has a lower ID than

5 and the paths via 4 and via 5 both have length 3. The greedy algorithm would then

mark edge (3, 4) failed resulting in the final tree. But clearly in this case, the e-set of

{(4, 8)} can explain the above change and is also minimal. We now prove that this

general version is NP-complete.
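The gap between the greedy answer and the minimum e-set can be checked with a small sketch for the single-destination case. The link set below is an assumed reading of Figure 2.4, chosen to reproduce the behavior described in the text; in particular, link (3, 4) is included because the text has the greedy algorithm fail it. Links are bidirectional and stored with the smaller ID first, and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def norm(link):
    """Store each bidirectional link with the smaller ID first."""
    a, b = link
    return (a, b) if a <= b else (b, a)


def best_path(root, dest, links):
    """Fewest-hop path from root to dest, lowest-ID next hop on ties."""
    nbrs = {}
    for a, b in links:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    hops, queue = {dest: 0}, deque([dest])
    while queue:
        node = queue.popleft()
        for n in nbrs.get(node, ()):
            if n not in hops:
                hops[n] = hops[node] + 1
                queue.append(n)
    if root not in hops:
        return None
    path = [root]
    while path[-1] != dest:
        here = path[-1]
        path.append(min(n for n in nbrs[here] if hops[n] == hops[here] - 1))
    return path


def greedy_fault(root, dest, union, g1):
    """With one destination the tree is a path, so BFS order is path order."""
    remaining, failed = set(union), []
    while True:
        path = best_path(root, dest, remaining)
        if path is None:
            return failed
        bad = next((norm(l) for l in zip(path, path[1:]) if norm(l) not in g1), None)
        if bad is None:
            return failed
        failed.append(bad)
        remaining.remove(bad)


def minimum_eset(root, dest, union, g1):
    """Exhaustive search over subsets of links not known to be up at t1."""
    target = best_path(root, dest, g1)
    candidates = sorted(set(union) - set(g1))
    for size in range(len(candidates) + 1):
        for removed in combinations(candidates, size):
            if best_path(root, dest, set(union) - set(removed)) == target:
                return list(removed)
    return None


# Assumed reading of Figure 2.4: routes from node 1 to destination 8.
g1 = {(1, 3), (3, 5), (5, 8)}
union = g1 | {(1, 2), (2, 4), (4, 8), (2, 3), (3, 4)}
print(greedy_fault(1, 8, union, g1))   # [(1, 2), (3, 4)]
print(minimum_eset(1, 8, union, g1))   # [(4, 8)]
```

The greedy procedure fails two links while the exhaustive search finds a single-link explanation, which is the behavior described above.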

To show that finding minimum e-set in the general case is NP-complete, we will

consider a restriction of it, Directed Path Change (DPC), and prove it is NP-complete.

In DPC, we restrict the number of destinations to one, so that each routing graph is a

single path. But we allow arbitrary G0 and G1. Since DPC involves change of a single


path instead of multiple paths in our general problem, it is thus a subset of the general

case and proving hardness for DPC will imply hardness for the general problem of

multiple destinations. We use shortest path policy for this proof.

Directed Path Change (DPC): Given directed graphs G0, G1, two special nodes

s, t, and an integer k ≥ 0, let Pi denote the minimum hop s → t path in Gi such

that P0 is also the minimum hop s → t path in G0 + G1. Does there exist a set of

edges F ⊆ G0 − G1 such that |F | ≤ k and P1 is also a minimum hop s → t path in

G0 +G1 − F ?

We reduce from the following problem, UMC, which was proven NP-complete in [8].

Undirected Multiway Cut (UMC): Given undirected graph G, three special nodes

x, y, z and an integer k ≥ 0, is there a set of edges F , |F | ≤ k such that in G − F ,

x, y, z are disconnected from each other?

Reduction: Given an instance of UMC(G, x, y, z, k), we create a new directed graph

G′ as follows. The vertices of G′ are the vertices of G plus many additional vertices.

The edges of G′ are defined as follows: for every (undirected) edge (u, v) of G, we

put the following gadget:¹ put two new vertices w1, w2 (new for each different edge) and

add edges (u → w1), (w2 → u), (v → w1), (w2 → v) and (w1 → w2). Now any

path in G that goes u → v, can be simulated in G′ as u → w1 → w2 → v and any

path in G that goes v → u can be simulated in G′ as v → w1 → w2 → u. We also

add two special vertices s, t in G′ and add edges (s → x), (z → t). Let n denote the

number of vertices in G. Take two long paths Q1, Q2 of length D ≫ 4n consisting of

new vertices, where Q1 is x → y and Q2 is y → z. Let P0 be the minimum hop path

¹ This gadget is also used by [12].

s → t in G′ and let P1 be the s → t path $s \to x \xrightarrow{Q_1} y \xrightarrow{Q_2} z \to t$. This

(G0 = G′, G1 = P1, s, t, P0, P1, k) is our instance of DPC.
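The edge gadget used in the reduction can be illustrated with a short sketch. Only the gadget is built here; the special vertices s and t and the long paths Q1 and Q2 are left out. The vertex labels ("w1", index) and ("w2", index) and both function names are choices made for this example.

```python
from collections import deque


def build_gadget_graph(undirected_edges):
    """Replace every undirected edge (u, v) by the five directed gadget edges."""
    directed = set()
    for index, (u, v) in enumerate(undirected_edges):
        w1, w2 = ("w1", index), ("w2", index)  # fresh vertices for this edge
        directed |= {(u, w1), (v, w1), (w1, w2), (w2, u), (w2, v)}
    return directed


def hop_count(directed_edges, source, target):
    """Minimum number of hops from source to target, or None if unreachable."""
    successors = {}
    for a, b in directed_edges:
        successors.setdefault(a, []).append(b)
    hops, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops.get(target)


g_prime = build_gadget_graph([("x", "y"), ("y", "z")])
# Each undirected hop costs three directed hops in the gadget graph.
print(hop_count(g_prime, "x", "z"))  # 6
# Deleting the single edge (w1, w2) of a gadget cuts both directions at once.
cut = g_prime - {(("w1", 0), ("w2", 0))}
print(hop_count(cut, "x", "y"), hop_count(cut, "y", "x"))  # None None
```

This shows the two properties the proof relies on: path lengths are multiplied by 3, and one directed edge per gadget plays the role of the original undirected edge.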

We shall show that there exists a UMC solution F iff it is also a DPC solution.

Let F be a solution to UMC instance (G, x, y, z, k). Let F ′ be the set of edges in G′

where for each edge (u, v) in F , we put the corresponding edge (w1, w2) in F ′. If we

remove F ′ and P1 from G0 = G′, then x, y, z are disconnected. So the minimum hop

s → t path in G0 − F ′ is P1, and hence F ′ is a solution to DPC (G0 = G′, G1 =

P1, s, t, P0, P1, k).

Conversely, let F ′ be a solution to (G0 = G′, G1 = P1, s, t, P0, P1, k). For all the

edges picked up from a single (u, v) gadget, we can replace them by (w1, w2) and get

a new solution F ′′, |F ′′| ≤ |F ′|. We claim that G′ − F ′′ − P1 has x, y, z disconnected.

If there were a path between any pair, say Q : z → y, then by symmetry² there is also

a path Q̄ : y → z. Now any y → z path in G′ − F ′′ − P1 has a corresponding path in

G of length at most n − 1, and by our gadget the path length multiplies by 3 in G′, so

$|Q| = |\bar{Q}| \le 3(n-1)$. Then the path $s \to x \xrightarrow{Q_1} y \xrightarrow{\bar{Q}} z \to t$ is available in G′ − F′′, and

is of length at most $1 + D + 3(n-1) + 1$, which is less than the length of P1, namely $2 + 2D$

(recall D ≫ 4n). This contradicts the claim that the minimum hop path in G′ − F′′ is P1.

Therefore F ′′ disconnects x, y, z and by choosing corresponding (u, v) edges, we get

F , |F | ≤ k, such that deleting F disconnects x, y, z in G.

Thus we have shown,

Lemma 2.1. DPC is NP-hard.

Since DPC is a special case of finding minimum e-set, and given any candidate

solution e-set, it can clearly be verified in polynomial time, we get,

Theorem 3. Finding minimum e-set is NP-complete.

² If the original path traverses v → w1 → w2 → u, then we reverse it to u → w1 → w2 → v, since our gadget has all these edges.

CHAPTER 3

Identifying Routing Problems in the Internet

In Chapter 2 we examined the problem of inferring location of failures in a simple path

vector routing framework. In this chapter, we examine the domain of Internet routing

in detail and understand the main challenges that make inference of problem areas

difficult in reality. We also explain how we take a fundamentally different approach

than prior works to analyze Internet routing problems.

3.1 Challenges in diagnosing Internet routing problems

There are various factors in Internet routing diagnosis that make it more challenging

than the ideal scenarios considered in Chapter 2. Primary among them is the lack of

knowledge of the complete topology. Another challenge in diagnosing Internet routing

problems is that a BGP router is unaware of how BGP routers in other autonomous

systems pick their best paths. In Chapter 2, our model assumed nodes choose best paths based

on shortest path routing with lowest ID tie breaker. However, in the Internet, commercial

policies and agreements drive routing decisions, even though shortest paths are often

picked after policy decisions are applied. The routing table from any given router may

not necessarily be a tree, and a node may be reachable via different routes for different

prefixes. Since the tree property is often violated and route preferences of other BGP

routers are unknown, it is difficult to apply a greedy scheme like Algorithm 3 on

page 18. Another factor complicating the inference is the presence of multiple BGP


peering sessions among large autonomous systems. In other words, even though a

BGP path may show a single link A-B between two autonomous systems A and B,

in reality they may connect at multiple physical locations. When one of the physical

connections breaks down, some prefixes may be affected and change routes, but others

may still continue to use the link A-B.

Besides the above factors, the constant churn of updates in the Internet also presents

a challenge. Internet routing is very dynamic and at any given point of time, there are

continuous routing changes going on. It is thus difficult to group routing changes into

events and mark the start and end of the event. What we need to diagnose problems at

the global Internet scale is a metric that will capture aggregate routing behavior.

3.2 Capturing aggregate behavior: Notion of Link-weight

Internet routing is a big system with over 210,000 prefixes. Prior works in diagnosing

BGP problems have examined individual routes and attempted to correlate changes

across different prefixes. We take a fundamentally different approach to understand

Internet routing. We are interested in capturing some notion of aggregate routing be-

havior. We observe that different routes go through common AS links and the com-

monality in behavior can be captured by examining links instead of individual prefixes.

A link is weighed by the number of routes using that link from any given observation

point.

To explain this concept, we use a simple example shown in Figure 3.1. This figure

depicts the routing table seen by a router in AS 44 in Figure 1.1a, in the form of a

graph. Here we assume the existence of two more prefixes P2 and P3 announced by

AS 33 and AS 55 respectively. In Figure 3.1a, link (44, 33) has a weight of 2, since

that link appears twice in the routing table at AS 44. We denote the link weight by

[Figure 3.1 panels: a. Link weight seen by 44 (routes P1: 44-33-22, P2: 44-33, P3: 44-55); b. Link weight seen by 44 after 55 withdraws route to P3 (routes P1: 44-33-22, P2: 44-33, P3: 44-33-55)]

Figure 3.1: The notion of link weight

wt(〈link〉, 〈observation point〉), e.g. wt((44, 33), 44) = 2. If BGP updates received at

AS 44 change the routing table at AS 44, the Link-Rank graph will also change. In

Figure 3.1b, AS 55 withdraws its route to P3, and as a result of this withdrawal message,

AS 44 shifts to an alternate path to reach P3. The weight of link (44, 33) has now

increased from 2 to 3. Instead of viewing the routing change as a change in path, we

understand the overall change seen by AS 44 as change in its link weights over time.
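The link weight computation can be sketched directly from a routing table. The sketch below assumes the table is a mapping from prefix to AS path as seen at the observation point; the function name `link_weights` is hypothetical. The two tables are the ones from the Figure 3.1 example.

```python
from collections import Counter


def link_weights(routing_table):
    """Count, for every AS link, the number of routes whose path uses it."""
    weights = Counter()
    for as_path in routing_table.values():
        weights.update(zip(as_path, as_path[1:]))
    return weights


# Routing table at AS 44 before and after AS 55 withdraws its route to P3.
before = {"P1": [44, 33, 22], "P2": [44, 33], "P3": [44, 55]}
after = {"P1": [44, 33, 22], "P2": [44, 33], "P3": [44, 33, 55]}

print(link_weights(before)[(44, 33)])  # 2
print(link_weights(after)[(44, 33)])   # 3
```

The weight of link (44, 33) rises from 2 to 3, as described in the text.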

To understand BGP dynamics using link weight metric, we need to understand

how links change weights as a result of the BGP updates. As a first step, we looked

at BGP updates over a period of one week and marked the links changing rank after

each BGP update. We found that the weight changes usually occurred in bursts. As a

result, instead of looking at the weights after each BGP update, we could analyze just

two snapshots. As a first step towards understanding routing dynamics, we visualized

links whose weights change significantly. Since we assign a measure of importance

or rank to a link using the number of routes carried, we call a visual representation of

the link weights as a Link-Rank graph and a representation of the changes in weights

as a Rank-Change graph. Rank-change graphs capture these links whose weights have

changed.

[Figure 3.2 panels: a. rank changes only (edge labels −1, +1, +1); b. link rank and rank change (edge labels 0(−1), 3(+1), 1(+1))]

Figure 3.2: Rank-change graph for change in Figure 3.1

A Rank-change graph takes the difference between two Link-Rank graphs and uses

red (or dashed) edges to mark the links that have lost routes and green (or solid) edges

to mark links that have gained routes. Simply stated, given two Link-Rank graphs

G1 and G2 taken at different times t1 and t2 respectively, a Rank-change graph plots all links

(a, b) whose weights differ, i.e. wt((a, b), G1) − wt((a, b), G2) ≠ 0. Figure 3.2a

shows the Rank-change graph for the routing change in Figure 3.1. From this figure,

one can clearly see that link (44, 55) lost 1 route, while the link (44, 33) and (33, 55)

gained a route. Note, the Rank-change graph does not show links that have not gained

or lost routes, e.g. link (33, 22). The edge label in a Rank-change graph can show

just the weight changes or the current link weight followed by the weight change in

parenthesis as shown in Figure 3.2.
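A Rank-change graph is, in essence, the difference of two link weight maps. The sketch below assumes each Link-Rank graph is given as a dictionary from link to weight and reports the later weight minus the earlier one, so that lost routes appear as negative values; the function name `rank_change` is hypothetical.

```python
def rank_change(weights_before, weights_after):
    """Return {link: weight change} for every link whose weight differs."""
    changes = {}
    for link in sorted(set(weights_before) | set(weights_after)):
        delta = weights_after.get(link, 0) - weights_before.get(link, 0)
        if delta != 0:
            changes[link] = delta
    return changes


# Link weights seen by AS 44 in Figure 3.1a and Figure 3.1b.
before = {(44, 33): 2, (33, 22): 1, (44, 55): 1}
after = {(44, 33): 3, (33, 22): 1, (33, 55): 1}

print(rank_change(before, after))
# {(33, 55): 1, (44, 33): 1, (44, 55): -1}
```

Link (33, 22) is absent from the result because its weight did not change, which mirrors the Rank-change graph omitting unchanged links.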

We employ the metric of link weights for two purposes, visually understanding

large scale changes and inferring the origin of large scale changes. Chapter 4 explains

how link-weight changes in the form of Rank-Change graphs can be used to visualize

large-scale routing changes. The techniques in Chapter 4 can help network opera-

tors identify major changes among tons of BGP updates, as well as understand how

they were impacted by major routing events. Besides visually inspecting large scale

changes, we are also interested in automatically inferring the origin of the change. In

Chapter 5, we use the concept of link weight and present a heuristic that can accurately


infer the origin of the change.


CHAPTER 4

Visualizing Internet Routing Dynamics using

Link-Rank

When trying to understand Internet routing behavior, one is faced with hundreds of

megabytes of BGP update data and identifying interesting activity involves manually

sifting through this large amount of data. To make this task easier, visualization has

been proposed. BGPlay [5] is a tool that visualizes route

changes from different monitors in the Internet to a single prefix. In this chapter, we

explore the use of visualization for understanding aggregate routing dynamics. We are

interested in understanding how visualization can help understand what exactly hap-

pened during and after large scale routing events like fiber cuts or major link failures.

We take a different approach from other work like [5], and instead of examining pre-

fix level routing changes, we use the link weight notion explained in Section 3.2 of

Chapter 3. In this chapter, we describe the Link-Rank visualization tool built on the

notion of visualizing link weight changes. Using case studies, we show how the

features provided by Link-Rank can help network operators mine and understand

interesting routing changes from gigabytes of routing data.

4.1 Components of Link-Rank

The three components of the Link-Rank tool are shown in Figure 4.1. An important

component is the input filter block that controls when the Rank-change graphs are

constructed. In Figure 3.2, we saw the Rank-change graph for a single route change.

In reality, input filters are needed to enable Link-Rank to scale in regard to topology

size and number of BGP updates. One input filter involves picking a specific set of

prefixes and examining the routing changes for these prefixes. Another input filter is a

threshold based scheme and is the filter used in all our case studies explained later in

this chapter. In this threshold based scheme, we maintain the instantaneous link weight

for each link in the topology seen by an observation point. In addition, we maintain

the change in weight since the last Rank-change graph was generated. The link weight

as well as the change in weight is updated for all links affected by each BGP update

message. A Rank-change graph is generated when the weight of any link changes by

more than a preset threshold (default is 50). A detailed treatment of this scheme and

numerical results of the effect of threshold is beyond the scope of this chapter and the

interested reader may find more details in [25].
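
The threshold scheme can be sketched as follows. This is a minimal illustration and not the actual Link-Rank implementation: the names `links_of`, `ThresholdFilter` and `apply_route_change`, and the representation of a route as a tuple of AS numbers, are assumptions made here. An empty tuple stands for "no route", so an announcement has an empty old path and a withdrawal has an empty new path.

```python
from collections import defaultdict


def links_of(as_path):
    """Return the inter-AS links (adjacent AS pairs) along an AS path."""
    return list(zip(as_path, as_path[1:]))


class ThresholdFilter:
    """Tracks link weights and emits a Rank-change graph past a threshold."""

    def __init__(self, threshold=50):
        self.threshold = threshold
        self.weight = defaultdict(int)  # instantaneous weight per link
        self.change = defaultdict(int)  # change since the last graph

    def apply_route_change(self, old_path, new_path):
        """Process one BGP update that replaces old_path with new_path.

        Returns a Rank-change graph as {link: weight change}, or None.
        """
        for link in links_of(old_path):
            self.weight[link] -= 1
            self.change[link] -= 1
        for link in links_of(new_path):
            self.weight[link] += 1
            self.change[link] += 1
        # "Changes by more than a preset threshold": strict comparison.
        if any(abs(delta) > self.threshold for delta in self.change.values()):
            graph = {l: d for l, d in self.change.items() if d != 0}
            self.change.clear()
            return graph
        return None
```

With the default threshold of 50, the first 50 route changes on a link produce no graph, and the 51st produces a graph carrying the accumulated change.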

Using the threshold filter with BGP updates, a single routing event may be broken

into multiple Rank-change graphs. For example, assume a link (A,B) fails and 5000

routes using that link are affected. This will result in a burst of 5000 BGP updates

closely spaced in time, each of which reduces the rank of the link (A,B) by 1. Thus

the entire update burst would reduce the rank of (A,B) by 5000. If the threshold filter

generates a Rank-change graph each time the link weight changes by 50, there would

be as many as 100 Rank-change graphs, each with a change of 50 routes on link (A,B).

We employed a timing mechanism to reduce the number of Rank-change graphs due

to the same event. We observed that by delaying the construction of the Rank-change

graph by a short time, we could drastically reduce the number of Rank-change graphs

Figure 4.1: Components of Link-Rank (raw updates → input filter → filtered updates → graph generator → Rank-change graph → output filter → final Rank-change graph)

for the same routing event. We call this delay before constructing a Rank-change

graph the event timer and set its value to 30 seconds. During the event timer, if routing

changes add weight x to a link and immediately change back to reduce the weight on

that link by x, the net weight change would be 0 (termed a compensating change) and

hence no Rank-change graph will be generated (since the weight change is below the

threshold of 50). Our choice of 30 seconds for the event timer was motivated by the BGP

MinRouteAdver timer explained in Section 1.1. With the MinRouteAdver timer

set to the recommended value of 30 seconds, compensating changes cannot occur at

intervals shorter than 30 seconds. Though not all routers in the Internet are known to

use the MinRouteAdver (MRAI) timer, we found the event timer value of 30 seconds to be adequate.
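
The effect of the event timer can be sketched as below. This is only an illustration under stated assumptions: the function name `delayed_rank_change_graphs` and the input format of `(timestamp_seconds, link, delta)` tuples are hypothetical, the timer is assumed to start when some link first exceeds the threshold, and changes that do not exceed the threshold when the timer expires are assumed to stay pending.

```python
from collections import defaultdict


def delayed_rank_change_graphs(changes, threshold=50, event_timer=30.0):
    """Return [(time, graph)] for time-sorted (timestamp, link, delta) input."""
    pending = defaultdict(int)  # net change per link since the last graph
    deadline = None             # time at which the running event timer expires
    graphs = []

    def flush(now):
        nonlocal deadline
        deadline = None
        # Only the net change at timer expiry decides whether a graph is built.
        if any(abs(d) > threshold for d in pending.values()):
            graphs.append((now, {l: d for l, d in pending.items() if d != 0}))
            pending.clear()

    for timestamp, link, delta in changes:
        if deadline is not None and timestamp >= deadline:
            flush(deadline)
        pending[link] += delta
        if deadline is None and abs(pending[link]) > threshold:
            deadline = timestamp + event_timer  # start the event timer
    if deadline is not None:
        flush(deadline)
    return graphs


# A burst of 5000 single-route losses on link (A, B) within 10 seconds.
burst = [(i * 0.002, ("A", "B"), -1) for i in range(5000)]
# A compensating change: +100 routes, then -100 routes five seconds later.
flap = [(0.0, ("A", "B"), 100), (5.0, ("A", "B"), -100)]
```

Under these assumptions the burst collapses into a single Rank-change graph with a change of −5000 on link (A, B), and the compensating change produces no graph at all.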

The graph generator component outputs the Rank-change graph based on the up-

dates fed to it by the input filter. The output filter can control the links and nodes in the

Rank-change graph for brevity. Filter rules for the output could be simple weight based

rules such as ‘remove all links below a change of 10’ or more complex such as ‘show

graphs with at least one of the nodes 338, 55 AND links 44→33’. The output filter is

part of the visualization tool, and based on graph complexity, one can dynamically use

filter rules to simplify the graphs. Summarizing, the input filter prepares the data for

Rank-change graphs and the output filter can be used to prune the Rank-change graph

further.

4.2 Features of Link-Rank

We now present the visualization details of Link-Rank and discuss various features

used in Link-Rank to deal with large amounts of data.

4.2.1 Nodes, edges and color coding

We now discuss some details of visualization in Rank-change graphs. Figure 4.2 shows

an actual Rank-change graph from BGP data. Note, the Internet has over 20,000 au-

tonomous systems, and currently only a few hundred observation points are connected

to public data collectors. Observation points, from which one can observe routing

changes, are shown as circular nodes to differentiate them from rectangular nodes that

are not observation points. Visually separating the observation points from the other

nodes highlights other possible viewpoints that can be used to better understand

the same time interval. The observation point of the Rank-change graph (AS

6453) is colored blue to differentiate from other observation points that are colored

orange.

Edges in Link-Rank are primarily red or green in color. An edge is colored red

when it loses routes and green when it gains routes. To help users who have difficulty

distinguishing certain colors, Rank-change graphs can also be displayed using

dashed and solid lines to indicate loss and gain, instead of red and green. In addition,

this representation is very useful in the process of assembling multiple views explained

in Section 4.2.5. The thickness of the edges in the Rank-change graph represents the

magnitude of weight change. With links of varying thickness, one can easily spot links

with high losses or gains. In addition to varying the edge thickness, the size of the

nodes varies based on the amount of weight change of edges and the number of such

edges adjacent to it. This scaling of nodes helps to identify ASes with high routing

Figure 4.2: Sample Rank-change graph

activity.

We use the JUNG visualization library [28] to construct the Rank-change graph.

Link-Rank uses the spring layout implementation from the JUNG library, which gives

satisfactory results in general. Furthermore, the layout implementation also allows

one to manually reposition any node as needed for a clearer view. In most cases when

the Rank-change graphs were sparse, the users of Link-Rank were satisfied with the

default layout. With denser graphs, the users tended to reposition some nodes.

4.2.2 Activity plots: summarizing weight changes

Activity plots summarize routing changes represented by Rank-change graphs along

the time dimension. An activity plot is a series of red and green bars on alternate sides

of a horizontal axis of time. With an activity plot, a user can identify time periods of

high routing activity and then investigate those specific periods in more detail. We first

explain how a single activity bar is plotted. Figure 4.3 shows a Rank-change graph

similar to Figure 3.1. Given a Rank-change graph, we first find the total gain and

total loss by adding the weight changes of the green and red links respectively. In this

case, the total rank gain is 200 (100 each on links (44, 33) and (33, 55)) and the total

rank loss is 100. We plot red and green bars proportional to the total loss and gain

respectively as shown in Figure 4.3. In this case, the green bar is longer than the red

bar. A higher gain (green) than loss (red) could be due to a combination of longer new

paths as in Figure 4.3 and new routes being announced.

Activity bars can provide summary information about the routing change. For

example, if we only see a red bar, it signifies that routes have been lost entirely and

this means some set of prefixes is not reachable.¹ In an activity plot, one activity

bar is constructed for each Rank-change graph over the duration of the activity plot.

The total magnitude of an activity bar can vary widely depending on the type of event,

so we scale the Y-axis such that the highest magnitude in any interval

corresponds to the tallest bar on the activity plot, with the remaining bars scaled linearly

relative to it. In Section 4.3, using case studies, we illustrate how activity plots can

help in the identification of routing problems.
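
The computation of one activity bar, and the linear scaling of bars, can be sketched as follows. The function names `activity_bar` and `scale_bars` are hypothetical. The example graph places the loss of 100 on link (44, 55), which is an assumption, since the text only names the two links that gained routes. "Highest magnitude" is interpreted here as the largest single gain or loss value, which is also an assumption.

```python
def activity_bar(rank_change_graph):
    """Return (total_gain, total_loss) for one Rank-change graph."""
    gain = sum(d for d in rank_change_graph.values() if d > 0)
    loss = sum(-d for d in rank_change_graph.values() if d < 0)
    return gain, loss


def scale_bars(bars, tallest=1.0):
    """Linearly scale (gain, loss) pairs so the largest value equals `tallest`."""
    peak = max((max(gain, loss) for gain, loss in bars), default=0)
    if peak == 0:
        return [(0.0, 0.0) for _ in bars]
    return [(tallest * g / peak, tallest * l / peak) for g, l in bars]


# Rank-change graph in the style of Figure 4.3: {link: weight change}.
example_graph = {(44, 33): 100, (33, 55): 100, (44, 55): -100}
gain, loss = activity_bar(example_graph)
```

For the example graph this gives a total gain of 200 and a total loss of 100, matching the green and red bars described above.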

¹ There are cases where a red bar and the absence of a green bar may not reflect prefix loss. E.g., if the paths for a set of prefixes change from A → B → C to A → B because the prefixes are now originated by B, link (B, C) loses rank but the prefixes may still be reachable.

Figure 4.3: Plotting an activity bar (sum of all gains = +200, sum of all losses = −100)

4.2.3 Time Windows and Drilling Down

The time window control in Link-Rank allows users to aggregate Rank-change graphs

in a time interval. Due to the presence of slow convergence [15], some short lived

invalid paths could appear as genuine route changes. With the time-window control,

one can increase or decrease the longevity of weight changes that one wants to visualize.

Figure 4.4a shows three activity bars corresponding to three Rank-change graphs

shown below. In Figure 4.4b we show the time window by rectangular boxes on the

activity plot. This time window can slide along the activity plot using DVD-style playback

controls. In Figure 4.4b, we show how the Rank-change graph looks in three cases,

two involving the same time window size but different positions, and one involving an

even wider time window size. At each position of the time window, the Rank-change

graphs falling in that window are combined into one by taking the union of all the

Rank-change graphs. Equivalently, the Rank-change graph for a specific position of

the time window can also be constructed as a difference graph between the Link-Rank

graphs at the start and end of the time window. Note that within the first time win-

dow t1, the Rank-change graphs have some cancellation effect of route changes, i.e.

net weight change of (44, 55) is −100 + 50 = −50. In contrast, within the second

time window t2, the Rank-change graphs have an additive effect, i.e. weight change of

(44, 55) is 50 + 50 = 100. If the time window is increased to include all three activity

bars as in t3, then all the changes will be cancelled and the net Rank-change graph will

be empty.
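
The combination of Rank-change graphs inside a time window can be sketched as a per-link sum of weight changes. The function name `window_graph` and the timestamps 0, 10 and 20 are made up for illustration. The per-link values follow the arithmetic given in the text for link (44, 55).

```python
from collections import defaultdict


def window_graph(graphs, start, end):
    """Combine all Rank-change graphs with start <= timestamp < end."""
    net = defaultdict(int)
    for timestamp, graph in graphs:
        if start <= timestamp < end:
            for link, delta in graph.items():
                net[link] += delta
    # Links whose changes cancel out do not appear in the combined graph.
    return {link: delta for link, delta in net.items() if delta != 0}


# Three Rank-change graphs in the style of Figure 4.4a.
graphs = [
    (0, {(44, 55): -100, (44, 33): +100, (33, 55): +100}),
    (10, {(44, 55): +50, (44, 33): -50, (33, 55): -50}),
    (20, {(44, 55): +50, (44, 33): -50, (33, 55): -50}),
]
t1 = window_graph(graphs, 0, 20)   # first two graphs: partial cancellation
t2 = window_graph(graphs, 10, 30)  # last two graphs: additive effect
t3 = window_graph(graphs, 0, 30)   # all three graphs: empty result
```

Here `t1` has a net change of −50 on link (44, 55), `t2` has +100 on the same link, and `t3` is empty.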

Figure 4.4: Use of time window to control time of change (a. Rank-change graphs; b. Rank-change graphs with different time-windows t1, t2 and t3, where t3 gives an empty Rank-change graph)

Figure 4.5: Drilling down to increase level of detail in activity (activity plots for Jan 16, 2005 08:00 UTC to Jan 24, 2005 16:00 UTC; Jan 22, 2005 23:10 UTC to Jan 23, 2005 15:57 UTC; and Jan 23, 2005 07:53 UTC to Jan 23, 2005 09:53 UTC)

Another time control, called the drill-down feature, allows one to control the time

granularity of the entire activity plot. By drilling down, one can expand the activity

inside the current time-window to a larger time-span in a new window. The first part of

Figure 4.5 shows an activity plot spanning over 8 days with a time window of 16 hours.

To better understand the activity inside the time window, we drill down, expanding the

16-hour time window into the full time span of the middle activity plot in Figure 4.5. The

time window in this case is about 2 hours. Drilling down further on this time window

will expand these two hours further as shown in the last activity plot. One can now see

the individual activity bars in detail compared to the first activity plot. Note that, given an

activity plot, one can drill down to a time granularity equal to the event timer

explained in Section 4.1.

4.2.4 Pruning Rank-change graphs

Link-Rank processes BGP updates and visualizes the links that have changed. In all

the examples in this chapter, the underlying network consists of the Internet with about

20,000 nodes. However, the size of the Rank-change graph depends on the number of

links that have changed and the magnitude of changes. Hence in some cases where

a small number of links have changed, the Rank-change graphs may contain only a

small number of nodes and links. In other cases with a lot of changes, the Rank-

change graph may contain hundreds of nodes, making it difficult to extract information

visually. Link-Rank allows a user to prune Rank-change graphs using different filtering

techniques to reduce the complexity of the graph. One technique is to prune the graph

using an output filter in the form of a threshold filter that removes edges with a weight

change less than a threshold value set by the user. Other types of filters include

viewing the top N links with the highest weight change values, and viewing links adjacent

to a set of user-specified ASes. One can also use a combination of all these filters and

specify the order in which filters are applied.
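
The three pruning filters and their ordered combination can be sketched as below. The helper names `by_threshold`, `top_n`, `adjacent_to` and `prune` are hypothetical, and the weight changes in the example graph are invented for illustration. Only the AS numbers are taken from the case studies.

```python
def by_threshold(min_change):
    """Keep links whose absolute weight change is at least min_change."""
    return lambda graph: {l: d for l, d in graph.items() if abs(d) >= min_change}


def top_n(n):
    """Keep the n links with the largest absolute weight change."""
    def apply(graph):
        ranked = sorted(graph.items(), key=lambda item: abs(item[1]), reverse=True)
        return dict(ranked[:n])
    return apply


def adjacent_to(ases):
    """Keep links with at least one endpoint in the given set of ASes."""
    wanted = set(ases)
    return lambda graph: {
        (a, b): d for (a, b), d in graph.items() if a in wanted or b in wanted
    }


def prune(graph, filters):
    """Apply the filters to a Rank-change graph in the order given."""
    for apply_filter in filters:
        graph = apply_filter(graph)
    return graph


# Hypothetical Rank-change graph: {(AS, AS): weight change}.
graph = {
    (6453, 3356): -3000,
    (6453, 701): 1800,
    (6453, 1239): 1100,
    (3356, 80): -150,
    (701, 80): 5,
}
pruned = prune(graph, [by_threshold(200), top_n(2)])
```

Because the filters are applied in sequence, their order matters: applying `adjacent_to([80])` before `by_threshold(200)` leaves an empty graph here, while the threshold filter alone keeps three links.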

4.2.5 Assembled View: Merging Rank-change graphs from multiple observation points

Link-Rank views from multiple observation points can be assembled in a single Rank-

change graph. Figure 4.6 shows the assembled view from two observation points AS

11608 and AS 3561. Note that here we have to use dashed and solid lines to indicate

lost and gained routes. Edges in this example are either blue or pink, with blue indicating

the changes from AS 11608 and pink indicating the changes from AS 3561. In

general, in assembled views, each observation point and its changes are represented by

a unique color. With assembled views, one can identify common segments of change

in Rank-change graphs across different observation points and narrow down on the

possible cause of the routing changes.

4.3 Discovery and analysis using Link-Rank

In this section, we use examples to show how Link-Rank can be used to discover and

analyze routing events.

Figure 4.6: Assembling views from AS 11608 and AS 3561

4.3.1 Methodology

Our objective is to evaluate how Link-Rank can help network operators discover and

diagnose routing problems. In terms of routing data, network operators have access to

BGP routing tables and update messages received at their routers. We have access to

similar data from the public archives of the RouteViews Oregon collector that contains

routing tables and updates from about 40 routers belonging to different autonomous

systems. In order to understand how network operators diagnose problems, we interacted

with network administrators through email and personal interviews at various

North American Network Operators’ Group (NANOG) meetings [1]. Our pool of interviewees

consisted of about 40 operators from both small and big ISPs, most of them having

more than 5 years of experience in network operations. In the rest of this section, we

use the knowledge gained from this interaction to analyze three case studies from the

perspective of an operator using Link-Rank.

We used three ways to select observation points and time periods for case studies.

First, we looked at activity plots from all observation points on a weekly basis and

identified the periods with dense activity or spikes. Case I is an example of this, where

we saw some heavy activity from a particular observation point. Second, we looked at

activity plots to find common activity spikes across multiple observation points during

the same time period. Case II is an example where activity plots from multiple observa-

tion points show spikes at around the same time. Cases I and II show that activity plots

can serve as summaries for network operators using Link-Rank. Finally, we picked

case studies in response to reports of routing or traffic problems from external sources

such as North American Network Operators Group (NANOG) mailing lists. Case III is

representative of this category where there were reports of traffic problems from a few

ISPs. In each of these cases, we used the Rank-change graphs during the selected time

periods, and in one case assembled multiple views together, to understand the routing

activity.

4.3.2 Case I: Capturing Link Instabilities

In March 2005, AS 7018 showed heavy activity, as seen in the second

activity plot (router IP 12.0.1.63) in Figure 4.7, which covers a period of one

week. One task of the network operator is to find out whether this activity is because

of a problem within AS 7018 or a problem beyond AS 7018. Another question to be

answered is whether the entire activity is due to the same event or different events. We

drilled down from one week of activity to a one-hour period on March 9, 2005, shown

in Figure 4.8. Figure 4.8 shows that, over this hour, a Rank-change

graph was generated almost every minute.

We then looked at the Rank-change graphs in this period and found a common

sequence of changes. Figure 4.9 shows a typical sequence of Rank-change graphs

we found, with the time window set to 1 minute. This figure shows that 134 routes

switched between the paths 7018 → 80 and 7018 → 1239 → 80. This behavior was

Figure 4.7: Activity plots from March 8, 2005 to March 14, 2005

Figure 4.8: One hour of activity plot from 12.0.1.63 on March 9, 2005

observed for almost three weeks in March 2005. The next step was to find the preferred

path among the two oscillating paths. From examination of routing tables before the

event, we saw that the preferred path to reach AS 80 was the direct link (7018, 80).

Since the weight of the link (7018, 80) on the preferred path repeatedly touched 0, it

seemed likely that the link between AS 7018 and AS 80 went up and down repeatedly

and was the cause of the instability seen.

Events such as the constant route changes above may result in longer delays as well

as possible packet losses. Yet, they often go unnoticed. In this case, the behavior

continued for almost 3 weeks in March 2005, contributing hundreds of thousands of

BGP updates seen at the observation point. A network operator using Link-Rank at AS

7018 could quickly identify such oscillations, restore stability

to routes, and drastically reduce the number of BGP updates in the Internet. In our

Figure 4.9: Case I: Continuous switching of routes between two links (a. Tue Mar 15 00:21:15 to 00:21:45 GMT 2005; b. Tue Mar 15 00:22:15 to 00:22:45 GMT 2005; c. Tue Mar 15 00:23:15 to 00:23:45 GMT 2005)

examination over other time periods, we found quite a few instances of link instabilities

similar to this case above.

Summary: Densely clustered bars in activity plots, especially bars of nearly

constant height, are almost always an indication of link instabilities. Activity

plots are useful in spotting such cases. One can then examine these time periods in

detail to figure out the actual causes of the rapid route changes.

4.3.3 Case II: Root-cause identification

Root cause identification involves inferring the cause of an observed set of routing up-

dates. For Case II, we picked a case where activity plots of many observation points

showed spikes around the same time. Figure 4.10 shows the activity plot of a few

observation points from October 18, 2005 to October 24, 2005. One can easily spot

spikes and dense activity in these plots from multiple observation points (around

October 21, 2005). To understand the causes, we looked at the routing activity from AS

6453 (router IP 195.219.96.239) which generated the first activity plot in Figure 4.10.

Starting from an entire day’s activity, we drilled down to the period between

4:00 and 9:00 GMT on October 21, 2005 that contains the dense activity. Figure 4.11

shows the Rank-change graph around 06:20 GMT on October 21, 2005 from AS 6453

with a time window set to 15 minutes. During this time, link (6453, 3356) lost close to

3000 routes (out of a total of around 140,000). At the same time, some other links like

(6453, 701) and (6453, 1239) gained routes. Note, for ease of presentation, we do not

show the link weights and prune the graph by applying the filter to remove links with

changes less than 200. Based on observation, the possible cause is either AS 6453, AS

3356, or the link (6453, 3356).

In this case, since similar activity is also seen from other observation points, one

can benefit by combining multiple observation points into a single assembled view.

Figure 4.12 shows the assembled view from three observation points, AS

6453, AS 1239, and AS 3257 that showed similarity in activity plots. In the assembled

view, we use dashed lines to represent route loss and solid lines to represent route gain

and assign each observation point and its corresponding changes, a unique color, e.g.

AS 3257 and its corresponding changes are colored blue. The orange colored nodes

indicate other potential observation points, so more views can be added. Here we select

only three observation points to make the Rank-change graph easy to understand. After

we reduce the time window to 5 minutes, one can see from Figure 4.12 that multiple links

into and out of AS 3356 were affected, strongly suggesting a problem inside

AS 3356 and not just on the link between AS 6453 and AS 3356. Our observation was

validated by reports from the NANOG discussion forum that AS 3356 indeed had

some internal problems, and was further corroborated by discussions with network

operators.

Summary: To use Link-Rank for identifying root cause, one can look for high

loss or gain links or nodes which have a high number of outgoing edges with weight

changes. One can also assemble multiple views along the lost or gained path to isolate

sections of the path which might be problematic.
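
The idea of looking for nodes with many changed links across assembled views can be sketched as a simple count. This is an illustrative aid, not part of Link-Rank: the function name `rank_suspect_ases` is hypothetical and the weight changes below are invented. Only the AS numbers echo Case II.

```python
from collections import defaultdict


def rank_suspect_ases(views):
    """Rank ASes by the number of distinct changed links touching them.

    views: {observation_point: {(as_a, as_b): weight change}}
    Returns [(asn, changed_link_count)] with the highest count first.
    """
    incident = defaultdict(set)
    for graph in views.values():
        for (a, b), delta in graph.items():
            if delta != 0:
                incident[a].add((a, b))
                incident[b].add((a, b))
    counts = [(asn, len(links)) for asn, links in incident.items()]
    # Sort by descending count, breaking ties by AS number.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Hypothetical assembled view from three observation points.
views = {
    6453: {(6453, 3356): -3000, (6453, 701): 1500},
    1239: {(1239, 3356): -2000, (1239, 701): 900},
    3257: {(3257, 3356): -1500, (3257, 2914): 700},
}
ranking = rank_suspect_ases(views)
```

In this made-up data, AS 3356 is adjacent to three changed links, more than any other AS, so it appears first in the ranking. Such a count is only a hint and would still need to be confirmed by inspecting the Rank-change graphs.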

Figure 4.10: Activity plots from October 18, 2005 to October 24, 2005

Figure 4.11: Case II: Instability observed at AS 6453

Figure 4.12: Case II: Combined view from AS 1239, AS 6453 and AS 3257

CHAPTER 5

Inferring Origin of Internet Routing Problems

In Chapter 4 we showed how visualization of link weight changes can be used to understand

large-scale Internet routing events. In this chapter, we focus on automated

identification of significant routing events and present a heuristic for inferring the

origin of the event. We design a new inference scheme using the abstract measures of

link weight and weight changes. Link-Rank extracts the total number of routes carried

over each inter-AS link in the Internet topology, called link weight, and measures the

changes in the number of routes on each link to capture aggregate routing changes.

This provides a concise representation of the view from a particular BGP router. We

further leverage our previous observations that, among multiple alternative paths to

a given destination, the most preferred path is used most of the time [32]. As a direct

corollary of this observation, each AS link is expected to have a stable weight,

and deviations from this expected value can serve as indications of significant routing

changes.

Once a significant deviation is detected, our objective is to identify the origin of

this deviation. Given the view from a single router, we can use a min-cut heuristic to

identify the most likely faulty AS node or AS-AS link, enabling an isolated BGP router

with only its own update stream to identify significant events and infer the origin of

these events. By correlating changes observed from different monitors, we can achieve

a very high degree of accuracy identifying the AS node or AS-AS link responsible for

triggering the event. This can all be done in near real-time and provides a useful tool for

network operations and for understanding BGP protocol behavior in the aggregate. We

validated our heuristic by accurately identifying session problems reported by Abilene

with its peers. Our evaluation on events where the problem area is adjacent to the origin AS

shows that we could achieve an accuracy of close to 95%. On applying our heuristic

over one month of BGP data, we found various interesting routing instabilities, some

recurring repeatedly, clearly highlighting the need to be able to identify the origin of

changes on a regular basis.

5.1 Characterizing links and identifying routing events

In this section, we look for a way to characterize link weights and identify routing

events. We are particularly interested in identifying the steady value of a link in the

absence of major routing events. If we obtain such a steady value, then we can identify

irregular behavior of links whenever this value changes significantly. To understand

link weight changes, we sampled link weights from a particular observation point every

10 minutes for the entire month of January 2007, resulting in close to 4500 samples.

Out of all the links seen from the observation point, we filtered out links with weight

less than 25, leaving us with 2897 links. Figure 5.1 shows frequency distribution of

link weight values for 3 links randomly picked from different link weight buckets (low

weight, medium weight and heavy weight). Note that the origin in Figure 5.1 has

been shifted and we can see that a majority of the samples fall under a small range

of values. In order to understand the overall trend, we calculated the percentage of

samples falling within 2 standard deviations of the mean link weight for each link.

Figure 5.2 shows the distribution of this percentage of samples for all the links. This

figure shows that over 50% of the links have all samples within 2 standard deviations

of the mean link weight. Figure 5.2 also shows that the lowest percentage of samples

within 2 standard deviations is about 80%. From this we can see that link weights

are more or less steady.
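
The per-link statistic plotted in Figure 5.2 can be sketched as follows. The function name `percent_within_2_sigma` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used. The sample values are invented.

```python
from statistics import mean, pstdev


def percent_within_2_sigma(samples):
    """Percentage of samples within 2 standard deviations of the mean."""
    mu = mean(samples)
    sigma = pstdev(samples)  # population standard deviation (assumption)
    inside = sum(1 for s in samples if abs(s - mu) <= 2 * sigma)
    return 100.0 * inside / len(samples)


# A link whose weight is 40 in 99 samples and spikes to 140 in one sample.
mostly_steady = [40] * 99 + [140]
share = percent_within_2_sigma(mostly_steady)
```

For this made-up link, 99 of the 100 samples lie within 2 standard deviations of the mean, so the statistic is 99%.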

To look for this steady value, we sample the link weight regularly and define an

exponential moving average of link weight called Expected Weight. The expected

weight w̄(l) takes into account the past history of the link weight as well as present

and is computed as

w̄(l) = (1− α) · w̄(l) + α · w(l)

where w(l) is the current weight of the link l, and α decides how much importance is

assigned to the current value compared to the past values.

While some links, such as one connecting a stub AS to its provider, will mostly have

a very stable weight, other links, such as those connecting transit providers, vary.

Figure 5.3(a) shows the instantaneous weight of link 7018-1239 as seen from a BGP router

in AS 7018. Note that the Y-range in this graph starts from 23,600. We can see a few

spikes in the instantaneous weight as well as some longer lasting changes (plateaus).

Our approach is to characterize how much link weights fluctuate and look for devia-

tions that exceed some normal value. We capture this deviation using mean deviation

δ(l) = |w̄(l) − w(l)|, where w̄(l) is the expected weight of link l, and w(l) is the

current observed weight of the link. To take into account the history of deviation, we

define an exponential moving average of the mean deviation as

δ(l) = (1− β) · δ(l) + β · |w̄(l)− w(l)|

Our objective in assigning values to the sampling interval T , α and β is to get

an expected behavior that is not influenced by very short lived changes (especially

convergence events), while at the same time be able to adapt to longer lasting changes

quickly enough. We want the value of T to be greater than the convergence time,

which has been found to be 2 minutes on average [32] and can sometimes be as long as 5

minutes. Based on this, we set the sampling interval T = 10 minutes. To estimate

Figure 5.1: Frequency distribution of link weight values of 3 links from a single observation point (panels: Link 3356-1290, Link 2914-4323, Link 2914-209; x-axis: link weight; y-axis: frequency)

Figure 5.2: Percentage samples covered by the top 5 most common values seen in the 4500 link weight samples for each link (x-axis: link ID; y-axis: percent within 2σ)

a good value of α for our choice of T = 10 minutes, we picked a few random links

from the set of heavy weight links, medium weight links and low weight links. We

plotted the instantaneous samples for all these links over a 10 day period. We then

plotted the expected weight w̄(l) for different values of α ranging from 0.1 to 0.9.

From our observations, α = 0.1 provided a close approximation of the link weights

while not being affected by the short lived spikes. Figure 5.3(b) shows the expected

weight for (7018,1239) with α = 0.1, and we can see that the curve closely follows

the instantaneous values except for the short lived spikes. Following similar studies

for deviation, we picked β = 0.25.
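
The two moving averages can be sketched as a single update step per sample. The function name `update_link_state` is hypothetical. The sketch assumes that the deviation is updated against the expected weight from before the current sample is folded in. The text does not specify this ordering.

```python
ALPHA = 0.1   # weight of the current sample in the expected weight
BETA = 0.25   # weight of the current deviation in the mean deviation


def update_link_state(expected, deviation, current, alpha=ALPHA, beta=BETA):
    """Fold one link weight sample into the expected weight and deviation.

    Returns the new (expected weight, mean deviation) pair.
    """
    # Assumption: the deviation uses the expected weight before this sample.
    deviation = (1 - beta) * deviation + beta * abs(expected - current)
    expected = (1 - alpha) * expected + alpha * current
    return expected, deviation


# A link with expected weight 24000 sees a one-sample spike to 25000.
expected, deviation = update_link_state(24000.0, 50.0, 25000.0)
```

With α = 0.1, the spike of 1000 routes moves the expected weight by only about 100, to roughly 24100, while with β = 0.25 the mean deviation rises from 50 to about 287.5. This illustrates why a small α keeps short lived spikes from dominating the expected weight.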

5.1.1 Link Events

With an expectation of the weight of a link, we now define a link event as a significant

weight change of a link. The mean deviation δ(l) provides an estimate of how much a

link weight fluctuates. We define a change as significant when the weight of a link changes

by more than twice the mean deviation. In other words, we take the start of the event

Figure 5.3: Example showing choice of α = 0.1 ((a) instantaneous link weight of link 7018-1239; (b) expected weight with α = 0.1; x-axis: sample ID)

as the time when the current weight changes by more than 2 ∗ δ(l). At this point, we need

to identify the end of the event when the link stabilizes again. We mark the end of the

event by using a fixed timeout of t = 3 minutes, being slightly more conservative than

the average convergence time for events of around 2 minutes [32]. When viewing all

links together, an event starts when any link changes by more than the deviation range,

and ends t = 3 minutes later.
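
The event definition can be sketched in two small steps: a per-link test against the deviation range, and a grouping of trigger times into events with a fixed timeout. The function names `is_link_event` and `group_events` are hypothetical. Times are in seconds and the sample numbers are invented.

```python
def is_link_event(current, expected, mean_deviation, factor=2.0):
    """True if the weight is outside the range expected +/- factor * deviation."""
    return abs(current - expected) > factor * mean_deviation


def group_events(trigger_times, timeout=180.0):
    """Group trigger times into (start, end) events of fixed length.

    An event starts at the first trigger outside any running event and
    ends `timeout` seconds later; triggers inside it are absorbed.
    """
    events = []
    for t in sorted(trigger_times):
        if events and t < events[-1][1]:
            continue  # still inside the current event
        events.append((t, t + timeout))
    return events


# Triggers from any link, viewed together (seconds since some start time).
events = group_events([0, 60, 170, 200, 1000])
```

In this example the triggers at 0, 60 and 170 seconds fall into one event, while the trigger at 200 seconds starts a second event. This shows how a long-running change can be split into more than one event by the fixed timeout.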

Note it is possible that links with very small weights (e.g. w̄(l) = 5) can result in

a link event when even 1 route changes. To account for this, we log all link weight

changes and filter out very small weight changes (e.g., < 25 routes). Also, due to

our choice of fixed timeout of t = 3 minutes, it is possible for a single big event

(e.g. 50,000 route changes) to be broken into more than one event. In Section 5.4

we show that it is still possible to accurately diagnose cases where a single routing

event is broken into multiple events. Figure 5.4 shows the number of events identified

by different observation points from RouteViews’ Oregon collector for the month of

January 2007.

As our later evaluation has shown, this technique has proven practically effective

in identifying routing events at the aggregate level. We discuss some potential problem

areas of using link weight based events in Section 5.4. Having identified an event, we

now turn to the main challenge of inferring the origin of this event.

5.2 The Inference Scheme

We now present our heuristic to identify the origin of routing events. We first present

an overview of our approach and then go into the details.

Figure 5.4: Events in January 2007 (x-axis: observation point ID; y-axis: number of events in January 2007)

5.2.1 Overview of approach

After we apply the event identification scheme described in Section 5.1, we get a set

of routing events. Each event contains the time of the event and the set of links along

with the weight of the link at the end of the event, and the change in weight. An event

may be a failure event where a set of preferred routes is lost, or may be a recovery

where previously lost preferred routes are available again. If the event is a failure then

the origin of change is contained in the set of links that have lost routes, while if it is

a recovery event, then the origin of change is contained in the set of links that gained

routes. At the first stage we do not make any classification of the event as a failure or

recovery and identify origin of change for both possibilities.

For the set of edges that lost routes, we correlate the changes across different links

involved and construct a flow graph called fault graph with an artificial source S and

an artificial sink T . The idea behind constructing a fault graph is that any cut (set of

edges) disconnecting S and T represents a possible explanation for origin of change.

Further, if each edge were equally likely to fail, a cut involving the least number of

edges, or min-cut is most likely to be the origin of the change. By augmenting the

Figure 5.5: Main steps in inference (BGP updates; weight tracking and event identification; fault graph construction for negative changes (failure) and for positive changes (recovery); min-cut edges; candidate set reduction; origin of change)

fault graph with information from other available observation points, we argue that the

min-cut on the fault graph is the most likely explanation. We repeat the procedure of

constructing a fault graph and finding the min-cut for the set of edges that gained routes

as well. Finally, given two possible explanations, one with edges that lost routes, and

one with edges that gained routes, we use information about expected weights and

variance to understand which explanation is more likely. Figure 5.5 shows the main

steps in our process.

5.2.2 Fault graph

We now go into details of the construction of fault graph. For each event, we construct

two graphs, one is the loss graph involving all links that lost weight, and the other is the

gain graph involving all links that gained weight. At this stage we do not know whether

the event is a failure or a recovery and there is greater benefit in adding information

from other monitoring points first. Algorithm 4 details the construction of a fault graph

for positive and negative weight edges for each event.

The main idea in constructing the fault graph is to connect the source node S to

all the nodes that have only outgoing edges, and the sink node T to all the nodes that

have only incoming changes. Figure 5.6 shows a fault graph constructed from a single

observation point for an event that occurred on March 9, 2007 at 18:05. One can see

Algorithm 4: Fault-Graph(E)
    Construct Gst as union of all negative (or positive) edges in E;
    Assign w(e) = 1 for all e ∈ E;
    /* Find nodes with no incoming changes */
    for each node n such that (x, n) ∉ E for any x do
        add edge (s, n) to Gst with w(s, n) = ∞;
    /* Find nodes with no outgoing changes */
    for each node n such that (n, x) ∉ E for any x do
        add edge (n, t) to Gst with w(n, t) = ∞;

here that node 11537 is not connected to either S or T , since it has both incoming and

outgoing edges. In reality, our implementation takes into account the total incoming

weight change and the total outgoing weight change in making a decision on whether

a node connects to S or T . For example, if a node B has total incoming change of

-500 i.e. w(A,B) = −500 but an outgoing change of only -10, i.e. w(B,C) = −10,

then the sink of the flow has to be at node B, due to the discrepancy in the incoming

and outgoing changes. The enclosed table shows the link weights and changes for the

three links included in the fault graph. In the fault graph, any of the three edges could

be cut to obtain a possible explanation, but generally, as one moves farther away from the

source S, the edge weight decreases, and with comparable change on a single path, we

try to remove an edge as far away from S as possible. In Figure 5.6, a min-cut on each

graph results in two edges 11537-2018 and 5713-2018 as possible candidates.
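The construction in Algorithm 4 and the preference for cutting as far from S as possible can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not the implementation used in this work: it uses unit edge weights only (the weight-discrepancy refinement described above is omitted), AS numbers are plain strings, and the function names `build_fault_graph` and `min_cut_far_from_source` are hypothetical. The max-flow routine is a standard Edmonds-Karp search; the cut closest to the sink is read off from the nodes that can still reach T in the residual graph.

```python
from collections import defaultdict, deque

INF = float("inf")


def build_fault_graph(edges, s="S", t="T"):
    """Unit-weight fault graph for one sign of change (after Algorithm 4)."""
    cap = defaultdict(dict)
    for a, b in edges:
        cap[a][b] = 1
    has_in = {b for _, b in edges}
    has_out = {a for a, _ in edges}
    for n in has_out - has_in:   # no incoming changes: attach to source
        cap[s][n] = INF
    for n in has_in - has_out:   # no outgoing changes: attach to sink
        cap[n][t] = INF
    return cap


def min_cut_far_from_source(cap, s="S", t="T"):
    """Return the minimum cut whose edges lie closest to the sink."""
    res = defaultdict(lambda: defaultdict(float))  # residual capacities
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res[u][v] += c
            res[v][u] += 0
    while True:  # Edmonds-Karp: augment along shortest paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
    # Nodes that can still reach t in the residual graph form the sink side.
    sink_side, queue = {t}, deque([t])
    while queue:
        v = queue.popleft()
        for u in list(res):
            if u not in sink_side and res[u].get(v, 0) > 0:
                sink_side.add(u)
                queue.append(u)
    return sorted((u, v) for u, nbrs in cap.items() for v in nbrs
                  if u not in sink_side and v in sink_side)
```

On the three-link chain of Figure 5.6(a), every edge is a minimum cut of size one; taking the cut nearest the sink selects 11537-2018, in line with the preference stated in the text.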

5.2.3 Augmenting fault graph with views from additional observation points

One may also have access to events generated from other observation points, in

addition to one's own BGP data. In practice, one can achieve this by processing events

from a few peers of public data collectors like RouteViews and RIPE RIS. We are

interested in adding information from other observation points to aid the inference

from our primary observation point. Specifically, we build on the fault graph from the

primary observation point in Algorithm 4.

Figure 5.6: A fault graph from single observation point

a. Negative weight change (nodes: S, 11686, 19782, 11537, 2018, T)

Link          Weight  Expected  Change  Dev
11686-19782   12387   12342     -135    21
19782-11537    8495    8658     -135    23
11537-2018        0     137     -137     0

b. Positive weight change (nodes: S, 11686, 19151, 5713, 2018, T)

Link          Weight  Expected  Change  Dev
11686-19151   16484   18346      135    3356
19151-5713      243     120      129       0
5713-2018       136       0      136       0

Figure 5.7: Augmenting a fault graph with additional information

(a) Min-cut on fault graph from 2914 identifies incorrect edge (nodes: S, 2914, 3549, 3216, 8732, T)

Link        Weight  Change
2914-3549   5870    -172
3549-3216      0     -61
3549-8732      0     -85

(b) Adding information from other observation points to fault graph increases accuracy (nodes: S, 2914, 3549, 3216, 8732, 5511, 4637, T)

We first identify links from other observation points that are common to the

fault graph from the primary point. We then identify the links that connect to and from

these common links and add this set of links to the fault graph. Finally, we

connect nodes that have only outgoing edges to S and nodes that have only incoming

edges to T . Algorithm 5 presents the details.

Algorithm 5: Augmenting the fault graph
    Input: Gst: The fault graph from primary observation point for time t
    Output: G′st: Augmented fault graph for time t
    for each monitor Mi do
        for event in time interval (t − δ, t + δ) do
            for edge (a, b) common to Gst and Ei do
                Ei = edges connecting to and from (a, b);
                addToGraph(Gst, Ei);

Figure 5.7(a) shows the fault graph for an event observed from AS 2914. Here, the

min-cut results in edge 2914-3549 as the possible origin of change. Now, adding infor-

mation from 2 other monitoring points as in Algorithm 5 results in the fault graph in

Figure 5.7(b), and one can see that the edges 3549-3216 and 3549-8732 are now identified

as the origin of the change. Adding more observation points further strengthens

this explanation. Noticing the weights of the edges involved, it seems most likely that

this is the correct explanation.
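The augmentation step of Algorithm 5 can be sketched as a set operation over edges. This is a hedged illustration only: the `(timestamp, edges)` layout of `other_events`, the function name `augment_fault_graph`, and the reading of "edges connecting to and from (a, b)" as edges that end at a or start at b are assumptions made for the example. The S and T attachments would be recomputed on the returned edge set as in Algorithm 4.

```python
def augment_fault_graph(primary_edges, other_events, t, delta):
    """Add edges from other monitors' events that touch shared edges.

    `other_events` is a list of (timestamp, edges) pairs, one per event seen
    by another monitor; this layout is an assumption made for the sketch.
    """
    primary = set(primary_edges)
    augmented = set(primary)
    for ts, edges in other_events:
        if not (t - delta < ts < t + delta):
            continue  # only events in the interval (t - delta, t + delta)
        edges = set(edges)
        for a, b in primary & edges:
            # Edges that lead into the common edge or continue out of it.
            augmented |= {(x, y) for x, y in edges if y == a or x == b}
    return augmented
```

With a primary view like Figure 5.7(a) and two additional views that reach the same outgoing links of AS 3549 through other neighbors, the augmented graph gains extra incoming edges at 3549, so the two outgoing links become the smaller cut.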


5.2.4 Candidate set reduction

In this final stage, we infer whether the change was a failure or a recovery based on

the candidate sets we have and end up with either the positive change candidate set or

the negative change candidate set as the most likely root cause.

5.2.4.1 Using link state information

We define the following states that a link l can be in, based on its current weight w(l),

expected weight w̄(l), and deviation δ(l).

1. Normal: when w(l) is within w̄(l)± 2 ∗ δ(l)

2. Loss: when w(l) < w̄(l)− 2 ∗ δ(l)

3. Gain: when w(l) > w̄(l) + 2 ∗ δ(l)

The transitions between these states are shown in Figure 5.8. When a link fails,

its weight drops and as a result it transitions from normal state to loss state, defined as

fail. At the same time, the affected routes would take an alternate path and the weight

for such a link on the alternate path will increase, thus moving it from normal state to

gain state. We define this transition of state from normal to gain as a fail-other. This

increase in weight, however, is due not to the link's own recovery but to some other

failure that results in routes using the link as an alternative. Similarly, transition from

loss to normal and gain to normal can be classified as recover and recover-other.

Figure 5.8 also shows transitions between states of loss and gain. Such transitions

(called loss-gain and gain-loss) are not frequent, but are difficult to classify and hence

we do not associate them with recovery or failure. While we find a vast majority of

the transitions to occur between normal and loss, and normal and gain states, a more

detailed study needs to be done in understanding conflicting cases in order to make

best use of the expected link weights. From the example in Figure 5.6, using the

states defined in Figure 5.8, we classify the edge 11537-2018 as fail, and the edge

5713-2018 as fail-other. Thus, the origin lies on the link 11537-2018, which is indeed

the correct explanation, as verified by logs obtained from AS 11537 indicating a BGP

session failure with AS 2018 at that exact time.

Figure 5.8: State transition diagram (states: Normal, Loss, Gain; transitions: Normal to Loss is fail, Loss to Normal is recover, Normal to Gain is fail-other, Gain to Normal is recover-other, Gain to Loss is gain-loss, Loss to Gain is loss-gain)
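The link states and transitions defined in this section map directly onto a small lookup. The sketch below is illustrative; the function names `link_state` and `classify_transition` are hypothetical, and the boundary of the Normal band is treated as inclusive, which the definitions above imply but do not state outright.

```python
def link_state(weight, expected, deviation):
    """Classify a link as 'normal', 'loss', or 'gain' (2-deviation band)."""
    if weight < expected - 2 * deviation:
        return "loss"
    if weight > expected + 2 * deviation:
        return "gain"
    return "normal"


TRANSITIONS = {
    ("normal", "loss"): "fail",
    ("loss", "normal"): "recover",
    ("normal", "gain"): "fail-other",
    ("gain", "normal"): "recover-other",
    ("loss", "gain"): "loss-gain",
    ("gain", "loss"): "gain-loss",
}


def classify_transition(old_weight, new_weight, expected, deviation):
    """Name the transition caused by a weight change, or None if none."""
    before = link_state(old_weight, expected, deviation)
    after = link_state(new_weight, expected, deviation)
    return TRANSITIONS.get((before, after))
```

Using the values from Figure 5.6, link 11537-2018 drops from 137 to 0 against an expected weight of 137, which classifies as fail, while link 5713-2018 rises from 0 to 136 against an expected weight of 0, which classifies as fail-other.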

5.2.5 Identifying node problems

Even though our scheme uses link weights to capture changes, we can also identify

potential node problems where the origin is an AS instead of particular links. This can

happen due to i-BGP problems within an AS causing instability or due to a BGP router

connected to multiple ASes going down. To identify potential node problems, we first

rank each node by the number of edges adjacent to it among the links in the candidate

set. The nodes with high rank (i.e., many adjacent edges in the candidate set) are likely

candidates for node problems. In our analysis, we found a few cases where a lot of

links adjacent to a particular node were identified as origins of change.
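The node ranking described above amounts to counting, for each AS, the candidate-set edges that touch it. A minimal sketch follows; the function name `rank_nodes` and the node labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def rank_nodes(candidate_edges):
    """Rank ASes by how many candidate-set edges touch them."""
    counts = Counter()
    for a, b in set(candidate_edges):  # count each distinct edge once
        counts[a] += 1
        counts[b] += 1
    return counts.most_common()
```

For a candidate set such as `[("X", "A"), ("X", "B"), ("X", "C"), ("D", "E")]`, node X ranks first with three adjacent edges and would be flagged as a possible node problem.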


5.3 Evaluation

In this section we discuss how we validate the results from our scheme. A fundamental

challenge in the validation of a scheme to infer origin of changes is the dearth of

publicly available documentation of BGP session or router failures. We perform two

kinds of validation, one using publicly available information of outages from Abilene,

and the other using BGP updates to identify a class of events where the origin of the

event is almost surely adjacent to the AS announcing the prefix.

5.3.1 Validation using Abilene Data

The Abilene network in the United States is a high-performance backbone network created

by the Internet2 community. Abilene maintains a publicly accessible mailing list and

archive containing descriptions of problems inside Abilene, outages, as well as BGP

session problems with its peers. A typical peer unavailable message describes which

BGP router inside Abilene lost connectivity to which AS, and the start and end time of

the outage. However, most of Abilene’s peering with an AS occurs through multiple

physical locations, e.g. Abilene peers with KreoNet through Chicago as well as Seattle.

Hence, when one of the BGP sessions (say, the Chicago peering with KreoNet)

goes down, the affected AS (KreoNet) can still be reached using other internal BGP

routers (KreoNet through Seattle). In such cases, Abilene may not announce an AS

path change to its neighbors, and hence such events cannot be used for our validation.

Using peering maps available from Abilene, we identified BGP peers of Abilene that

connected at exactly one location and carried more than 25 prefixes so as to generate

a link weight disturbance. If such a peering session were to go down, then Abilene

would send an AS path change to its neighbors and this event could be seen at other

observation points. We extracted events related to these peers over a 3 month period

from January 1, 2007 to March 31, 2007. We found a total of 7 BGP peering session

failures in this period from Abilene’s email archives. Out of these events, 6

caused link weight disturbances observed at one or more observation points in

our data, and all 6 of them were identified by our heuristic. Figure 5.6

presents the fault graph from one of these 6 cases that was used for validation. Adding

one additional view from AS 7660 gives the fault graph with weight losses shown in

Figure 5.9. Our heuristic puts the min-cut at edge {11537,2018}. From the mailing list

message, Abilene’s peer TENET (AS 2018) lost connectivity to the Abilene network

(AS 11537) at that time, as accurately identified by our scheme.

Figure 5.9: BGP peer TENET (AS 2018) of Abilene (AS 11537) was unreachable. Event observed from primary view of AS 11686. (Nodes: S, 7660, 22238, 11686, 19782, 11537, 2018, T, with the cut marked.)

5.3.2 Validation using Origin-adjacent events

Prior work on inferring the origin of change [10] validates results using BGP beacons.

However, we do not use BGP beacons for validation since beacon events are per prefix

and hence are not suited for our link weight based scheme which is tailored for larger

scale disturbances. Instead, we identify large scale events where a set of prefixes orig-

inated by the same AS were unreachable as observed from a majority of observation

points. Given the meshed nature of the topology, the most likely reason for unreachability

from a diverse set of observation points is that the problem lies adjacent to the origin

AS announcing the prefixes, i.e. the origin itself, its peering with its provider, or its


provider. We call such events origin-adjacent events.

5.3.2.1 Collecting origin-adjacent events

To collect a set of origin-adjacent events, we use the clustering based event classifica-

tion scheme [32] based on initial and final paths for each prefix. In particular, we are

interested in the events classified by [32] as Tdown events where a prefix is unreachable

from multiple observation points. We extended this scheme and further correlated the

Tdown events across different prefixes using a fixed sliding window of 30 minutes. All

events happening in the same time window affecting prefixes originated by the same

origin AS are aggregated into a single event. We further removed those events that

affected fewer than 50 prefixes, and from the remaining events, we considered only

those that were observed by more than 50% of the monitors. We applied this

classification scheme on the BGP data collected from RouteViews Oregon collector

over the month of January 2007 to give us the set of origin-adjacent events. Note

that different observation points might observe slightly different events but any event

must have been observed by at least 50% of the observation points. Figure 5.10 shows

the number of events involving different minimum numbers of prefixes for each

observation point (see footnote 1).
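The aggregation and filtering steps described in this subsection can be sketched as follows. This is an illustration under explicit assumptions rather than the procedure actually used: per-prefix Tdown records are assumed to arrive as `(timestamp, origin_as, prefix, monitor)` tuples, fixed 30-minute buckets stand in for the sliding window, and the function name `origin_adjacent_events` is hypothetical.

```python
from collections import defaultdict

WINDOW_S = 30 * 60        # 30-minute window from the text
MIN_PREFIXES = 50         # events affecting fewer prefixes are dropped
MIN_MONITOR_SHARE = 0.5   # must be seen by more than this share of monitors


def origin_adjacent_events(tdown, n_monitors, window=WINDOW_S):
    """Aggregate per-prefix Tdown records into origin-adjacent events.

    `tdown` holds (timestamp_s, origin_as, prefix, monitor) tuples, an
    assumed layout. Fixed buckets stand in for the sliding window.
    """
    prefixes = defaultdict(set)
    monitors = defaultdict(set)
    for ts, origin, prefix, monitor in tdown:
        key = (origin, int(ts // window))
        prefixes[key].add(prefix)
        monitors[key].add(monitor)
    return sorted(
        key for key in prefixes
        if len(prefixes[key]) >= MIN_PREFIXES
        and len(monitors[key]) > MIN_MONITOR_SHARE * n_monitors
    )
```

Each returned key is an (origin AS, window index) pair; raising `MIN_PREFIXES` to 75 or 100 would give the stricter event sets also reported in Figure 5.10.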

5.3.2.2 Applying link weight based min-cut heuristic

To apply our scheme, we randomly selected 11 out of the 45 observation points we

had data from and identified events based on our link weight based event identification

scheme described in Section 5.1. We then constructed a fault graph for each obser-

vation point individually using Algorithm 4. The min-cut on this fault graph, called

Fsingle, indicated the likely origin of change taking into account only the primary

observation point. We then augmented the fault graph with the view from the other

observation points as in Algorithm 5. The min-cut on this fault graph, called Fmult,

indicated the likely origin of change by augmenting information from other observation

points to the primary observation point.

Footnote 1: We removed two observation points that saw a very small number of events.

Figure 5.10: Number of origin-adjacent events affecting each observation point (y-axis: number of events; series: affecting >50 prefixes, affecting >75 prefixes, affecting >100 prefixes)

5.3.2.3 Results

Figure 5.11 shows the percentage of origin-adjacent events affecting more than 50 prefixes that were

accurately identified by each observation point using its primary view only, i.e. Fsingle, as well as with

information from other views, i.e. Fmult. The accuracy of event detection using just the

primary view varies over the different observation points. However, we can see that

the accuracy for all observation points is consistently above 90% for Fmult shown in

Figure 5.11. Similarly high accuracy was also obtained for prefix events involving

more than 75 and 100 prefixes respectively. This shows that we can accurately identify

the origin of the event and that adding information from other observation points does

help in increasing the accuracy of identifying the cause of the change.

Figure 5.11: Accuracy of Fsingle and Fmult for origin events involving more than 50 prefixes (y-axis: accuracy percentage; series: single view, multiple views)

5.3.3 Application to BGP data

We now present the results of application of our scheme to the BGP data over the

month of January 2007. We use the same set of 11 observation points used in Sec-

tion 5.3.2. Each observation point uses the additional views from the other 10 points

in diagnosing its own events. Figure 5.12 shows the number of origins of change (link

instances) over the one month period. Note that a single link may be involved more than

once as an origin of change. Next we examine how each link is classified as per the

state transitions in Figure 5.8. We see that for each observation point, in close to 50%

of the cases, a link instance is classified as a failure, recovery, or backup, i.e. fail-other

or recover-other. We now investigate the events seen by a BGP router from AS

2914 in more detail.

Figure 5.13 shows the cumulative distribution of number of instances as origin of

event per link. We can see that close to 25% of the links contribute to over 75% of

the origin of change instances. Next we examine the top 10 links in terms of instances

per link and find that most of the top links are links adjacent to AS 2914 or involving

Figure 5.12: Links involved in events per observation point (y-axis: link count; series: total, failure/recovery/backup, not classified)

big ISPs. However, the top two most frequently appearing links were 2200-2072 and

13049-2072. These links are two hops away from AS 2914, yet are responsible for

over 400 origin change events each. Figure 5.14 shows the Link-Rank representation

for one such event involving these two links. In this particular case, as viewed from

AS 2914, routes were lost along 2200-2072 and gained along 13049-2072. Over the

one month period, many such events were observed where routes repeatedly switched

between the two paths shown. Clearly, cases like these can be avoided if detected

immediately, instead of recurring over extended periods of time. Next we present

another interesting large scale event from our analysis.

5.3.3.1 Case Study

The case study we present involved many routing changes seen from AS 2914, starting on

Jan 31, 2007 around 7:00 GMT and lasting for about an hour. During this period, our

heuristic identified many links adjacent to AS 2914 as origins of observed routing

change. To understand the event better, we present the Link-Rank graph summarizing

the major links involved in Figure 5.15. As an example, the link between AS 2914

Figure 5.13: Cumulative Distribution of instances per link from AS 2914 (x-axis: link ID; y-axis: instances per link, CDF)

Figure 5.14: Repeated instability involving AS 2072 as viewed from AS 2914.

Figure 5.15: Case study: Routing changes seen from AS 2914

and AS 3257 (towards the top right) lost 2790 routes and had a final weight of 0. One

can see from Figure 5.15 that many peers of AS 2914 lost all their routes. After

discussing this event with a network operator at AS 2914, we found out that during

that period, AS 2914 was providing temporary restoration for regional ISPs affected

by the underwater cable cut due to the earthquakes off the coast of Taiwan. One of the

downstream customers of AS 2914 in turn was providing temporary transit to a very

large Asian network and hence announced a large number of prefixes to AS 2914. As a result, the

max prefix filters of peers of AS 2914 were tripped causing the BGP peering sessions to

go down. These session failures were accurately identified as origins by our heuristic.

5.4 Discussion

In this section we discuss limitations of our inference technique and identify open

issues that deserve further attention. In contrast to the previous approach, which infers

the origin by identifying the shared link or node among a large number of routes that

failed at the same time, our link weight based approach does not require per-route

information. Because link weight changes reflect an aggregate measure of route changes, they

not only bring a significant saving in data processing but also help improve the

inference accuracy in case of limited data, for example when only data from a single

router is available (see footnote 2).

As the saying goes, every coin has two sides. While the use of link weights and

weight changes makes the origin inference much faster, it can also introduce potential

inference errors due to the lack of information about individual routes. One possibility

is that some unrelated routing events may result in a link receiving both positive and

negative weight changes simultaneously, and the combined results might lead to a neg-

ligible magnitude of changes, or even mask off the weight changes entirely. Another

possibility concerns correlating observed link weight changes from different vantage

points: there is a non-zero probability that these changes are due to simultaneous

but different routing events, and again the lack of information about specific

routes makes our scheme unable to tell whether that is, or is not, the case.

Nevertheless, our BGP routing measurement efforts over the last several years give

us high confidence that the operational Internet exhibits strong system characteristics,

which we can leverage to identify whether simultaneous routing events occurred. The

expected weight and variance of the AS links represent such a system characteristic

which we explored in our design. Given most routes are longer than one AS hop, the

correlation of the weights and weight changes of adjacent links is another characteristic

that we have yet to fully understand, especially in cases where the links affected

by different routing events partially overlap. As part of our future work, we expect to

gain and utilize this understanding to detect simultaneous routing events.

Footnote 2: Unfortunately, due to the unavailability of the data set used in [10], we are unable to quantitatively compare the processing saving and accuracy improvement of our scheme with the results presented in [10].

CHAPTER 6

Detecting and Alerting about Prefix Hijacks

So far we have discussed techniques to diagnose routing problems in the Internet using

path information from the Border Gateway Protocol (BGP). However, if BGP provides

incorrect routing information, packets may never reach the intended destination, and

may even be misdirected to malicious destinations. The inability to ensure the in-

tegrity and correctness of routing information leads to many known vulnerabilities in

BGP [33]. In this chapter we focus on a type of routing vulnerability called prefix

hijacking that can cause severe security and privacy breaches. In a common prefix

hijacking event, an Autonomous System (AS) originates a route for an address space,

termed a prefix, but does not provide data delivery for that prefix. In other words, an

AS reports “use me to reach prefix p”, but does not actually provide data delivery for

prefix p. For example, on December 24, 2004, AS 9121 incorrectly originated routes

to 106,089 prefixes, almost 70% of all the prefixes at that time. BGP routers through-

out the Internet selected the route originating from AS 9121 as the best path to some

or all of these prefixes. Traffic for these prefixes was then forwarded to AS 9121, who

then essentially dropped the packets, affecting thousands of organizations [34]. When

a prefix is hijacked, sensitive data from unsuspecting users could easily fall into the

wrong hands, resulting in serious security and privacy breaches. A recent study has

also found that spammers hijack BGP prefixes to send spam mail [35]. Thus, prefix hijacking

is a real operational concern in the Internet, and securing Internet routing against

prefix hijacking is an important problem.


In this work, we design a system called Prefix Hijack Alert System (PHAS) that

builds on the premise that the prefix owner is the best person to accurately distinguish

between legitimate changes and prefix hijacking events. PHAS provides prefix owners

with timely and reliable notifications of potential prefix hijacks. During a prefix hijack,

the notification itself may reach the hijacker instead of the prefix owner, and thus the

prefix owner would not be informed of the ongoing hijack. To increase the chances of

notification delivery, we use a multi-path delivery mechanism built on the existing email

infrastructure. Our design is readily

deployable and easy to use. Once our system has detected the problem, the owner

can then take necessary actions, including soliciting help through operator channels

like the North American Network Operators Group (NANOG) mailing lists and the NSP-Security

mailing lists, to resolve the problem with either the hijacker or its upstream

ISPs.

6.1 Prefix Hijack

In a prefix hijack event, the announced path to the prefix cannot actually be used to

deliver data to the prefix. In some parts of the Internet, the false path replaces the

authentic route to the prefix and traffic that follows the false path will eventually be

dropped or delivered to someone who is pretending to be the legitimate destination.

In other words, the traffic sent along the false path has been hijacked. We term the

AS injecting false information as an attacker AS, and the AS that owns the route as a

victim AS. For example, in Figure 6.1b, AS 110 announces 131.179.0.0/16, while the

true origin for this prefix should be AS 52. It can be seen in this example that AS 110

successfully effected a hijack, since AS A decided to pick the route to AS 110 instead

of AS 52. In this case, AS 52 is the victim, AS 110 is the attacker and any traffic sent

by AS A is delivered to AS 110 rather than the legitimate origin. Note that AS 52 may


see a drop in its overall traffic volume, but variations in traffic load are the norm for

most networks and AS 52 may be completely unaware that a hijack event is occurring.

An attacker AS can hijack a prefix in various ways such as falsely announcing

itself as the origin for a prefix (as discussed in the example above), falsely modifying

some portion of the path other than origin, or falsely announcing a more specific prefix.

Our presentation of PHAS is particularly concerned with the first case, where the

origin AS is not valid. Section 6.7 discusses how the PHAS concept can be extended

to handle path modifications adjacent to the origin AS and announcement of more

specific prefixes.

Simply detecting the occurrence of a prefix hijack event is an essential, but difficult

task. Large-scale events where an AS mistakenly hijacks thousands of prefixes may

be detected relatively quickly due to their size and impact. For example, in the AS

9121 event described above, thousands of prefixes from different origins, suddenly

changed to origin AS 9121, a clear indication of prefix hijack. But smaller scale errors

and intentional attacks can be much more difficult to detect. For example, suppose a

malicious AS originates a false path to only one prefix, 131.179.0.0/16 (UCLA). Some

BGP routers will accept the new false path while others may continue to use the correct

path originated by UCLA. An origin change for a single prefix is a common occurrence

and is unlikely to trigger an alarm. As we will show later in the chapter, there are quite

a few origin AS changes during a typical day and most of these changes are valid. A

prefix may change its origin AS at any time due to contractual arrangement, multi-

homing, traffic engineering, and a host of other factors. Only the origin itself (UCLA

in our example) could easily and accurately distinguish between a legitimate origin

change and a prefix hijack [18]. The legitimate origin is best able to identify this type

of prefix hijack, but it has very little information about the BGP routes taken by others

to its own prefix. In this case, UCLA may notice a drop in traffic and/or reports of

Figure 6.1: Example of prefix hijack (a. True origin AS 52 announces prefix 131.179.0.0/16; b. False origin AS 110 announces prefix 131.179.0.0/16 and hijacks A's route)

connectivity problems, but there are numerous potential causes for this. Even if UCLA

suspected a prefix hijacking attack, UCLA’s local data can only confirm that it has

correctly announced its own route. To determine if others are incorrectly announcing

a route to UCLA, the UCLA administrators would need data from other remote sites.

6.2 System Design

One of the goals behind the existing BGP monitoring projects such as Oregon Route-

Views and RIPE RRC is to provide network operators with a remote view of their own


prefixes. Through establishing BGP peering with operational routers, the RouteViews

and RIPE RRC projects collect routing data from a few hundred BGP routers around

the globe placed in critical exchange points, tier-1 ISPs, and so forth (see footnote 1). These BGP data

collectors obtain information in real time, which can be used to quickly detect prefix

hijacking and identify the source of the problem. For example, a prefix hijack event

occurred on January 22, 2006 and affected close to 80 prefixes, including one of a financial

organization. Within a few seconds of the event occurrence, RouteViews data collec-

tor received update messages from several of its BGP peers indicating a new origin

to the prefix of the financial organization. If the prefix owner had received this data,

it could have immediately detected the prefix hijacking and could have quickly taken

corrective measures using operator channels. However, prefix owners do not have any

way to easily access the data. The current BGP monitors collect vast amounts of data

and dump the raw data, unsorted, onto the disk. It is impractical to assume that all the

prefix owners would be able to download this data and then extract the information

about their own prefixes, let alone doing so in real-time.

Our basic approach is to examine BGP routing data collected at RouteViews (or

RIPE, or any other BGP collectors), and provide real time notifications of any poten-

tial prefix hijacking to the prefix owner in a reliable way. In particular, we should

immediately notify the prefix owner anytime a new origin AS is associated with their

prefix. At a potentially slower rate, the prefix owner should be notified when an origin

AS is no longer used to announce its prefix. The net result is that the prefix owner is

able to track the set of AS numbers that originate its prefix. Presumably, the prefix

owner also knows which AS numbers are allowed to serve as origins and can thus detect

any false origins, as well as know when the false origins have stopped announcing

its prefix.

Footnote 1: Admittedly a few hundred routers represent only a small fraction of the overall Internet. A prefix hijack that affects only a small local region may not be observed by any of the current BGP monitors. In a separate project, we are studying the optimal BGP monitor placement problem; however, those results are beyond the scope of this chapter.

Figure 6.2: Components of PHAS (components: Origin Monitor, Notification Transmission, Local Notification Filter, User Registration; inputs: BGP updates from RouteViews and RIPE RIS; other labels: origin events, notifications, alarms, Internet, User Side, Prefix Owner)

More formally, we define an origin set for each prefix and track changes of this

origin set. Existing BGP monitoring projects such as RouteViews and RIPE peer with

a few hundred BGP routers around the globe and collect BGP updates in real-time.

Each monitored BGP router, or monitor in short, reports its best path to a prefix P and

the last hop in this path is an origin AS for P. We define the origin set OSET (P, t) for a

prefix P as the union of all the origins seen by all the monitors for that prefix at time t.

For example, on January 22, 2006 before 8:31 hours GMT, all RouteViews monitors

reported paths ending in AS 19758 for prefix 65.173.134.0/24, and thus for time t <

8:31 on 01/22/2006, OSET (65.173.134.0/24, t) = {19758}. When the prefix was

hijacked at 8:31 GMT, some monitors switched to paths ending with AS 27506 and thus

for the time t = 8:31 on 01/22/2006, OSET (65.173.134.0/24, t) = {19758, 27506}. Our

objective is to immediately notify the owner of 65.173.134.0/24 of this new origin set,

and the owner could then work to resolve this issue with the offending AS 27506 or its

upstream providers. Later, when the origin AS 27506 would not be seen as announcing

this prefix anymore, we would like to send a notification to the prefix owner indicating

that the origin set OSET (65.173.134.0/24, t) = {19758}, so that the prefix owner also


knows that the problem has been resolved.

Our design consists of the following four components.

1. User Registration: All prefix owners who are interested in using our system need

to register with the PHAS server and provide contact email addresses. PHAS

aims to provide a web-based registration service, similar to the standard mailing list registration process. Each new user opens an account with an email address and a password via a secure HTTPS session. A confirmation request is sent to that email address. Once confirmed, the registration is committed, and any later change to the account is done via HTTPS and requires the password. The registration specifies which prefixes are of interest, and each registrant is strongly encouraged to submit multiple email addresses hosted by different email systems (such as a GMail address), to maximize the chance of email reception in the face of prefix hijacking. Ideally, only the legitimate owner of a prefix

should register, but verifying the correct contact address for each prefix is a chal-

lenging problem in its own right with no immediately deployable solution. In

PHAS, an attacker may register and falsely claim to be the prefix owner. How-

ever, this action does not cancel the registration by the legitimate owner and all

notifications are based on publicly available data so the attacker gains no new

information by successfully registering.

2. Origin Set Monitoring: Using the BGP monitor data, PHAS maintains a current

origin set for each registered prefix. If there is a change to this origin set, an origin event is generated. To control the number of origin events for prefixes with frequent origin changes, we use a time-window based mechanism to reduce the repeated reporting of origin changes while still guaranteeing immediate notification of any new origin announced for a prefix. We increase the duration of this window for prefixes that report a lot of origin changes even after the default time window is used. The window duration is decreased if the number of origin events drops. This adaptive window scheme is central to ensuring the

system scales from the perspective of the origin set monitoring and also limits

the number of false positives sent to the prefix owners. It is discussed further in

Section 6.3.

3. Notification Transmission: Once an origin event is generated, PHAS decides

whether the origin event translates to a notification message to be sent. For this,

it checks the user registration to see if there are email addresses registered for

the prefix involved. However, the seemingly simple task of sending a notification message can be difficult in the face of a prefix hijack. For example, when the route to UCLA has been hijacked, email from PHAS to a UCLA address may follow the hijacked route and never reach the intended receiver. To protect against this case, we strongly recommend two practices for prefix owners in order to set up “multiple diverse paths” for email delivery. First, in registering with our system, prefix owners should provide multiple email accounts on different email servers that are topologically diverse. Second, prefix owners should have Internet access via multiple prefixes. ISPs often have multiple prefixes of their own. For one that only owns a single prefix, a backup plan like a dial-up Internet access account is recommended. With the combination of multiple email addresses and multiple prefixes for Internet access, prefix owners can achieve a high success rate of notification delivery even in the face of a prefix hijack. All notification messages are also signed by the PHAS server, whose public key is well-known. More details on

the notification scheme will be discussed in Section 6.4.

4. Local Notification Filter: Although the notifications could be sent directly to

network administrators, our design assumes an automated processing of the re-

ceived notifications. Tasks such as verifying the message is properly signed,

checking whether periodic notifications have been received, and so forth are better handled by an automated receiver. In addition, many prefixes have multiple

legitimate origins and thus not every change in the origin set is necessarily an

attack that should be reported to the local network administrators. To make the

system more user friendly, we provide a local filter program for processing the

notification email. The local filter manages the external email addresses, checks

any change in origin against a locally configured set of valid origins, and only reports an alarm to the administrator when an unexpected origin change occurs. Local administrators can easily customize the filter program or even provide their own filter program. By incorporating a local filter, all the legitimate origin changes are simply screened out by the filter and only notices requiring human intervention are reported to the network operator. The local notification filter is discussed in

more detail in Section 6.5.

Figure 6.2 shows the four components in our design and the interaction between

them. Note how the origin events translate to notifications and finally to alarms.

6.3 Origin Change Detection

PHAS detects changes in BGP prefix origins and sends notification messages to regis-

tered prefix owners. For traffic engineering purposes, some networks may change their

prefix origins frequently, which may trigger a large volume of notification messages

if we want to keep track of every change. The main challenge in the system design

is how to notify the owner in a timely manner while not being overwhelmed by the

volume of messages.


6.3.1 Instantaneous Origin Changes

We first consider a simple scheme (Algorithm 6) that maintains an origin set for each prefix and sends a notification whenever the origin set changes. It takes input from a BGP monitoring project such as RouteViews or RIPE. Let $\{M_1, M_2, \ldots, M_i, \ldots, M_N\}$ denote the set of $N$ BGP routers providing data. By observing the BGP updates sent by router $M_i$, we can determine $M_i$’s current route to prefix $P$. If $M_i$ has a route to $P$ at time $t$, $origin(M_i, P, t)$ denotes the origin AS of $P$ in this route. If $M_i$ has no route to $P$ at time $t$, $origin(M_i, P, t)$ is empty. The origin set for prefix $P$ at time $t$ is defined as $OSET(P, t) = \bigcup_{i=1}^{N} origin(M_i, P, t)$. In other words, the instantaneous origin set is simply the union of the origins currently used by any of the monitors to reach this prefix. As updates from $M_i$ arrive, $origin(M_i, P, t)$ may change and thus the origin set may change as well. Whenever the origin set changes, we say an origin event is triggered.

Algorithm 6: Instantaneous Origin Change

Initialize $origin(M_i, P, t_0)$ using the initial routing table of $M_i$ at time $t_0$;

$OSET(P, t_0) = \bigcup_{i=1}^{N} origin(M_i, P, t_0)$;

if update for prefix $P$ at time $t$ from router $M_i$ is an announcement then

    $origin(M_i, P, t)$ = the last AS in the announced path;

else

    $origin(M_i, P, t) = \{\}$;

$OSET(P, t) = \bigcup_{i=1}^{N} origin(M_i, P, t)$;

if $OSET(P, t) \neq OSET(P, t-1)$ then

    origin_gain $= OSET(P, t) - OSET(P, t-1)$;

    origin_loss $= OSET(P, t-1) - OSET(P, t)$;

    send [$OSET(P, t)$, origin_gain, origin_loss] to prefix owner;
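Algorithm 6 can be sketched in Python for a single prefix as shown below. This is a minimal illustration rather than the PHAS implementation: the class name, the monitor labels, and the AS paths in the usage note are hypothetical, and a withdrawal is represented by passing `None` instead of an AS path.

```python
from typing import Dict, List, Optional, Set, Tuple

OriginEvent = Tuple[Set[int], Set[int], Set[int]]  # (new set, gained, lost)


class InstantaneousOriginMonitor:
    """Tracks the instantaneous origin set of one prefix (hypothetical helper)."""

    def __init__(self, initial_origins: Dict[str, Optional[int]]):
        # Maps each monitor to the origin AS found in its initial routing
        # table, or None when the monitor has no route to the prefix.
        self.origin = dict(initial_origins)
        self.oset = self._union()

    def _union(self) -> Set[int]:
        # OSET(P, t): union of the origins currently used by all monitors.
        return {asn for asn in self.origin.values() if asn is not None}

    def update(self, monitor: str,
               as_path: Optional[List[int]]) -> Optional[OriginEvent]:
        """Apply one BGP update; as_path=None models a withdrawal."""
        # The origin is the last AS in the announced path.
        self.origin[monitor] = as_path[-1] if as_path else None
        new_oset = self._union()
        if new_oset == self.oset:
            return None  # origin set unchanged, no origin event
        gain = new_oset - self.oset
        loss = self.oset - new_oset
        self.oset = new_oset
        return (new_oset, gain, loss)
```

With two monitors that initially both see AS 19758, an announcement from one of them with a path ending in AS 27506 returns `({19758, 27506}, {27506}, set())`, and a later withdrawal from that monitor returns `({19758}, set(), {27506})`.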

Figure 6.3: Origin events per prefix - December 2005 (x-axis: Prefix ID; y-axis: Number of origin events, log scale)

To study the algorithm behavior, we used data for the month of December 2005,

from the RouteViews collector at the Oregon Internet Exchange. This BGP data col-

lector peers with 42 operational routers from around the globe and thus the origin set is

the union of the origin ASes seen by these 42 peers. The number of prefixes involved

is close to 170,000. Algorithm 6 generated 511,513 origin events involving 48,768

prefixes during December 2005. Thus, close to 30% of the prefixes had one or more

origin set changes. Figure 6.3 shows the distribution of the number of origin events

per prefix (prefixes with no origin events are not plotted).

As the figure shows, some prefixes generated a large number of origin events. In

Algorithm 6, even when the same origin leaves the set and comes back again on a re-

peated basis, each appearance and each disappearance triggers an origin event. For ex-

ample, prefix 207.135.82.64/26 generated 5747 origin events during December 2005,

simply due to the fact that its origin set switched frequently between {2828, 65000},

{2828}, and {65000}. Since some prefixes have unstable connectivity to the Internet, repeated withdrawal and announcement sequences cause the origin to frequently leave and join the set, resulting in repeated origin events. In order to detect prefix hijacking events, it is essential to immediately notify the owner when a new origin appears. However, reporting of oscillations between already reported origins, as in this particular example, can be reduced.

Figure 6.4: Inter-arrival time between origin events for a prefix for December 2005 (x-axis: Number of cases; y-axis: Inter-arrival time in seconds, log scale)

6.3.2 Windowed Origin Changes

We now introduce the notion of windowed origin set. We can mask off repeated and

frequent origin changes by reporting observed origin set over some time window, in-

stead of reporting instantaneous origin set changes. Figure 6.4 plots the inter-arrival

time between origin events. From the figure we can see that the inter-arrival time is

less than 1000 seconds in close to 75% of the cases.

Let $OSET(P, t, k)$ denote the set of all the origins for prefix $P$ observed over the last $k$ time units. In other words, this windowed origin set consists of all the origins for $P$ that were observed by at least one router $M_i$ during the time $[t-k, t]$. More formally, define $origin(M_i, P, t, k) = \bigcup_{\tau=t-k}^{t} origin(M_i, P, \tau)$ and $OSET(P, t, k) = \bigcup_{i=1}^{N} origin(M_i, P, t, k)$. The definition includes the last $k$ units at each time and thus provides a continuously moving window over which the origins of $P$ are recorded. The algorithm to detect origin changes with a moving window is the same as Algorithm 6, except that we now have to include the time window $k$ and only send origin events when $OSET(P, t-1, k) \neq OSET(P, t, k)$.

It is important to note that this revised algorithm only reduces the number of repeated origin events. The prefix owner will be immediately notified whenever a new (potentially false) origin appears for the first time during the last $k$ time units. Suppose router $M_i$ is the first to observe a new origin $O$ for prefix $P$. If this new announcement first appears at time $t$, then $O \in origin(M_i, P, t, k)$ and thus $O \in OSET(P, t, k)$. Since $M_i$ is the first to observe this origin, it must also be the case that $O \notin OSET(P, t-1, k)$. Thus $OSET(P, t-1, k) \neq OSET(P, t, k)$ and an origin event is triggered at time $t$, i.e., as soon as the new origin appears. This feature guarantees timely detection and notification of potential prefix hijacking.

However, the addition of a time window does delay the notification of origin-loss events. Suppose origin $O$ was in fact a prefix hijacking attempt. As discussed above, the prefix owner is immediately notified when $O$ first appears. Assume as a result of this fast notification, the owner took actions and quickly resolved the attack. Let $M_j$ denote the last monitored router to remove $O$ from its routing table at time $t_{end}$. Although $O$ has been removed from the routing tables, it will not be removed from $M_j$’s windowed origin set $origin(M_j, P, t, k)$ until time $t_{end} + k$. Thus $O$ is also not removed from $OSET(P, t, k)$ until time $t_{end} + k$. The net result is that the prefix owner is not notified that $O$ has been removed until $k$ time units after $O$ has vanished from the routing system.
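The asymmetry described above (immediate gain events, loss events delayed by k) can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the PHAS code: the class and method names are hypothetical, time is a plain number in seconds, and a `tick` call stands in for the passage of time when no update arrives.

```python
from typing import Dict, Optional, Set, Tuple

OriginEvent = Tuple[Set[int], Set[int], Set[int]]  # (new set, gained, lost)


class WindowedOriginMonitor:
    """Windowed origin set OSET(P, t, k) for one prefix (hypothetical helper)."""

    def __init__(self, k: float):
        self.k = k
        self.current: Dict[str, Optional[int]] = {}  # monitor -> origin in use
        self.last_seen: Dict[int, float] = {}        # origin -> last time in use
        self.reported: Set[int] = set()              # last reported windowed set

    def _refresh(self, t: float) -> None:
        # Every origin still in use by some monitor has been observed up to t.
        for asn in self.current.values():
            if asn is not None:
                self.last_seen[asn] = t

    def _emit(self, t: float) -> Optional[OriginEvent]:
        # An origin stays in the window until k time units after it was last used.
        window = {a for a, seen in self.last_seen.items() if t - seen < self.k}
        if window == self.reported:
            return None
        gain, loss = window - self.reported, self.reported - window
        self.reported = window
        return (window, gain, loss)

    def update(self, t: float, monitor: str,
               origin: Optional[int]) -> Optional[OriginEvent]:
        """Apply an update at time t; origin=None models a withdrawal."""
        self._refresh(t)               # record what was in use just before t
        self.current[monitor] = origin
        self._refresh(t)               # record the newly announced origin
        return self._emit(t)

    def tick(self, t: float) -> Optional[OriginEvent]:
        """Advance time without an update so expired origins are reported."""
        self._refresh(t)
        return self._emit(t)
```

With k = 3600, an origin first announced at t = 100 produces a gain event at t = 100. If it is withdrawn, re-announced, and withdrawn for the last time at t = 400, the intermediate oscillations produce no events, and the loss is reported only once time reaches 4000.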

6.3.3 Adaptive Window Size

Our objective is to reduce the number of repeated origin events for prefixes with fre-

quent origin changes, but not penalize well-behaved prefixes by delaying reports that

an origin has been removed. We start with a base time window of one hour. This


masks transient changes for most prefixes, at a cost of delaying notification of origin

loss events by one hour. However, some prefixes still generate a large number of no-

tification messages even with the one hour window. Increasing the window size can

further limit the number of repeated origin events for these prefixes but at the cost of

further delaying origin-loss events for other prefixes. Rather than attempt to assign a

uniform time window for all prefixes all the time, we introduce an adaptive window

resizing scheme for each prefix. Essentially, prefixes that generate a large number of

messages will be penalized by a large window size, while other prefixes still use a small

window size.

Initially, each prefix starts with a penalty value of $penalty(P) = 0$ and a time window of one hour. Anytime a notification is generated for this prefix, $penalty(P)$ is increased by 0.5. The penalty value decays exponentially over time and the rate of decay is determined by a half-life parameter. We currently use a half-life of 2 hours (see footnote 2). The size of the prefix’s time window is set to $2^{\lfloor penalty(P) \rfloor}$ hours. In other words, a prefix with $penalty(P) < 1$ uses a time window of $2^0 = 1$ hour; a prefix with $penalty(P)$ in the range $[1, 2)$ uses a time window of $2^1 = 2$ hours; a prefix with $penalty(P)$ in the range $[2, 3)$ uses a time window of $2^2 = 4$ hours; and so forth.
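The penalty bookkeeping can be sketched as follows. The increment of 0.5, the 2-hour half-life, and the window rule come from the text; the class name, method names, and the choice to measure time in hours as a float are assumptions made for illustration.

```python
import math

HALF_LIFE_HOURS = 2.0            # half-life of the penalty, from the text
PENALTY_PER_NOTIFICATION = 0.5   # increment per generated notification


class AdaptiveWindow:
    """Per-prefix penalty and window size (hypothetical helper)."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update = 0.0   # time of the last decay step, in hours

    def _decay(self, now: float) -> None:
        # Exponential decay: the penalty halves every HALF_LIFE_HOURS.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_HOURS)
        self.last_update = now

    def record_notification(self, now: float) -> None:
        self._decay(now)
        self.penalty += PENALTY_PER_NOTIFICATION

    def window_hours(self, now: float) -> int:
        # Window size is 2 raised to floor(penalty), in hours.
        self._decay(now)
        return 2 ** math.floor(self.penalty)
```

For example, four notifications at time 0 raise the penalty to 2.0 and the window to 4 hours. With no further notifications, the penalty decays to 1.0 after 2 hours (window of 2 hours) and to 0.5 after 4 hours (window back to 1 hour).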

Figure 6.5 shows the distribution of origin events generated using this adaptive

window. For comparison, we also show the distribution using a fixed window size of 1

hour and show a zoomed in portion of the plot for the top 10 most active prefixes. Fig-

ure 6.6 shows the number of origin events generated per day using adaptive windows

with a default as 1 hour along with the number of origin events using instantaneous

origin changes for comparison. The introduction of the adaptive window reduces the

number of origin events due to unstable prefixes, while still ensuring that any newly

announced origin is immediately reported to the prefix owner. Prefixes that experience a large number of origin changes would experience a longer delay before being notified of origin loss events, but would still receive immediate notification when a new origin appears.

2 In other words, the penalty at time t is exactly one half of the penalty at time (t − 2) hours, assuming no additional origin events were generated during that time.

Figure 6.5: Distribution of origin events per prefix using adaptive window (x-axis: Prefix ID; y-axis: Number of origin events, log scale; series: 1 hour window, Adaptive window; inset: zoomed-in view of the top 10 most active prefixes)

6.4 Notification Delivery

Once a notification message is generated, it is delivered to the prefix owner’s regis-

tered mailboxes through email. We choose email for delivery, since it is a ubiquitous

delivery method on the Internet and uses TCP, which provides reliable data transfer.

The email body is signed by the monitor to ensure its integrity. There are two types of

messages: event-driven notifications and periodic refreshes. The event-driven notifica-

tions are triggered by origin set changes, and the email contains corresponding origin

gains or losses. For example, for prefix 60.253.48.0/24, the notification messages look like the following:

<TYPE=gain, seqnum=1, GMT-TIME=20041221 12:52:33, PREFIX=60.253.48.0/24, NEW-SET={23918 31050 29257}, ORIGIN-GAINED=29257>

<TYPE=loss, seqnum=2, GMT-TIME=20041221 13:52:49, PREFIX=60.253.48.0/24, NEW-SET={29257 31050}, ORIGIN-LOST=23918>

Figure 6.6: Comparison of origin events per day using instantaneous and adaptive window (x-axis: Time, 12/03 to 12/31; y-axis: Number of origin events; series: Instantaneous changes, Adaptive window size)

The periodic notification is sent at a fixed time interval (1 day by default), and the

email contains the complete origin set at that moment. The periodic refresh message

is a soft-state mechanism to provide additional system resilience against unforeseen

errors. For instance, even if a notification is lost due to an email server crash, the next

refresh message will bring the owner’s knowledge about the origin set up to date.
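An automated receiver needs to turn the notification text into fields before it can act on it. The sketch below parses the event-driven format shown earlier in this section. The function name is hypothetical, and the grammar is inferred only from the sample messages (comma-separated KEY=value pairs inside angle brackets, with AS sets in braces), so it should be read as an assumption about the format rather than a specification.

```python
import re
from typing import Dict, Set, Union

FieldValue = Union[str, int, Set[int]]

# KEY=value pairs; a value is either a brace-delimited set or runs up to
# the next comma or the closing angle bracket.
_PAIR = re.compile(r"([A-Za-z-]+)=\s*(\{[^}]*\}|[^,>]+)")


def parse_notification(text: str) -> Dict[str, FieldValue]:
    """Parse one notification of the form <KEY=value, KEY=value, ...>."""
    body = text.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("not a PHAS-style notification")
    fields: Dict[str, FieldValue] = {}
    for key, raw in _PAIR.findall(body):
        value = raw.strip()
        if value.startswith("{"):
            # AS sets may be separated by spaces or commas.
            members = re.split(r"[,\s]+", value.strip("{} "))
            fields[key] = {int(m) for m in members if m}
        elif value.isdigit():
            fields[key] = int(value)
        else:
            fields[key] = value
    return fields
```

Applied to the first sample message, this yields `TYPE` as the string `gain`, `seqnum` as the integer 1, `NEW-SET` as the set {23918, 31050, 29257}, and `ORIGIN-GAINED` as the integer 29257. Signature verification, which the design also requires, is deliberately left out of this sketch.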

The major challenge in our system design is how to deliver notifications success-

fully even in the face of prefix hijacks. When a prefix is being hijacked, some data

traffic on the Internet would go to the false origin instead of the true one. If the path

from our server to the prefix owner is diverted to the false origin, then the owner would

not receive the notification at the time when it is needed the most.

Due to the large scale of Internet routing, a prefix hijack is unlikely to affect all

the paths towards the true origin. Thus in delivering the notification messages, our

system uses multiple diversified paths to improve the chances of successful delivery.


Ideally, we could send notifications from the monitors that still have a path to the old

origin. But this type of email forwarding service is not part of the current BGP monitoring

arrangement with commercial ISPs. Requiring email forwarding from monitors would

undermine the deployability of our service. Thus we leave this as an option for future

development, and instead ask prefix owners to take the responsibility of setting up

multiple diversified delivery paths.

There are two practices recommended for prefix owners. First, when registering

with our system, they should provide multiple email accounts on different servers that

are also topologically diverse, for instance popular email services like GMail and Ya-

hoo! mail. Secondly, they should have Internet access through multiple prefixes. ISPs

often have several prefixes themselves, so this should not be a problem. For ones that

only own a single prefix, a backup plan, such as a dial-up Internet access account, is

recommended.

Figure 6.7 shows how the multiple diversified path delivery works. The owner of

prefix P registers four email addresses, one within P, and three others, X, Y, and Z, in

three different networks. Every notification message will be sent to all four mailboxes.

The prefix owner’s local filter will retrieve these four messages, and then process them.

The email body will contain a sequence number, based on which the local filter decides

whether it is a duplicate or is obsolete. Only emails with new contents pass through

the filter and result in an alarm used for hijack detection. When a prefix P is hijacked,

as long as the owner can access one of X, Y, Z, and our server, the notification will

be delivered. Even if none of the four mailboxes is accessible directly from the owner site, as long as the owner can access the Internet through another prefix, he/she can still retrieve the notification messages regarding the prefix P. The local filter also periodically polls the mailboxes. In the event that none of them is reachable, it is very likely that the prefix owner’s Internet access has problems, and the filter will generate an alarm to the operator. In summary, the combination of multiple topologically diversified mailboxes and multiple prefixes used for Internet access ensures a high delivery rate for notifications.

Figure 6.7: Notification setup (the PHAS server sends each notification to mail servers X, Y, and Z and to the prefix owner’s local mail server; the local notification filter retrieves the messages from all four mailboxes)
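The receiving side of multi-path delivery can be sketched as follows: copies of the same notification arrive in several mailboxes, the sequence number is used to drop duplicates and obsolete messages, and a connectivity alarm is raised when no mailbox can be reached. The names are hypothetical, a mailbox is modeled as a list of (prefix, seqnum, body) tuples or `None` when unreachable, and sequence numbers are assumed to increase per prefix, which the text does not state explicitly.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[str, int, str]  # (prefix, seqnum, body)


class SequenceFilter:
    """Remembers the highest sequence number passed on for each prefix."""

    def __init__(self) -> None:
        self.highest_seen: Dict[str, int] = {}

    def is_new(self, prefix: str, seqnum: int) -> bool:
        # Duplicates and obsolete messages have seqnum <= the highest seen.
        if seqnum <= self.highest_seen.get(prefix, 0):
            return False
        self.highest_seen[prefix] = seqnum
        return True


def poll_mailboxes(mailboxes: Dict[str, Optional[List[Message]]],
                   seq_filter: SequenceFilter) -> Tuple[List[str], bool]:
    """Return (fresh message bodies, connectivity_alarm)."""
    reachable = [msgs for msgs in mailboxes.values() if msgs is not None]
    if not reachable:
        # No mailbox reachable: the owner's Internet access likely has problems.
        return [], True
    # Merge all copies and process them in sequence order per prefix.
    merged = sorted((m for msgs in reachable for m in msgs),
                    key=lambda m: (m[0], m[1]))
    fresh = [body for prefix, seqnum, body in merged
             if seq_filter.is_new(prefix, seqnum)]
    return fresh, False
```

If mailbox X holds messages 1 and 2 and mailbox Y holds only message 1, one poll yields each message once, in order. A later poll that sees the same messages again yields nothing.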

6.5 Local Notification Filter

PHAS does not associate a prefix with a true origin or false origin, and thus reports

all origin set changes to the prefix owner. However, not all origin set changes may

be of interest to the prefix owner, especially in the event that the origin set changes

frequently. The local notification filter is an important tuning block at the user side that

enables the prefix owner to filter out unwanted alarms and alert the user to potential

hijacks. In this section, we explain some basic building blocks for constructing filter

rules and use examples to show how simple rules can control the notifications delivered

to the user.


6.5.1 Constructing filtering rules

We define a rule to have the form “IF <condition> THEN <action>”. There are two

basic actions possible; ACCEPT results in the message being delivered and REJECT

results in the message being dropped. The default action is ACCEPT, in case no rules

are specified or no rules are fired. The local filter can contain various rules ordered by

preference, and IF clauses can also be nested. While multiple rules can be listed, for

each notification message, an action of ACCEPT or REJECT can be performed only

once. In other words, once an action is performed, no more rules are matched for that

notification message. Hence, we encourage users to use rules that are simple and easy

to understand and analyze.

To construct rules, we define the following constructs.

1. CONTAINS: defines what a particular key may contain.

2. DIFF: difference between sets.

3. LT, EQ, GT: correspond to the mathematical <,= and >.

4. NOT: negates the construct it follows. E.g. one may use it with CONTAINS.

5. AND, OR: for combinations of conditions.

6. ANY and ALL: used to deal with sets in rules.

Examples

1. A rule specific to a prefix, and checking to see if the new origin is a known

origin:

IF <ORIGIN-GAINED EQ 29257 AND PREFIX EQ 60.253.48.0/24> THEN

REJECT


2. A rule asking to drop all origin loss notifications:

IF <TYPE EQ "loss"> THEN REJECT

Example of a bad Rule

1. A rule that checks for the existence of an AS in the ORIGIN-SET:

IF <ORIGIN-SET CONTAINS 23918> THEN REJECT

In the event of a hijack that changes the origin set from {23918} to {23918, X},

where X is the hijacker, the notification will not be delivered to the user, since the

origin set still contains AS 23918.

6.5.2 Case Study

We now use a case study to show how simple rules can be used to deal with a real

scenario. We choose prefix 60.253.48.0/24 as an example and look at the notifications

from December 21, 2004 to December 28, 2004, when a known prefix hijack event

happened. A sample of the notifications seen by the filter is shown below.

<TYPE=gain, GMT-TIME=20041221 04:44:45, PREFIX=60.253.48.0/24, NEW-

SET={23918, 31050}, ORIGIN-GAINED=31050>


SET={23918, 31050, 29257}, ORIGIN-GAINED=29257>

<TYPE=loss, GMT-TIME=20041221 13:52:49, PREFIX=60.253.48.0/24, NEW-

SET={29257, 31050}, ORIGIN-LOST=23918>


SET= {29257}, ORIGIN-LOST=31050>

For this prefix, we observed three origin ASes: AS 29257, AS 31050 and AS


23918. The origin set fluctuated between various combinations of these three ASes

causing notifications to be sent to the owner. Without local filtering, all these legitimate

changes would have resulted in alarms being sent to the prefix owner. However, the

prefix owner, knowing all these three legitimate origin ASes, can set simple rules to

filter out these changes:

IF <ORIGIN-GAINED EQ ANY {23918,31050,29257} > THEN REJECT

IF <ORIGIN-LOST EQ ANY {23918,31050,29257} > THEN REJECT

Note, each notification contains only one value for ORIGIN-GAINED or ORIGIN-LOST, and hence we can use the EQ (equals) clause here. With these rules in place, the prefix owner would only receive an alarm when an origin change passes both rules. Around

9:30 AM on Dec 24, 2004, such an alarm happened:


SET={23918 9121}, ORIGIN-GAINED=9121>


SET= {23918}, ORIGIN-LOST=9121>

The first alarm indicates that AS 9121 is now hijacking the prefix 60.253.48.0/24.

The owner knows that this is not a legitimate origin for this prefix, and can then take

appropriate actions. An alarm is also generated to inform the owner that AS 9121

stopped announcing the prefix, indicating the matter has been resolved.
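The rule semantics of Section 6.5.1 (ordered rules, first match decides, default ACCEPT) and the two rules of this case study can be mimicked in a few lines of Python. This is a sketch of the semantics, not the filter program itself: rules are written as plain Python predicates instead of the IF/THEN syntax, notifications are dictionaries keyed by the field names in the sample messages, and the bad rule from Section 6.5.1 is assumed to test the NEW-SET field.

```python
from typing import Callable, Dict, List, Tuple

Notification = Dict[str, object]
Rule = Tuple[Callable[[Notification], bool], str]  # (condition, action)

LEGITIMATE = {23918, 31050, 29257}  # origins the owner knows to be valid


def gained_is_known(n: Notification) -> bool:
    # IF <ORIGIN-GAINED EQ ANY {23918,31050,29257}> THEN REJECT
    return n.get("ORIGIN-GAINED") in LEGITIMATE


def lost_is_known(n: Notification) -> bool:
    # IF <ORIGIN-LOST EQ ANY {23918,31050,29257}> THEN REJECT
    return n.get("ORIGIN-LOST") in LEGITIMATE


def set_contains_owner(n: Notification) -> bool:
    # The bad rule: fires whenever AS 23918 is still in the origin set.
    return 23918 in n.get("NEW-SET", set())


RULES: List[Rule] = [(gained_is_known, "REJECT"), (lost_is_known, "REJECT")]
BAD_RULES: List[Rule] = [(set_contains_owner, "REJECT")]


def apply_rules(notification: Notification, rules: List[Rule]) -> str:
    for condition, action in rules:
        if condition(notification):
            return action  # the first rule that fires decides; the rest are skipped
    return "ACCEPT"        # default when no rule fires
```

Under `RULES`, the gain of AS 31050 and the loss of AS 23918 are rejected, while the gain and later loss of AS 9121 are accepted and become alarms. Under `BAD_RULES`, the notification with NEW-SET {23918, 9121} is rejected, which is the failure mode the bad-rule example warns about.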

6.6 Evaluation

To evaluate the overhead of the system, we use BGP log data to calculate the number of

origin events generated by the PHAS server, and the number of notifications received

by each AS. We also apply our method to the data collected during known hijack

events to show that PHAS can indeed catch those events. Finally, we use simulations to evaluate the success ratio of notification delivery using the multi-path delivery scheme.

Figure 6.8: Origin events per day from June 1, 2005 to August 31, 2005 (x-axis: Time, 1 day bins; y-axis: Number of origin events; series: Adaptive window)

6.6.1 Notification Messages

Figures 6.8 and 6.9 plot the number of origin events per day over a 6 month period

from June 2005 to November 2005. The origin events generated per day for the month

of December 2005 were shown in Figure 6.6 in Section 6.3. Throughout this period,

we observed the number of events captured per day to be around 2000, with a few

occasional spikes. From a system point of view, sending 2000 messages per day is

manageable, even with multiple email delivery.

We now look from users’ point of view to see how many notification messages they

would receive if subscribed to receive PHAS alerts. We treat each origin event as one

notification, assuming all prefixes are registered to receive alerts. For our evaluation,

we use the events generated for the month of December 2005. We first evaluate the

notifications received per prefix. From Figure 6.5 in Section 6.3, we see that only 20K

out of more than 150K prefixes were involved in origin events. Of those 20K prefixes,

almost all of them had fewer than 10 origin events per month. Only a handful of prefixes had more than 100 origin events per month. The worst case was 209.140.24.0/24 with 196 origin events. A closer look at the alarms revealed that the origin set alternated between {} and {3043}, which indicates the prefix was unstable. From these numbers for origin events, one can see that the number of notifications expected per prefix is quite small, except for some unstable prefixes. For cases of unstable prefixes, the owner’s local filter will be able to handle such redundant notifications easily.

Figure 6.9: Origin events per day from September 1, 2005 to November 30, 2005 (x-axis: Time, 1 day bins; y-axis: Number of origin events; series: Adaptive window)

Since a prefix owner may register multiple prefixes, we also look at number of

notifications expected per AS for the month of December 2005. For evaluation pur-

pose, we estimated the prefixes registered by each AS by using the routing table to

map every prefix to its origin AS. Figure 6.10 shows the number of origin events per

AS for December 2005. Only about 3.5K ASes out of the total 18K ASes received notifications. Of those ASes that received notifications, 97% received fewer than 100 notifications in the entire month. The worst case was AS 29257, receiving 2501 notifications, with the OSET (P) fluctuating between combinations of 4 origins. These numbers for origin events per AS indicate that in most cases, an AS would receive a small number of notifications, and in extreme cases, local filters can once again deal with the common pattern of notifications. All of the above results show that the load of notification generation, transmission, and processing is easily manageable by a single machine, even when all the prefixes are registered with PHAS.

Figure 6.10: Distribution of events per AS for December 2005 (x-axis: AS ID; y-axis: Number of origin events, log scale)

6.6.2 Detecting Known Events

We now check if our system would have caught some known prefix hijack events. One

such prefix hijack occurred on May 7, 2005 when AS 174 hijacked one of Google’s

prefixes, 64.233.161.0/24, causing Google to be unreachable during this time. When

run over this period of time, PHAS caught this origin set change and indicated AS 174

as the origin gained during this event.

A larger scale hijack event occurred on Dec 24, 2004. AS 9121 announced itself

as origin to over 106K prefixes. PHAS detected 106082 unique prefixes with origin

9121 added to their origin sets and a total of 217884 origin events. Most prefixes had

2 notifications, one reporting the addition of AS 9121, and the other reporting the

removal of AS 9121.

Another case of hijack occurred on Jan 22, 2006, when AS 27506 announced itself

as origin to prefixes belonging to others. For this day we detected 41 unique prefixes with

AS 27506 as a new origin, and a total of 141 origin change events. For some prefixes,

the AS 27506 was announced as origin, then withdrawn, and then re-announced and

withdrawn again resulting in multiple origin events.

Overall, PHAS successfully caught every known prefix hijack due to false origin

in a timely manner, and the timing matched reports from other sources.

6.6.3 Notification Delivery

To have multiple diverse paths for notification delivery, we recommend the prefix own-

ers register multiple mailboxes and have multiple prefixes for Internet access. If they

do have multiple prefixes, they can always receive the notification messages assuming

only one is hijacked. In this subsection, we evaluate the effectiveness of using multiple

mailboxes through simulations on Internet topology.

The approach is to take an Internet AS graph as the topology, tag each link with

inferred relationship, assume the widely adopted “no-valley” routing policy on every

node, then compute the shortest policy-compliant path between any two nodes. For

each calculation, the input includes one true prefix origin, one false origin, and a set of

mailboxes. Based on the computed shortest paths, we can find out the success ratio of

notification delivery.

The AS Topology is collected from multiple sources, including BGP monitors,

route servers, looking glasses, and routing registry [54]. The AS relationship is inferred using the method in [50]. Two sets of mailboxes are used for comparison. The first set is RouteViews (AS 3582) only, which is called “direct delivery” with no other mailbox. The second set is RouteViews plus GMail (AS 15169), Yahoo Mail (AS 10310), and Hotmail (AS 12076). We randomly picked 276 ASes to form the origin pairs. They are 15 tier-1 ASes, 21 tier-2 ASes, 20 tier-3 ASes, 20 tier-4 ASes, and 200 tier-5 ASes. We exhaust all the combinations of origin pairs, a total of 75900 cases.

Figure 6.11: Delivery Ratio (x-axis: Delivery Ratio; y-axis: CDF; series: Direct Delivery only, Plus 3 other mailboxes)

Given an origin pair, some nodes will take the path to the true origin, while the

others will take the path to the false origin. If a mailbox node takes the path to the true

origin, the prefix owner will be able to access this mailbox and receive the notifica-

tion. Otherwise the notification is lost. Delivery ratio is defined as the percentage of

mailboxes that take the path to the true origin.

Note that the simulation results are symmetric. That is, if the delivery ratio is 20% for a given pair of true origin and false origin, then it will be 80% when the roles of these two origins are switched. Since we exhaust all combinations of origin pairs, whenever there is a case with a delivery ratio of a%, there is a corresponding case with a delivery ratio of (100 − a)%.

In our path computation, we use random tie-breaking when there are multiple shortest paths. For example, if a mailbox has two equal paths, one leading to the true origin and the other leading to the false origin, we count this as 0.5 notifications delivered from this mailbox.
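
The bookkeeping behind the delivery ratio, including the tie-breaking rule and the symmetry noted above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator used for the results: the function name `delivery_ratio`, the per-mailbox path-count input, and the example counts are all hypothetical.

```python
from fractions import Fraction

def delivery_ratio(mailbox_paths):
    """Expected fraction of mailboxes that still route to the true origin.

    mailbox_paths maps a mailbox name to a pair (to_true, to_false): the
    number of equally good shortest policy-compliant paths that lead to the
    true origin and to the false origin (hypothetical input format).
    """
    total = Fraction(0)
    for to_true, to_false in mailbox_paths.values():
        # Random tie-breaking: every equally good path is equally likely.
        total += Fraction(to_true, to_true + to_false)
    return total / len(mailbox_paths)

# Made-up example: two mailboxes route to the true origin, one to the false
# origin, and one is tied between the two (counted as 0.5).
paths = {"A": (1, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
ratio = delivery_ratio(paths)

# Switching the roles of the two origins yields the complementary ratio.
swapped = {name: (f, t) for name, (t, f) in paths.items()}
ratio_swapped = delivery_ratio(swapped)
```

With these made-up counts the two ratios are 5/8 and 3/8, which sum to 1 as the symmetry argument requires.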

Figure 6.12: Delivery Number (plot: cumulative number of cases versus number of successfully delivered notifications)

Figure 6.11 compares the delivery ratio with and without the additional mailboxes. Without the three additional mailboxes, about 30% of notifications are guaranteed to be delivered, about 30% of notifications are certain to be lost, and the rest are delivered with some probability. With the three additional mailboxes, the non-delivery ratio drops to about 10%.

Figure 6.12 shows the number (not the ratio) of notifications that can get delivered. In about two thirds of the cases (x ≥ 1), at least one message is guaranteed to be delivered, roughly double the 30% achieved with direct delivery alone. This suggests that three additional mailboxes can greatly improve notification delivery, but more mailboxes may still be needed for a higher success ratio.

6.7 Extensions to basic system

So far we have focused on detecting false origins. In this section, we discuss other

ways of hijacking a prefix besides directly announcing a prefix and discuss extensions

to the current system to deal with some of these cases.

6.7.1 Classification of Prefix hijack

At the highest level, the attacker AS could target a prefix that is already being announced by another AS, which we term a valid prefix. The attacker may pretend to be the owner of this prefix and originate the prefix, resulting in a false origin hijack, which is the focus of this chapter. Another way to hijack a prefix is to announce a valid origin but report an invalid path to the origin. For false paths, we separate the case of a false last hop from false information on any other hop in the path, since the prefix owner’s AS knows its immediate neighbors and hence can identify whether the last hop is valid or not.

An attacker AS may also announce a prefix that is not being announced by another AS, termed an invalid prefix. If the attacker announces a sub-prefix of some valid prefix, termed a covered prefix hijack, then routers in the Internet may contain routes to both the victim AS’s prefix and the attacker’s prefix. However, if the destination IP of a packet being routed falls under the attacker’s prefix space, then due to longest prefix match, the data would be forwarded to the attacker. An attacker AS may also announce a less specific prefix than a valid prefix, termed a covering prefix hijack, but will receive traffic only when the route to the valid prefix is withdrawn. For example, if AS 110 announces 131.0.0.0/8, then another AS would route traffic destined to the valid prefix 131.179.0.0/16 to AS 110 only when the prefix 131.179.0.0/16 is withdrawn. Finally, an AS may announce an invalid prefix that does not conflict with any used prefix space. For example, spammers are known to use unused prefixes for sending spam. Figure 6.13 shows the classification explained above.
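
The difference between a covered and a covering prefix hijack comes down to longest prefix match, which the following Python sketch illustrates with the standard `ipaddress` module. It is a toy forwarding table, not a router implementation: the function name, the "owner"/"attacker" labels, and the sub-prefix 131.179.96.0/24 used as the attacker's announcement are assumptions made for illustration.

```python
import ipaddress

def longest_prefix_match(routes, destination):
    """Return the (prefix, label) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, label) for net, label in routes.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)

valid = ipaddress.ip_network("131.179.0.0/16")     # valid prefix from the text
covered = ipaddress.ip_network("131.179.96.0/24")  # hypothetical sub-prefix
covering = ipaddress.ip_network("131.0.0.0/8")     # less specific prefix from the text

# Covered prefix hijack: the more specific route wins for addresses inside it.
table = {valid: "owner", covered: "attacker"}
inside = longest_prefix_match(table, "131.179.96.7")
outside = longest_prefix_match(table, "131.179.1.1")

# Covering prefix hijack: the attacker gets traffic only after the valid
# prefix is withdrawn.
table = {valid: "owner", covering: "attacker"}
before_withdrawal = longest_prefix_match(table, "131.179.1.1")
del table[valid]
after_withdrawal = longest_prefix_match(table, "131.179.1.1")
```

Here `inside` resolves to the attacker while `outside` still resolves to the owner, and the covering route is used only in `after_withdrawal`.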

Prefix hijacks could also combine several of the types in Figure 6.13, e.g., AS 110 announcing 131.179.0.0/24 (invalid covered prefix) with the path {110, 52} (invalid last hop). In Figure 6.13, the hijacks in bold (false origin, covered prefix, false last hop) are the ones where the prefix owner knows what is legitimate and what may not be, and protection against these attacks is the focus of PHAS. We now discuss two other sets to deal with covered prefix hijacks and false last hop hijacks.

Figure 6.13: Types of prefix hijacks (tree: Hijack splits into Valid Prefix and Invalid Prefix; Valid Prefix covers false origin and false path, the latter split into false last hop and false n-th hop; Invalid Prefix covers covered prefix, covering prefix, and unused prefix)


6.7.2 Sub-prefix Set

The idea of using a sub-prefix set is to provide the owner of an IP prefix with information about whether anybody is announcing a more specific prefix under its assigned space. This would catch a hijacking event where a prefix, say 131.179.96.0/26, is announced by a hijacker AS 100, but the prefix is part of the address space covered by 131.179.0.0/16, which is owned by AS 52.

For an IP prefix x, some or all of its assigned address space might get further divided into a number of longer prefixes. Each of these prefixes is known as a covered prefix of x. The set of all covered prefixes of x observed from the BGP monitors is denoted as CP(x). For example, if UCLA announces 131.179.0.0/16 as well as 131.179.96.0/24 and 131.179.59.0/24, then CP(131.179.0.0/16) is {131.179.96.0/24, 131.179.59.0/24}.

We define a sub-prefix set SPSET(x) to consist of all y ∈ CP(x) such that there does not exist another prefix z ∈ CP(x) with y ∈ CP(z). In other words, the set SPSET(x) contains only the first level covered prefixes of prefix x.
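
The two definitions translate directly into set comprehensions. The sketch below uses Python's standard `ipaddress` module and assumes IPv4 prefixes only; the function names and the extra /26 prefix in the example are introduced here for illustration and are not part of PHAS.

```python
import ipaddress

def covered_prefixes(x, observed):
    """CP(x): observed prefixes strictly more specific than x."""
    return {y for y in observed if y != x and y.subnet_of(x)}

def sub_prefix_set(x, observed):
    """SPSET(x): members of CP(x) not covered by another member of CP(x)."""
    cp = covered_prefixes(x, observed)
    return {y for y in cp if not any(y != z and y.subnet_of(z) for z in cp)}

net = ipaddress.ip_network
observed = {
    net("131.179.0.0/16"),
    net("131.179.96.0/24"),
    net("131.179.59.0/24"),
    net("131.179.96.0/26"),  # hypothetical second-level covered prefix
}
x = net("131.179.0.0/16")
cp = covered_prefixes(x, observed)
spset = sub_prefix_set(x, observed)
```

In this example CP(x) contains all three longer prefixes, while SPSET(x) keeps only the two /24s, because the /26 is already covered by 131.179.96.0/24.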

As an example of how this SPSET could be useful, we present a case from Jan 22,

2006. The prefix 208.0.0.0/11, owned by Sprint, generated one origin event at 5:06 am

UTC indicating that the sub-prefix set had changed from {} to {208.28.1.0/24} with

origin {27506}. The prefix in question, 208.28.1.0/24 is not usually seen in the global

routing tables, but in this case AS 27506 announced this prefix, which covers a portion

of Sprint’s 208.0.0.0/11 prefix space, thus resulting in a hijack.

6.7.3 Last Hop Set

The last hop set is maintained with the objective of detecting false last hops in BGP announcements. Once again, the owner of the prefix would know the legitimate last hops (its direct neighbors) based on its peering agreements, and reports of changes to this set would allow the owner to detect false last hops in BGP paths.

We define a last hop set LHSET(A) as the set of last hops for all prefixes with AS A as the origin. For example, if M1 observed a path (7018, 1239, 52) to prefix P1, M2 has a path (3356, 1239, 52) to P2, and M3’s path to P3 is (701, 852, 52), then the last hop set of AS 52, or LHSET(52), is {1239, 852}. Note that the last hop set is defined for an AS, and not for a prefix, since it reflects topological connectivity.
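
A last hop set can be accumulated from observed AS paths as in the following Python sketch. The function name is hypothetical, and skipping repeated origin entries (AS path prepending) is an assumption of this sketch rather than something specified in the text.

```python
from collections import defaultdict

def last_hop_sets(as_paths):
    """Map each origin AS to the set of ASes seen immediately before it."""
    lhset = defaultdict(set)
    for path in as_paths:
        origin = path[-1]
        # Walk backwards past any repeated origin entries (prepending);
        # this handling is an assumption of the sketch.
        for asn in reversed(path[:-1]):
            if asn != origin:
                lhset[origin].add(asn)
                break
    return lhset

# Paths from the example above, as observed by monitors M1, M2, and M3.
paths = [(7018, 1239, 52), (3356, 1239, 52), (701, 852, 52)]
lhset = last_hop_sets(paths)
```

For the three example paths, `lhset[52]` is {1239, 852}, matching LHSET(52) in the text.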

The main objective of using the sub-prefix set and the last hop set is to identify potential hijacks involving more specific address space and last hop changes. However, the sub-prefix set for large address blocks like 12.0.0.0/8 can be huge and may change frequently. Similarly, the last hop sets for nodes with rich connectivity (e.g. tier 1 ISPs) can also be large and may fluctuate a lot. For future work, we plan to understand the dynamics of these two sets, define how to use them, and include them as a part of the PHAS system.

CHAPTER 7

Understanding Resiliency of the Internet against Prefix Hijack

Prefix hijacking is a serious security threat in the Internet. Prefix hijacks can potentially

be launched from any part of the Internet and can target any prefix belonging to any

network. A hijack attack has a large impact if the majority of routers choose the

path leading to the false origin. Conversely, if the majority of routers choose the path

leading to the true origin, the network of the prefix owner is considered to be resilient

against prefix hijack attacks. Although there have been several results on preventing

prefix hijacks (e.g., [20][31]) and monitoring potential prefix hijack attempts (e.g.,

[24, 37]), there is a lack of general understanding of the impact of a successful

prefix hijack and networks’ resiliency against such attacks. This lack of understanding

makes it difficult to assess the overall damage once an attack occurs, and to provide

guidance to network operators on how to improve their networks’ resilience.

In this chapter, we conduct a systematic study to gauge the impact of prefix hijacks

launched at different locations in the Internet topology, and identify topological char-

acteristics of those networks that are most resilient against hijacks of their prefixes.

Specifically, we deal with a type of prefix hijack referred to as false origin hijacks

where a network announces the exact prefix announced by another network. Using

simulations on an Internet scale topology and measurements from real data, we esti-

mate how many nodes in the Internet may believe the true origin and how many believe

the false origin during a hijack. Our results show that the Internet topology hierarchy

and routing policies play an essential role in determining the impact of a prefix hijack.

Our study shows that the high degree networks (e.g., tier-1 ISPs) are not necessarily

most resilient against prefix hijacks. Instead, small networks that are direct customers

to multiple tier-1 ISPs are seen to be most resilient. Conversely, attacks launched from

these multi-homed customer networks would also have the biggest impact. The implications of our results are twofold. First, networks that desire high resilience against prefix hijacks should connect to multiple providers and be as close as possible to multiple tier-1 ISPs, while networks that cannot achieve such topological connectivity should use reactive means to learn about their prefixes being hijacked. Second, securing only the big ISP networks is neither adequate nor effective, since high impact attacks come from well connected small networks.

7.1 Hijack Evaluation Metrics

For our simulations, we model the Internet topology as a graph, in which each node represents an AS, and each link represents a logical relationship between two neighboring AS nodes. Note that two neighboring nodes may have multiple physical links between them. However, BGP paths are represented in the form of AS-AS links, and hence we abstract connections between two AS nodes as a single logical link. For

simplicity, each node owns exactly one unique prefix, i.e. no two nodes announce the

same prefix except during hijack. A prefix hijack at any given time involves only one

hijacker, and the hijacker can target only one node.

Figure 7.1: Hijack scenario. (diagram: nodes AS-1 to AS-6 joined by provider-customer and peer-peer links, with tier-1 nodes marked; AS-1 is the true origin and AS-6 the false origin)

Terminology

In the rest of this chapter, we use the term prefix hijacks to refer to false origin prefix hijacks. We call the AS announcing a prefix it does not own the false origin, and the AS whose prefix is being attacked the true origin. Upon receiving routes from both the false origin and the true origin, an AS that believes the false origin is said to be deceived, while an AS that still routes to the true origin is said to be unaffected.

To capture the interaction between the entities involved in a hijack, we introduce a

variable β(a, t, v), function of false origin a, true origin t and node v as follows:

$$
\beta(a, t, v) =
\begin{cases}
1 & \text{if node } v \text{ is deceived by false origin } a \text{ for true origin } t\text{'s prefix} \\
0 & \text{otherwise}
\end{cases}
\tag{7.1}
$$

Due to the rich connectivity in Internet topology, a node often has multiple equally

good paths to reach the same prefix. Figure 7.1 shows a case where AS-4 has three

equally good paths to reach the same prefix, two to the true origin AS-1 (through AS-2

and AS-3), and one to the false origin AS-6. In our model, we assume a node will

break the tie randomly. Therefore, we define the expected value of β as follows. Let

p(v, n) be the number of equally preferred paths (e.g. same policy, same path length)

from the node v to node n. E.g., in Figure 7.1, p(4, 1) = 2 since AS-4 has two paths

via AS-2 and AS-3 to reach AS-1, and p(4, 6) = 1 since AS-4 has only one route via

AS-5 to reach AS-6. If nodes use random tie-break to decide between multiple equally

good preferred paths, then the expected value for β is defined as:

$$
\bar{\beta}(a, t, v) = \frac{p(v, a)}{p(v, a) + p(v, t)} \tag{7.2}
$$

yielding β̄(6, 1, 4) = 1/3 for the example in the figure. β̄ is the probability of a node v being deceived by a given false origin a announcing a route belonging to true origin t.
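
As a quick check of Equation 7.2, the expected value can be computed with exact fractions. The helper name `expected_beta` is introduced here for illustration; the path counts are the ones given for AS-4 in Figure 7.1.

```python
from fractions import Fraction

def expected_beta(paths_to_false, paths_to_true):
    """Equation 7.2: probability that a node picks a path to the false origin
    when it breaks ties randomly among equally preferred paths."""
    return Fraction(paths_to_false, paths_to_false + paths_to_true)

# AS-4 in Figure 7.1: p(4, 6) = 1 path to the false origin AS-6 and
# p(4, 1) = 2 paths to the true origin AS-1.
beta_4 = expected_beta(1, 2)

# With the roles of the two origins switched, AS-4 is deceived by AS-1.
beta_4_switched = expected_beta(2, 1)
```

The two values are 1/3 and 2/3, so for a fixed node the probabilities of believing either origin sum to 1.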

Impact

We use the term impact to measure the attacking power of a node launching prefix

hijacks. We define impact of a node a as the fraction of the nodes that believe the false

origin a during an attack on true origin t. More formally, the impact of a node a is

given by:

$$
I(a) = \frac{\sum_{t \in N} \sum_{v \in N} \bar{\beta}(a, t, v)}{(N - 1)(N - 2)} \tag{7.3}
$$

Note that the outer sum is over N −1 true origins (we exclude the false origin) and the

inner sum is over N − 2 nodes (excluding both the false origin and true origin).

Resilience

We use the term resilience to measure the defensive power of a node against hijacks

launched against its prefix. We define the resilience of a node t as the fraction of nodes that believe the true origin t given an arbitrary hijack against t. More formally, the node resilience R(t) of a node t is given by:

$$
R(t) = \frac{\sum_{a \in N} \sum_{v \in N} \bar{\beta}(t, a, v)}{(N - 1)(N - 2)} \tag{7.4}
$$

Note, higher R(t) values indicate better resilience against hijacks, and higher I(a)

values indicate higher impact as an attacker.

Relation between Impact and Resilience

The true origin t and false origin a compete with each other to make nodes in the Internet route to themselves. For example, in Figure 7.1, false origin AS-6 is hijacking a prefix belonging to true origin AS-1. In this case, only AS-5 believes the false origin and AS-4 has a 1/3 chance of being deceived. Therefore, the chance that a node believes the false origin AS-6 when it hijacks AS-1 is (1 + 1/3)/4 = 1/3.

Now if AS-1 were to hijack a prefix belonging to AS-6, then AS-5 would still believe AS-6 and AS-4 would believe it with a probability of 1/3. Thus, in this case, the chance that a node believes the true origin AS-6 when it is hijacked by AS-1 is (1 + 1/3)/4 = 1/3.

We see that the resilience of the node as a true origin is equal to its impact as

a false origin. We note that in our model, when the roles of attacker and target are

switched, the impact of a node becomes its resilience. In the rest of the chapter, we

focus on resilience, while keeping in mind that a highly resilient node can also cause

high impact as a false origin.
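
Comparing Equations 7.3 and 7.4 term by term makes this equality explicit. In the sketch below, x denotes the node of interest and y the other origin; this renaming of the summation variables is notation introduced here. The last line restates the Figure 7.1 example, where AS-2 and AS-3 contribute 0, AS-4 contributes 1/3, and AS-5 contributes 1.

```latex
\begin{align*}
I(x) &= \frac{1}{(N-1)(N-2)} \sum_{y \in N} \sum_{v \in N} \bar{\beta}(x, y, v)
  && \text{(Eq. 7.3 with } a = x,\ t = y\text{)} \\
R(x) &= \frac{1}{(N-1)(N-2)} \sum_{y \in N} \sum_{v \in N} \bar{\beta}(x, y, v)
  && \text{(Eq. 7.4 with } t = x,\ a = y\text{)} \\
\frac{0 + 0 + \tfrac{1}{3} + 1}{4} &= \frac{1}{3}
  && \text{(AS-6 versus AS-1 in Figure 7.1)}
\end{align*}
```

In both sums, β̄(x, y, v) is the probability that node v routes to x rather than to y, which is why the two quantities coincide in this model.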

7.2 Evaluating Hijacks

In this section, we aim to understand the topological resilience of nodes against prefix

hijacks by performing simulations on an Internet derived topology. We first explain

the simulation setup, followed by the main results of our simulation and the insight

behind the results.

7.2.1 Simulation Setup

For our simulations, we use an AS topology collected from BGP routing tables and

updates, representing a snapshot of the Internet as of Feb 15 2006 (available from

[53]). The details of how this topology was constructed are described in [54]. Our

topology consists of 22,467 AS nodes and 63,883 links. We assume each AS node

owns and announces a single prefix to its neighbors. We classify AS nodes into three

tiers: Tier-1 nodes, transit nodes, and stub nodes. To choose the set of Tier-1 nodes, we

started with a well known list, and added a few high degree nodes that form a clique

with the existing set. Nodes other than Tier-1s that provide transit service to other AS nodes are classified as transit nodes, and the remainder of nodes are classified as stub nodes. This classification results in 8 Tier-1 nodes, 5,793 transit nodes, and 16,666 stub nodes. We classify each link as either customer-provider or peer-peer using the PTE algorithm [11] and use the no valley prefer customer routing policy to infer routing paths (also used in previous works such as [52]). We abstracted the router decision process into the following priorities: (1) local policy based on relationship, (2) AS path length, and (3) random tie-breaker.
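
The three-step decision process can be sketched as a sort key followed by a random choice among the survivors. This is a minimal illustration, not the simulator's code: the names `PREFERENCE`, `best_routes`, and `select_route`, the route representation, and the AS numbers in the example are all assumptions.

```python
import random

# Step (1): lower value means more preferred under "prefer customer".
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

def route_key(route):
    """Rank a route by relationship first, then by AS path length (step 2)."""
    relationship, as_path = route
    return (PREFERENCE[relationship], len(as_path))

def best_routes(candidates):
    """Return every candidate that is still tied after steps (1) and (2)."""
    best = min(route_key(r) for r in candidates)
    return [r for r in candidates if route_key(r) == best]

def select_route(candidates, rng=random):
    """Step (3): break the remaining tie uniformly at random."""
    return rng.choice(best_routes(candidates))

# Hypothetical candidates: (relationship of the announcing neighbor, AS path).
candidates = [
    ("peer", (20, 1)),
    ("customer", (30, 31, 1)),
    ("provider", (40, 1)),
]
chosen = best_routes(candidates)

tied = [("peer", (20, 1)), ("peer", (21, 2))]
picked = select_route(tied)
```

In the first example the customer route wins even though it is the longest path, since relationship is compared before path length; in the second, both peer routes survive and one is picked at random.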

Of the 22,467 AS nodes in our topology, we randomly picked 1,000 AS nodes

to represent false origins that would launch attacks on other AS nodes. We checked

the degree distribution of this set of 1,000 AS nodes, and found it to be similar to

the degree distribution of all the AS nodes. For each of the 22,467 AS nodes as a

true origin, we simulated a hijack with the 1,000 false origins. Thus we simulated

22,467 × 1,000 ≈ 22.5 million hijack scenarios in total.

Figure 7.2: Distribution of node resilience. (plot: resiliency versus node ID; curves: Avg., Avg. + Dev., Avg. - Dev.)

7.2.2 Characterizing Topological Resilience

Figure 7.2 shows the distribution of the resilience (average curve) for all the nodes

in our topology from our simulated hijacks. Since the resilience of each node results

from the average over 1,000 attackers, we also show the standard deviation range.

Note, higher values of resilience imply more resilience against hijacks.

This distribution shows that node resilience varies fairly linearly except at the two

extremes. Figure 7.2 also shows that the deviations at the two extremes are quite small

compared to the middle, indicating that some nodes (top left) are very resilient against

hijacks, while some others (bottom right) are easily attacked, regardless of the location

of the false origin.

As a first step in understanding how different nodes differ in their resilience, we

classify nodes into the three classes already described: tier-1, transit and stub and

plot the average resilience distribution (CDF) of each class of nodes in Figure 7.3. We

observe that the resilience distribution is very similar for transits and stubs, with transit

nodes being a little more resilient than stubs.

Figure 7.3: Resilience of nodes in different tiers. (plot: frequency (CDF) versus node resiliency; curves: Tier-1s, Transits, Stubs)

In contrast, tier-1 nodes show a very different distribution from the stubs and tran-

sits. From Figure 7.3 we observe that all the tier-1 nodes have an average resilience

value between 0.4 and 0.5. In addition, we note that about 40% of stubs and 55%

of transit nodes are more resilient than all tier-1 nodes. With tier-1 nodes being the

ones with the highest degree, it is surprising to see that close to 50% of the nodes

in the Internet are more resilient than tier-1s. Next, we explain why tier-1 nodes are

more vulnerable to hijacks than a lot of other nodes and generalize this explanation to

understand the characteristics impacting resilience.

7.2.3 Factors Affecting Resilience

We first examine the resilience of tier-1 nodes with a simple hijack scenario in Figure 7.4.

AS-2, AS-3, AS-4 and AS-5 represent 4 tier-1 nodes inter-connected through

a peer-peer relationship. AS-1 and AS-6 are small ISPs connected to tier-1 AS nodes

through a customer-provider relationship. Finally AS-7 is a multi-homed customer of

AS-1 and AS-6. In Figure 7.4, AS-7 represents the false origin that hijacks a prefix

belonging to a tier-1 node, AS-4.

Recall that under the no-valley prefer customer policy, a customer route is preferred over a peer route, which in turn is preferred over a provider route. When AS-7 hijacks AS-4’s prefix and announces the false route to AS-1 and AS-6, both AS-1 and AS-6 prefer the hijacked route over the genuine route to AS-4 since it is a customer route. AS-1 in turn announces the hijacked route to its tier-1 providers AS-2 and AS-3. These tier-1 AS nodes, AS-2 and AS-3, now have to choose between a customer route through AS-1 (hijacked route) and a peer route through AS-4 (genuine route). Again due to policy preference, the tier-1 nodes will choose the customer route, which happens to be the hijacked route. Similarly, AS-5 will also choose the hijacked route. Once big ISPs like tier-1 nodes are deceived by the hijacker, their huge customer base (many of whom are single homed) is also deceived, thus causing a high impact. One can see from this example that the main reason for the low resilience in the case of a hijack on a tier-1 node is that tier-1 nodes inter-connect through peer-peer relationships, thus rendering a genuine route less preferred to other tier-1 nodes than hijacked routes from customers.
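
The scenario can be replayed with a small fixed-point simulation of route selection and no-valley export. The Python sketch below is illustrative only and rests on several assumptions: the edge list is a reading of the description above (in particular, AS-6 is assumed to be a customer of AS-5 only), the function and variable names are hypothetical, and the random tie-breaker is replaced by a deterministic one so that the run is repeatable.

```python
CUSTOMER, PEER, PROVIDER = 0, 1, 2  # how a node sees a neighbor; lower = preferred

def add_link(rel, provider, customer):
    rel[(provider, customer)] = CUSTOMER
    rel[(customer, provider)] = PROVIDER

def add_peering(rel, a, b):
    rel[(a, b)] = PEER
    rel[(b, a)] = PEER

def believed_origin(rel, origins):
    """Iterate route selection to a fixed point; return {node: origin it routes to}."""
    nodes = sorted({u for u, _ in rel})
    # best[u] = (relationship class, path length, AS path). An origin's own
    # route is exported to every neighbor, like a customer route.
    best = {u: None for u in nodes}
    for o in origins:
        best[o] = (CUSTOMER, 1, (o,))
    changed = True
    while changed:
        changed = False
        for u in nodes:
            if u in origins:
                continue
            candidates = []
            for (x, w), u_sees_w in rel.items():
                if x != u or best[w] is None:
                    continue
                learned_from, _, path = best[w]
                if u in path:
                    continue  # loop prevention
                # No-valley export: w passes peer/provider routes only to customers.
                if learned_from != CUSTOMER and rel[(w, u)] != CUSTOMER:
                    continue
                candidates.append((u_sees_w, len(path) + 1, (u,) + path))
            # Relationship, then path length, then a deterministic tie-break
            # (lowest AS path) standing in for the random tie-breaker.
            new = min(candidates) if candidates else None
            if new != best[u]:
                best[u], changed = new, True
    return {u: route[2][-1] for u, route in best.items() if route is not None}

rel = {}
tier1 = [2, 3, 4, 5]
for i, a in enumerate(tier1):
    for b in tier1[i + 1:]:
        add_peering(rel, a, b)      # tier-1 nodes peer with each other
add_link(rel, 2, 1)
add_link(rel, 3, 1)                 # AS-1 is a customer of AS-2 and AS-3
add_link(rel, 5, 6)                 # assumed: AS-6 is a customer of AS-5
add_link(rel, 1, 7)
add_link(rel, 6, 7)                 # AS-7 is multi-homed to AS-1 and AS-6

result = believed_origin(rel, origins={4, 7})
```

Under these assumptions every node other than the true origin AS-4 ends up routing to AS-7, which reproduces the argument above: the tier-1 peers of AS-4 all receive the hijacked prefix over a customer route.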

The key to high resilience is to make the tier-1 nodes and other big ISPs always

believe the true origin. The way to achieve this is to reach as many tier-1 nodes as

possible using a provider route. In addition, when a node has to choose between two

routes of the same preference, path length becomes a deciding factor, and thus the fewer the hops needed to reach the tier-1 nodes, the better the resilience. In our simulation results, we found that the most resilient nodes are direct customers of many tier-1 nodes and other big ISPs. As an example, in our simulations, the node with the highest resilience is a stub (AS-6432 DoubleClick) directly connected to 6 tier-1 nodes, with a resilience value of 0.95. The nodes with the lowest resilience were single-homed customers of poorly connected providers.

Figure 7.4: Understanding resilience of tier-1 nodes (diagram: tier-1 nodes AS-2, AS-3, AS-4, and AS-5, their customers AS-1 and AS-6, and AS-7; AS-4 is the true origin and AS-7 the false origin)

To better understand the influence of tier-1 nodes, we classified the nodes in the Internet based on the number of direct tier-1 providers. Figure 7.5 shows the distribution of resilience for nodes with different connectivity to Tier-1. Note, the closer the curve to the right hand side of the figure (x=1), the better the resilience of that set of nodes. There are about 21,888 nodes with fewer than 3 connections to Tier-1, and we observe in Figure 7.5 that these nodes are the least resilient. A total of 379 nodes are directly connected to 3 Tier-1s and 104 nodes are connected to 4 Tier-1s. Only 88 nodes are connected to more than 4 Tier-1s, and these nodes prove to be the most resilient, highlighting the role of connecting to multiple tier-1 nodes.

Summary: In this section, we used an Internet scale topology with no-valley prefer

customer policy routing to evaluate the resilience of nodes against random hijackers.

The key to achieving high resilience is to protect tier-1 nodes and other big ISPs from

being deceived by the hijacker. Our main result shows that the nodes that are direct

customers of multiple tier-1 nodes are the most resilient to hijacks. On the other hand,

the tier-1 nodes themselves, in spite of being so well connected, are much less resilient

to hijack. The next question we seek to answer in Section 7.3 is whether there is

evidence of such behavior in reality, where the routing decision process is much more

complex.

Figure 7.5: Resilience of nodes with different number of Tier-1 providers. (plot: frequency (CDF) versus node resiliency; curves: <3 Tier-1s, =3 Tier-1s, =4 Tier-1s, >4 Tier-1s)

7.3 Prefix Hijack Incidents in the Internet

In this section we examine two hijack events, one from January 2006 which affected

a few tens of prefixes, and the other from December 2004 when over 100,000 prefixes

were hijacked. To gauge the impact of the prefix hijacks, we analyzed the BGP routing

data collected by the Oregon collector of the RouteViews project. The Oregon collector receives BGP updates from over 40 routers. These routers belong to 35 different AS nodes (a few AS nodes have more than one BGP monitor), and we consider an AS deceived by a hijack if at least one BGP monitor from that AS believes the hijacker. We call these 35 AS nodes monitors, as they provide BGP monitoring information to the Oregon collector. The impact of a hijack is then gauged by the fraction of monitors that were deceived.
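
This per-AS aggregation can be sketched as follows. The function name and input format are assumptions, and the monitor AS numbers in the example are made-up private-use values; only the two origin AS numbers come from the incident discussed in this section.

```python
def monitor_impact(observations, false_origin):
    """Fraction of monitor ASes with at least one router routing to false_origin.

    observations is a list of (monitor AS, origin AS seen by one of its routers).
    """
    monitors = {asn for asn, _ in observations}
    deceived = {asn for asn, origin in observations if origin == false_origin}
    return len(deceived) / len(monitors)

# Made-up observations: monitor AS 64512 has two routers, one of which
# believes the false origin AS-27506, so the whole AS counts as deceived.
observations = [
    (64512, 27506),
    (64512, 20282),
    (64513, 20282),
    (64514, 20282),
    (64515, 20282),
]
impact = monitor_impact(observations, false_origin=27506)
```

Here one of four monitor ASes is deceived, giving an impact of 0.25.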

7.3.1 Case I: Prefix Hijacks by AS-27506

On January 22, 2006, AS-27506 announced a number of prefixes that did not belong to it. This hijack incident was believed to be due to operational errors, and most of the hijacked prefixes belonged to former customers of AS-27506. We observed a total of 40 prefixes being hijacked by AS-27506. These 40 prefixes belonged to 22 unique ASes. We present two representative prefixes: for the first prefix the false origin could deceive only a small number of monitors, while for the second prefix the false origin deceived the majority of the monitors. We examine the topological connectivity of the true origins as compared to that of the false origin and its relation to the true origin’s resiliency.

7.3.1.1 High Resiliency against Hijack

We examine a hijacked prefix that belongs to the true origin AS-20282. The impact

of hijacking this prefix is just over 10%, that is 4 out of the 35 monitored ASes were

deceived by the hijack. Figure 7.6 depicts the connectivity of some of the entities

involved in this hijack incident. The nodes colored in gray are the nodes deceived by

the false origin AS-27506, and the white nodes persisted with the true origin. The

true origin AS-20282 is a direct customer of two tier-1 nodes, AS-701 and AS-3356.

Before the hijack incident, all the 35 monitors used routes containing one of these two

tier-1 ASes as the last hop in the AS path to reach the prefix. The hijacker AS-27506

is a customer of AS-2914, another tier-1 node. When AS-27506 hijacked the prefix,

AS-2914 chose the false customer route from AS-27506 over an existing peer route

through AS-701. The false route was further announced by AS-2914 to other tier-1

peers including AS-701 and AS-3356, however neither of them adopted the new route

because they chose the customer route announced by the true origin AS-20282. Other tier-1 ASes, such as AS-1239 (not shown in the figure), did not adopt the false route from AS-2914 either, most likely because the newly announced false route was 2 hops in length, the same as that of their existing route through AS-701 or AS-3356, and the recommended practice is to avoid unnecessary best path transitions between equal external paths [9]. However, we note that AS-3130, which is a customer of both a deceived and an unaffected tier-1 provider, also got deceived, possibly because the new path {2914, 27506} is shorter than the original path, which contained 3 AS hops.

Figure 7.6: High resiliency against hijack (diagram: tier-1 nodes AS-701, AS-2914, and AS-3356, true origin AS-20282, false origin AS-27506, and AS-3130, joined by provider-customer and peer-peer links)

7.3.1.2 Low Resiliency against Hijack

Next, we examine another hijacked prefix which belonged to AS-23011. The average

impact of this hijacked prefix is 0.6, i.e. 21 out of the 35 monitors were deceived by

the hijack. Figure 7.7 shows the most relevant entities involved in this prefix hijack.

The true origin of this prefix was an indirect customer of 5 tier-1 ASes (not all of them

are shown in the figure) through its direct providers AS-12006 and AS-10910. The

connectivity of the hijacker is the same as before, and AS-2914 was deceived by the

hijack. The 5 tier-1 ASes on the provider path of the true origin stayed with the route

from the true origin AS-23011, however the rest of the tier-1 ASes were deceived this

time, possibly because the peer route to false origin through AS-2914 was shorter than

any other peer route to the true origin. AS-286 is a customer of the providers of both the true and false origins, and it picked the false route through AS-2914 because it was shorter. We note that, in this case, the true origin being an indirect customer of multiple tier-1 ASes ensured that those tier-1 ASes themselves did not get deceived; however, due to its longer distance to these tier-1 providers (compared to the true origin in Figure 7.6), other tier-1 ASes and their customers chose the shorter route to the false origin.

Figure 7.7: Low resiliency against hijack (diagram: AS-2914, AS-12006, AS-10910, AS-3130, AS-286, nodes labeled X, Y, and Z, true origin AS-23011, and false origin AS-27506)

One of the tier-1 providers that propagated the false route is known to verify the

origin of received routes with the Internet Routing Registries (IRR). However, it did not

block the hijack because the registry entries were outdated and still listed AS-27506

as an origin for the hijacked prefixes, and hence the hijack announcements passed the

registry check.

7.3.2 Case II: Prefix Hijacks by AS-9121

In this hijack incident, operational errors led AS-9121 to falsely announce routes to

over 100,000 prefixes on December 24, 2004. We use this case to evaluate the resiliency of tier-1 ASes as compared to that of direct customers of multiple tier-1 ASes. Due to the large number of prefixes being falsely announced, some BGP protection mechanisms were triggered and affected the overall impact, such as prefix filters and the maximum prefix limit, where an AS sets an upper limit on the number of routes a given neighbor may announce. Given that multiple factors were involved in such a large scale hijack event, it is difficult to accurately model the impact on an AS as a function of its topological connectivity. Our objective in examining this case is to find supporting evidence for our observations made in Section 7.2, as opposed to a detailed study over all the hijacked prefixes. As in Case I, we observed how many monitors were deceived for each hijacked prefix and used this result to gauge the resiliency of the true origin AS.

7.3.2.1 Hijacked Tier-1 AS Prefix

In order to understand how tier-1 ASes fared against AS-9121 hijack, we studied the

impact of those hijacked prefixes that belonged to AS-7018, a tier-1 AS. Note that

AS-7018 announced over 1500 prefixes, and the impacts of different prefixes varied

noticeably, with around 7 to 8 monitors being deceived for most prefixes. For our case

study, we examine one of the hijacked prefixes which deceived the majority of the

monitors. Figure 7.8 shows the entities involved in the hijack of this tier-1 prefix.

The hijacker AS-9121 was connected to 3 providers, one of which was AS-1239,

a tier-1 AS. The true origin of the prefix in question was AS-7018, another tier-1 AS.

The grey nodes in the figure indicate those deceived by the hijack. All the 3 providers

of AS-9121, namely AS-1239, AS-6762, and AS-1299 were deceived into believing

the false origin. AS-1299 also propagated the false route to its tier-1 AS providers.

From our observations, a total of 19 out of 35 monitors were deceived by this hijack.

Figure 7.8: Tier-1 prefix hijacked (diagram: AS-1239, AS-6762, AS-1299, nodes labeled X and Y, true origin AS-7018, and false origin AS-9121)

7.3.2.2 Hijacked Prefix belonging to Customer of Tier-1s

Next, we see how the AS-9121 hijack incident affected the prefixes belonging to an

AS that was a direct customer of multiple tier-1 ASes. We picked AS-6461 as an

example here because it connected to all 8 tier-1 ASes. AS-6461 announced over

100 prefixes, 87 of which were hijacked by AS-9121. No more than 2 monitors were

deceived by the false origin for any of the hijacked prefixes. Figure 7.9 shows the entities

involved in the hijack of one of the prefixes belonging to AS-6461. As before, AS-6762

believed the false origin and was one of the monitors deceived for all the hijacked

prefixes of AS-6461. However, because all the tier-1 ASes were direct providers of

AS-6461, they stayed with the original one-hop customer route to the true origin; in

particular, note that AS-1239 was a provider for both the true origin and the hijacker,

and it stayed with the original correct route. As a result, the hijack of AS-6461’s

prefixes made a very low impact.

[Figure 7.9: Multi-homed customer of tier-1s hijacked. Labels in the figure: 1239, 6762, 1299, X, Tier-1, 9121 (false origin), 6461 (true origin).]

In addition to AS-6461, we also studied the impacts of prefixes belonging to a few

other transit ASes that were very well connected to tier-1 ASes, and found the impact

pattern for their prefixes to be very similar to the AS-6461 case. To summarize, this

real life hijack event showed strong evidence that direct multi-homing to all or most

tier-1 ASes can greatly increase an AS’s resiliency against prefix hijacks.
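The difference between the two cases above can be illustrated with a small route selection sketch. It is only an illustration under assumptions: routes are ranked first by the business relationship of the neighbor they were learned from (customer, then peer, then provider) and then by AS path length, and a router keeps its current route when a new one is not strictly better. The class and function names are hypothetical, and the tie-breaking rule and the peer relationship between the two tier-1 ASes are assumptions made for the example.

```python
from dataclasses import dataclass

# Lower value = more preferred under a prefer-customer policy.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}


@dataclass(frozen=True)
class Route:
    learned_from: str  # relationship of the neighbor that sent the route
    as_path: tuple     # AS path, origin AS last


def rank(route):
    """Sort key: relationship first, then AS path length."""
    return (PREFERENCE[route.learned_from], len(route.as_path))


def select(current, new):
    """Keep the current route unless the new one is strictly better
    (assumed tie-breaking rule)."""
    return new if rank(new) < rank(current) else current


# AS-1239 holds a one-hop customer route to the true origin AS-6461.
# The hijacker AS-9121 is also a one-hop customer, so the ranks tie
# and the original route is kept.
true_route = Route("customer", (6461,))
false_route = Route("customer", (9121,))
print(select(true_route, false_route).as_path)

# For a tier-1 prefix, the true route is assumed to be a peer route,
# which loses to the customer route learned from the hijacker.
peer_route = Route("peer", (7018,))
print(select(peer_route, false_route).as_path)
```

Under these assumptions the first call keeps the route to AS-6461, while the second switches to the false origin, which mirrors the behavior of AS-1239 in the two cases.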

7.4 Discussion

It has long been recognized that prefix hijacking can be a serious security threat to the

Internet. Several hijack prevention solutions have been proposed, such as S-BGP [20],

soBGP [31], and more recently the effort in the IETF Secure Inter-Domain Routing

Working Group [2]. These proposed solutions use cryptography-based origin authentication

mechanisms, which require coordinated efforts among a large number of organizations

and thus will take time to get deployed. Meanwhile prefix hijack incidents

occur from time to time, and our work provides an assessment of the potential impacts

of these incidents. Several hijack detection systems have also been developed,

for example MyASN [37] and PHAS [24]. However, since these systems are reactive in

nature, it is still important for network customers to understand the relations between

their networks’ topological connectivity and the potential vulnerability in the face of prefix

hijacks.

Our simulation and analysis show that AS nodes with large node-degrees (e.g.,

tier-1 networks) are not the most resilient against hijacks of their own prefixes. An AS

can gain high resiliency against prefix hijacks by being direct or indirect customers

of multiple tier-1 providers with the shortest possible AS paths. Conversely, such

customer AS nodes can also make the most impact over the entire Internet, if they

inject false routes into the Internet. This finding suggests that securing the routing

announcements from the major ISPs alone is not effective in curbing a high impact

attack, and that it is even more important to watch the announcements from lower-tier

networks with good topological connectivity.

On the other hand, customer networks that are far away from their indirect tier-1

providers can be greatly affected if their prefixes get hijacked. These topologically

disadvantaged AS nodes are most in need of investigating other means to protect

themselves. Subscribing to prefix hijack detection systems, such as MyASN and

PHAS, would be helpful. To reduce the transient impact during the detection delay,

one may also look into another proposed solution called PGBGP [19], which is briefly

described in Chapter 8.

Note that the topological connectivity required for resiliency against prefix hijacks

is different from that required for fast routing convergence [32]. Fast convergence

benefits from fewer alternative paths when the routes change, thus prefixes announced

by tier-1 providers meet the requirement well; while hijack resiliency benefits from

being a direct or indirect customer of a large number of tier-1 providers, thus prefixes

are better hosted by well connected non-tier-1 AS nodes.

We would like to end this discussion by stressing the importance of understand-

ing prefix hijack impacts, even when the protection mechanisms are put in place. Our

evaluations on an Internet-scale topology in Section 7.2 used a no-valley, prefer-customer

routing policy and showed that tier-1 AS nodes are not very resilient to hijacks

of their own prefixes, since other tier-1 AS nodes prefer customer routes to the false origin.

However, in reality a tier-1 AS may use various mechanisms, such as Internet Routing

Registries (IRR), to check the origin of a prefix before forwarding the route. Such

mechanisms would probably boost the resiliency of tier-1 AS nodes against being hijacked.

On the other hand, these protection mechanisms can also fail or backfire, thus expos-

ing the vulnerability of a network. As we saw in case I of Section 7.3, most of the

hijacked prefixes belonged to former customers of the false origin AS and were recorded

in the Internet Routing Registry (IRR), which had not been updated. Outdated registries

resulted in false routes being propagated to the rest of the Internet.

Another example of a protection mechanism is the maximum prefix filter in BGP

that allows an AS to configure the maximum number of routes received from a neigh-

bor. Thus, by limiting the total number of routes received from a neighbor, an AS

can limit the damage in case of the neighbor announcing false routes. In case II from

Section 7.3, AS-9121 announced over 100,000 false routes and one of its neighbors,

AS-1299, had a max prefix set to a relatively low value. AS-1299 believed only 1849

routes directly from AS-9121, but since the max prefix limit is per neighbor, AS-

1299 received hijacked routes from other neighbors as well. It learned a total of over

100,000 bad routes from all the neighbors combined, thus infecting a major portion

of its routing table [34]. These examples show how easily protection mechanisms can

fail due to human errors, underlining the need to understand the impact of hijacks in

face of protection failures, and the need to protect networks by multiple means such as

PGBGP and PHAS.
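The limitation of a per-neighbor maximum prefix filter can be seen in a toy example. This is a sketch under assumptions: the function name and neighbor names are hypothetical, the numbers are small illustrative values rather than those of the AS-9121 incident, and routes beyond the limit are simply dropped, whereas a real router may instead tear down the session.

```python
def accepted_routes(announcements, max_prefix):
    """Apply a per-neighbor limit and return the set of accepted prefixes.

    announcements: dict mapping neighbor name -> list of announced prefixes.
    max_prefix: per-neighbor cap; routes beyond the cap are dropped here
    (a simplification of real router behavior).
    """
    accepted = set()
    for prefixes in announcements.values():
        accepted.update(prefixes[:max_prefix])
    return accepted


# Hypothetical numbers: a hijacker announces 10 bad prefixes directly,
# and three other neighbors each pass on different slices of the same set.
bad = [f"10.0.{i}.0/24" for i in range(10)]
announcements = {
    "hijacker": bad,
    "neighbor-a": bad[2:5],
    "neighbor-b": bad[5:8],
    "neighbor-c": bad[8:10],
}
total = accepted_routes(announcements, max_prefix=3)
print(len(announcements["hijacker"][:3]), len(total))
```

Although only 3 bad routes are accepted from the hijacker directly, all 10 end up in the table because each neighbor stays within its own limit.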

CHAPTER 8

Related Works

8.1 Visualization of Route Dynamics

In the area of visual analysis of Internet routing, BGPlay [5] shows changes in routes

from different monitors to a particular prefix. BGPlay visualizes an update stream and

uses animation to highlight the change in routes. A tool closely related to BGPlay

is ELISHA [46] [47]. This work has a similar flavor to BGPlay and analyzes events

on a per prefix basis. In this scheme, updates on a prefix are sequentially arranged

next to a time line, and a line is drawn from the time line to the updates. This helps

in easily identifying the effect of updates clustered close together. One can then delve

into the details of a particular event by visualizing the path changes in the form of

an arc-based representation of links in the routing paths, with each AS being assigned

a unique X coordinate. This visualization can help in understanding the updates as

well as detecting routing anomalies. Both BGPlay and ELISHA complement Link-

Rank. Note that BGPlay and ELISHA capture events to a particular destination, while

Link-Rank visualizes aggregate routing changes affecting multiple prefixes. Thus, on

detecting routing problems to a specific prefix using BGPlay or ELISHA, one can use

Link-Rank to see if the problem is related to some link level issues and vice versa.

Another work closely related to Link-Rank detects prefix hijacks using visualization

[45]. The main difference is that [45] provides a visual technique

for detecting abnormality in prefix announcements but does not tell which ASes get

affected, as shown in detail by Link-Rank. Interesting events exposed by visualization

in [45] could be investigated using Link-Rank to understand the event impact.

8.2 Understanding Routing Dynamics and Problem Inference

Previous work in root cause inference in Internet routing can be broadly sorted into

three categories: automated root cause analysis [21, 10, 7, 49, 6, 39, 51, 17, 43],

visualization and human interaction based approaches [5, 26], and theoretical schemes in

model settings [27].

In one of the seminal works about network instability, Labovitz et al. [23] identify

several causes of routing instabilities in the Internet, without however diagnosing

their topological origin. Later efforts [7, 6, 10, 49] analyze BGP updates and perform

aggregation along three dimensions: time, monitors and prefixes, achieving a final

output in the form of candidate sets of instability origins. Our Min-cut scheme does a

similar aggregation, except that we use link weight aggregates and focus on analyzing

the BGP logs from the viewpoint of a specific monitor, augmented by some views of

other monitors. [21] takes a different approach: it identifies layers of links with

shared risk and uses membership information to accurately isolate failures in the

optical hardware of a network backbone. Feldmann et al. [10] propose a root cause

inference system that aggregates BGP updates according to time, monitors and prefixes

(in this order) and uses a greedy heuristic to identify the origin of change. Another

class of work, such as [39, 51, 17], diagnoses routing changes using anomaly detection

techniques. Roughan et al. [39] use an EWMA (exponentially weighted moving

average) technique on BGP data, and decomposition/Holt-Winters methods on SNMP

data, showing that increasing the number of monitors decreases the number of false

alarms. [51] uses PCA (principal component analysis) techniques to correlate

different updates into clusters, each cluster being a set of prefixes or ASes which are

affected by the same event. Huang et al. [17] use the same PCA technique to detect

anomalies inside the Abilene network, using multiple Abilene BGP views and routers’

configurations as input. Teixeira and Rexford [43] describe a framework to detect the cause

of a routing change using a coordinated diagnostic mechanism among several ISPs,

requiring a special server in each ISP that replies to diagnostic queries from other

domains. In contrast, our scheme only requires the view from one’s own ISP and publicly

available views from other ISPs that might be involved in the event.
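To give a flavor of the EWMA-based anomaly detection mentioned above, the following is a generic sketch and not the method of [39]. The function name, the smoothing factor, the threshold, the warm-up length, and the sample counts are all assumptions chosen for illustration.

```python
def ewma_anomalies(series, alpha=0.3, threshold=3.0, warmup=3):
    """Flag indices whose value deviates from the EWMA baseline by more
    than `threshold` times the smoothed absolute deviation."""
    if not series:
        return []
    mean = float(series[0])
    dev = 0.0
    flagged = []
    for i, value in enumerate(series[1:], start=1):
        error = abs(value - mean)
        if i > warmup and dev > 0 and error > threshold * dev:
            flagged.append(i)
        else:
            # Only fold normal samples into the baseline.
            mean = alpha * value + (1 - alpha) * mean
            dev = alpha * error + (1 - alpha) * dev
    return flagged


# Hypothetical per-minute BGP update counts with one burst at index 6.
counts = [100, 104, 98, 101, 103, 99, 950, 102, 100]
print(ewma_anomalies(counts))
```

Excluding flagged samples from the baseline is one possible design choice; it keeps a single burst from inflating the deviation estimate and masking later anomalies.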

8.3 Prefix Hijack

Various prefix hijack events have been reported to the NANOG [30] mailing list from time

to time. [55] and [29] studied the exact prefix hijack as part of the MOAS (Multiple

Origin AS) problem, in which one prefix has multiple origin ASes in the routing table.

These studies show that one prefix can be legitimately announced by multiple origin

ASes, but can also be hijacked due to mis-configurations.

Existing proposals to address the prefix hijack problem can be categorized into two

types: cryptography-based and non-crypto-based. Crypto-based solutions, such as

[42], [3], [16], [31], [20], [41], require BGP routers to sign and verify the origin AS

and the path, which has a significant impact on router performance. Furthermore, these

solutions are not easily deployable because they all need changes to router software,

and some require public key infrastructures.

Non-crypto proposals include [13], [48], [56], and [19]. The IRV approach in [13] lets

each AS designate a server that answers queries regarding BGP security. [48] lets

the router give preference to stable routes over transient ones, which can be the result of

prefix hijacks. Similarly, in PGBGP [19], a router detects prefix hijacks by monitoring

the origin ASes in BGP announcements for each prefix over time. A transient origin

AS of a prefix is considered anomalous, and the router avoids using the anomalous

routes whenever possible. PGBGP also detects covered prefix hijacks using a similar

approach. In [56], prefix owners attach additional information to the routing updates,

so that remote routers can detect prefix hijacks. All of the above non-crypto proposals

require changes to router software, router configurations, or the ways that operators

run their networks.
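The idea of treating a recently introduced origin AS as anomalous, described above for PGBGP, can be sketched as follows. This is a simplified illustration and not the algorithm of [19]: the class and function names, the length of the suspicious period, the time units, and the rule that the first origin ever seen is trusted are all assumptions.

```python
class OriginHistory:
    """Track when each origin AS was first seen for a prefix."""

    def __init__(self, suspicious_period):
        self.suspicious_period = suspicious_period
        self.first_seen = {}  # prefix -> {origin: time first observed}

    def observe(self, prefix, origin, now):
        """Record an announcement; return True if the origin is suspicious."""
        origins = self.first_seen.setdefault(prefix, {})
        if not origins:
            # First origin ever seen for this prefix: assumed trustworthy.
            origins[origin] = now - self.suspicious_period
            return False
        first = origins.setdefault(origin, now)
        return now - first < self.suspicious_period


def usable_origins(history, prefix, candidates, now):
    """Prefer origins that are not suspicious; fall back to all candidates
    so that the prefix stays reachable when no trusted route exists."""
    trusted = [o for o in candidates if not history.observe(prefix, o, now)]
    return trusted or list(candidates)


# Hypothetical example using a documentation prefix and private-use ASNs.
history = OriginHistory(suspicious_period=24)
history.observe("192.0.2.0/24", 64500, now=0)
early = usable_origins(history, "192.0.2.0/24", [64500, 64666], now=10)
later = usable_origins(history, "192.0.2.0/24", [64500, 64666], now=40)
print(early, later)
```

The sketch only records when an origin was first seen and does not check that it stayed announced throughout the period, which a fuller treatment would need to consider.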

Compared to all of the above proposals, the biggest advantage of our system is

that it is fully deployable. PHAS can be up and running without requiring cooperation

from multiple ISPs, registry authorities, router vendors, or even end users. While

other approaches focus on detecting prefix hijack at remote ASes, we simply notify

the prefix owner about the origin changes, thus allowing the prefix owners to detect

prefix hijacks with a high accuracy.
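The notification step described above, reporting origin changes to the prefix owner, can be sketched as a small generator. This is a minimal illustration under assumptions and not the PHAS implementation: the function name and the update tuple layout are hypothetical, and the sketch reports every change in the instantaneous origin set without any further processing that the system described in Chapter 6 may apply.

```python
def origin_set_changes(updates):
    """Yield (time, prefix, old_set, new_set) whenever the set of origin
    ASes observed for a prefix changes.

    updates: iterable of (time, monitor, prefix, origin), where origin is
    the origin AS in that monitor's current route, or None on withdrawal.
    """
    per_monitor = {}  # prefix -> {monitor: origin}
    current = {}      # prefix -> frozenset of origins
    for time, monitor, prefix, origin in updates:
        routes = per_monitor.setdefault(prefix, {})
        if origin is None:
            routes.pop(monitor, None)
        else:
            routes[monitor] = origin
        new_set = frozenset(routes.values())
        old_set = current.get(prefix, frozenset())
        if new_set != old_set:
            current[prefix] = new_set
            yield time, prefix, old_set, new_set


# Hypothetical update stream using a documentation prefix and private ASNs.
updates = [
    (1, "m1", "192.0.2.0/24", 64500),
    (2, "m2", "192.0.2.0/24", 64500),
    (3, "m2", "192.0.2.0/24", 64666),  # a second origin appears
    (4, "m2", "192.0.2.0/24", 64500),  # and disappears again
]
events = list(origin_set_changes(updates))
for event in events:
    print(event)
```

Whether a reported change is legitimate is left to the recipient, which is the point of filtering at the user side.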

Three other related works [44, 22, 37] are also fully deployable. [44] utilizes the

data from RouteViews or RIPE and visualizes the origin AS changes of prefixes

for visual detection of prefix hijacks. [22] proposes an alarm algorithm for

prefix hijacks and path hijacks, based on public BGP data and the geographic

information of each AS from the whois database. The key observation is that if

two edge ASes are connected to each other or legitimately originate the same prefixes,

they are geographically close. A violation of this observation triggers an alarm.

The RIPE MyASN project [37] is probably the most similar service to ours, but

its design is based on a fundamentally different philosophy. In the MyASN project,

a prefix owner registers the valid origin set for a prefix. MyASN then tracks roughly

the equivalent of our instantaneous origin set for this prefix. An alarm is triggered

when any invalid origin AS appears. Our approach reports the origin set changes to

the prefix owner, and any filtering or checking is done at the user site. This is a subtle

difference, but has important implications.

First, filtering at the user side provides the greatest degree of flexibility to the de-

tection algorithm. Users can apply any filtering criteria or detection algorithm on the

data. When the filtering is done at the service site, as in MyASN, it is limited to what the

service interface can provide. For security reasons, the service site cannot

allow arbitrary filtering scripts to be uploaded. If prefix owners cannot achieve their

filtering goal at the service site, they have to deploy a local filter anyway.

Second, it is critical for server-based filtering to have the most up-to-date information

needed for prefix hijack detection. The valid origin set must be updated at the

MyASN server whenever the prefix has a different origin set. It is especially hard to

perform this update in the face of an ongoing prefix hijack. When a new hijack happens, the prefix

owner may want to change the filtering rule, but may be unable to do so due to the attack.

Our approach does not suffer from this problem.

In terms of understanding hijacks, [4] discusses hijacking and the related concept

of interception, where an AS can transparently intercept hijacked traffic and forward

it to the owner. Their study shows that ASes higher in the hierarchy can intercept

a large amount of traffic. This does not directly contradict our result that tier-1 and

other big ISPs are not very resilient, since their estimates are based on which ASes

forward a large amount of traffic, while our estimates are based on which routes are more

preferred. Another related measurement study is [35], which shows how super prefix

hijacks are correlated with spam sent from the hijacked address space.

CHAPTER 9

Conclusions

Due to the sheer size of the global routing infrastructure as well as its dense connectiv-

ity and the resulting complex interactions among the large number of interconnected

networks, understanding global routing changes and inferring the origin of changes

presents a great research challenge.

In this work we proposed to weigh links by the number of BGP routes carried and

analyzed Internet routing from the perspective of link weight changes. This approach

is fundamentally different from prior work dealing with individual routes and provides

an aggregate metric to understand routing changes that are related. This enables us

to visualize large-scale routing changes by observing the links with a large number of

routing changes. We can also correlate link weight changes and understand the possible

cause of the routing changes. We also proposed a heuristic based on min-cut to

infer the most likely location of the changes and showed that this heuristic can achieve a

high degree of accuracy. We also characterized the expected weight on each link and used

this information to identify events on links, as well as to distinguish cases where links fail

from cases where links gain routes as backups. Our results show that the use of link

weights and changes can be a promising direction towards routing problem diagnosis

in large scale networks.
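As a reminder of the basic quantity, link weights and their changes can be computed from two routing table snapshots as sketched below. This is a simplified illustration under assumptions: the function names and the snapshot layout (one AS path per prefix, as seen from a single monitor) are hypothetical, and the AS numbers are placeholders.

```python
from collections import Counter


def link_weights(routes):
    """Count, for each directed AS-level link, the routes that traverse it.

    routes: dict mapping prefix -> AS path (tuple, monitor side first).
    """
    weights = Counter()
    for path in routes.values():
        for link in zip(path, path[1:]):
            weights[link] += 1
    return weights


def weight_changes(before, after):
    """Return {link: delta} for links whose weight differs between the
    two snapshots."""
    old, new = link_weights(before), link_weights(after)
    return {link: new[link] - old[link]
            for link in set(old) | set(new)
            if new[link] != old[link]}


# Hypothetical snapshots: two prefixes move from the path via AS 2
# to a path via AS 3, while a third prefix is unaffected.
before = {"p1": (1, 2, 4), "p2": (1, 2, 4), "p3": (1, 3, 5)}
after = {"p1": (1, 3, 4), "p2": (1, 3, 4), "p3": (1, 3, 5)}
changes = weight_changes(before, after)
print(sorted(changes.items()))
```

Links that lose weight and links that gain the same amount at the same time are the kind of correlated changes that the analysis in this work builds on.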

Prefix hijacking is another important problem that the Internet faces. We showed that

a simple prefix hijack alert system can effectively reduce the time between the initiation

of an attack and the knowledge of this attack. We also showed that the Internet routing

policy that was used to commercially benefit big tier-1 ISPs is one of the main factors

that reduce tier-1 ISPs’ resiliency against prefix hijacking. Our study shows that big

is not always better, and that high impact attacks can be launched from corners of the

Internet. Thus it is not enough to secure a few big ISPs to achieve strong security in the

Internet.

9.1 Future Works

We now present some ongoing and future work in the area of Internet routing that can

build on the results from our work.

9.1.1 Understanding AS activity

The stability of Internet routing has been analyzed from the point of view of prefixes

and the AS originating the prefix. However, an AS providing transit for prefixes origi-

nated by others can also have unstable links or internal problems and hence contribute

to routing dynamics. In order to understand the overall stability of Internet routing,

we need to abstract this role of transit AS in routing dynamics. For this purpose, we

propose to use a measure of AS rank-change, an extension of link weight change indi-

cating the amount of path changes an AS is involved in. With this metric, we obtain

information containing the AS rank-change for each AS as observed from all monitors

over time. Such a data set is multi-dimensional and we propose to use a dimension-

ality reduction technique like Principal Component Analysis (PCA) to understand the

data better. By this technique we expect to clearly separate outliers contributing more

routing updates from normal cases and understand the routing dynamics as a whole.

9.1.2 Prefix hijack

In Chapter 6 we discussed hijack detection and building an alert system. One of the

problems facing the Internet today is quick recovery from a hijack event. The typical

procedure for recovering from a prefix hijack involves operators from the affected AS

contacting the hijacker AS or its upstream providers and requesting that announcements

be stopped. However, this process can take some time, and in the meantime a

hijack can cause a severe impact. One possible direction of work we are investigating

involves building a reaction system wherein certain measures can be taken to quickly

flush out the hijacked routes from the Internet. Such a reaction system can reduce

the negative impact on data delivery while giving network operators enough time to

contact the source of the problem.

In Chapter 7 we showed how routing policies can negatively impact the resiliency

of tier-1 ISPs. These ISPs are well connected and can provide superior data-path

performance, and hence it is important to fix the weak resiliency due to policy decisions.

One line of work involves designing such a fix, so that the commercial interests are not

compromised, yet resiliency against prefix hijacking can increase.

9.1.3 BGP monitoring

A lot of recent research in Internet routing has benefitted greatly from BGP data col-

lected from monitoring projects like RouteViews and RIPE. However, with over 400

BGP monitors and with each monitor collecting close to 400,000 updates per day on

average, this translates to a lot of monitoring data. Researchers often pick a small num-

ber of monitors to evaluate or understand routing problems, but the choice of which

monitors to pick is often done randomly. An interesting line of work involves using

some aggregate metric like link weight changes to cluster monitors so that one can

clearly identify monitors that collect similar data. With an understanding of monitor

behavior, one may be able to systematically pick a smaller set of monitors that is more

representative of overall Internet behavior.

REFERENCES

[1] North American Network Operators Group (NANOG). [Online] http://www.nanog.org.

[2] Secure Inter-Domain Routing (SIDR) Working Group. http://www1.ietf.org/html.charters/sidr-charter.html.

[3] W. Aiello, J. Ioannidis, and P. McDaniel. Origin Authentication in Interdomain Routing. In Proceedings of 10th ACM Conference on Computer and Communications Security, pages 165–178. ACM, October 2003. Washington, DC.

[4] Hitesh Ballani, Paul Francis, and Xinyang Zhang. A Study of Prefix Hijacking and Interception in the Internet. In Proceedings of ACM Sigcomm, 2007.

[5] Giuseppe Di Battista, Federico Mariani, Maurizio Patrignani, and Maurizio Pizzonia. BGPlay: A system for visualizing the interdomain routing evolution. In Graph Drawing, volume 2912 of Lecture Notes in Computer Science, pages 295–306, 2003.

[6] M. Caesar, L. Subramanian, and R. Katz. Root cause analysis of internet routing dynamics. In U.C. Berkeley Technical Report UCB/CSD-04-1302, Nov 2003.

[7] D. Chang, R. Govindan, and J. Heidemann. The temporal and topological characteristics of BGP path changes. In ICNP, Nov 2003.

[8] E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiway cuts. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pages 241–251, 1992.

[9] E. Chen and S. Sangli. Avoid BGP Best Path Transitions from One External to Another. Internet Draft, IETF, June 2006. http://www.ietf.org/internet-drafts/draft-ietf-idr-avoid-transition-04.txt.

[10] A. Feldmann, Olaf Maennel, Z. Morley Mao, A. Berger, and B. Maggs. Locating Internet routing instabilities. In Proceedings of Sigcomm, September 2004.

[11] Lixin Gao. On inferring autonomous system relationships in the Internet. ACM/IEEE Transactions on Networking, 9(6):733–745, 2001.

[12] N. Garg, V. Vazirani, and M. Yannakakis. Multiway cuts in node weighted graphs. Journal of Algorithms, 50(1):49–61, 2004.

[13] G. Goodell, W. Aiello, T. Griffin, J. Ioannidis, P. McDaniel, and A. Rubin. Working around BGP: An incremental approach to improving security and accuracy of interdomain routing. In NDSS, 2003.

[14] Timothy Griffin and Gordon T. Wilfong. A safe path vector protocol. In INFOCOM (2), pages 490–499, 2000.

[15] Timothy G. Griffin and Gordon T. Wilfong. An analysis of BGP convergence properties. In Proceedings of SIGCOMM, pages 277–288, Cambridge, MA, August 1999.

[16] Y.-C. Hu, A. Perrig, and M. Sirbu. SPV: Secure path vector routing for securing BGP. In Proceedings of ACM Sigcomm, August 2004.

[17] Y. Huang, N. Feamster, A. Lakhina, and J. Xu. Detecting Network Disruptions with Network-Wide Analysis. In Proc. of ACM SIGMETRICS, 2007.

[18] Geoff Huston. Auto-detecting hijacked prefixes? A presentation at RIPE-50, May 2005. http://www.potaroo.net/presentations/index.html.

[19] J. Karlin, S. Forrest, and J. Rexford. Pretty good BGP: Protecting BGP by cautiously selecting routes. Technical Report TR-CS-2005-37, University of New Mexico, October 2005.

[20] S. Kent, C. Lynn, and K. Seo. Secure Border Gateway Protocol. IEEE Journal of Selected Areas in Communications, 18(4), April 2000.

[21] Ramana Rao Kompella, Jennifer Yates, Albert Greenberg, and Alex Snoeren. IP fault localization via risk modeling. In Proceedings of Second ACM/USENIX Symposium on Networked Systems Design and Implementation, 2005.

[22] Christopher Kruegel, Darren Mutz, William Robertson, and Fredrik Valeur. Topology-based detection of anomalous BGP messages. In 6th Symposium on Recent Advances in Intrusion Detection (RAID), 2003.

[23] C. Labovitz, G. R. Malan, and F. Jahanian. Origins of internet routing instability. In Proceedings of the IEEE INFOCOM ’99, pages 218–26, New York, NY, March 1999.

[24] Mohit Lad, Dan Massey, Dan Pei, Yiguo Wu, Beichuan Zhang, and Lixia Zhang. PHAS: A prefix hijack alert system. In 15th USENIX Security Symposium, 2006.

[25] Mohit Lad, Dan Massey, and Lixia Zhang. Link-Rank: A Graphical Tool for capturing BGP Routing Dynamics. In IEEE/IFIP NOMS, April 2004.

[26] Mohit Lad, Daniel Massey, and Lixia Zhang. Visualizing Internet routing changes. In IEEE Transactions on Visualization and Computer Graphics, special issue on visual analytics, to appear, 2006.

[27] Mohit Lad, Akash Nanavati, Dan Massey, and Lixia Zhang. An Algorithmic Approach to Identifying Link Failures. In PRDC, March 2004.

[28] Joshua O’Madadhain, Danyel Fisher, Padhraic Smyth, Scott White, and Yan-Biao Boey. Analysis and visualization of network data using JUNG. In Journal of Statistical Software, to appear.

[29] R. Mahajan, D. Wetherall, and T. Anderson. Understanding BGP misconfiguration. In Proceedings of ACM Sigcomm, August 2002.

[30] The NANOG Mailing List. http://www.merit.edu/mail.archives/nanog/.

[31] J. Ng. Extensions to BGP to Support Secure Origin BGP. ftp://ftp-eng.cisco.com/sobgp/drafts/draft-ng-sobgp-bgp-extensions-02.txt, April 2004.

[32] Ricardo Oliveira, Beichuan Zhang, Dan Pei, Rafit Izhak-Ratzin, and Lixia Zhang. Quantifying path exploration in the Internet. In Proceedings of Internet Measurement Conference, to appear, 2006.

[33] D. Pei, D. Massey, and L. Zhang. A Framework for Resilient Internet Routing Protocols. IEEE Network Special Issue on Protection, Restoration, and Disaster Recovery, 2004.

[34] Alin C. Popescu, Brian J. Premore, and Todd Underwood. Anatomy of a leak: AS9121. http://www.nanog.org/mtg-0505/underwood.html.

[35] Anirudh Ramachandran and Nick Feamster. Understanding the network-level behavior of spammers. In Proceedings of ACM SIGCOMM, 2006.

[36] Y. Rekhter and T. Li. A border gateway protocol (BGP-4). Request for Comment (RFC): 4271, 2006.

[37] RIPE. Routing information service: myASn System. http://www.ris.ripe.net/myasn.html.

[38] RIPE. Routing Information Service Project. http://www.ripe.net/ripencc/pub-services/np/ris-index.html.

[39] Matthew Roughan, Timothy G. Griffin, Z. Morley Mao, Albert Greenberg, and Brian Freeman. Forwarding anomalies and improving their detection using multiple data sources. In SIGCOMM Workshop: Network Troubleshooting, 2004.

[40] routeviews.org. RouteViews Routing Table Archive. http://www.routeviews.org/.

[41] B. R. Smith and J. J. Garcia-Luna-Aceves. Securing the border gateway routing protocol. In Global Internet ’96, November 1996.

[42] L. Subramanian, V. Roth, I. Stoica, S. Shenker, and R. H. Katz. Listen and whisper: Security mechanisms for BGP. In Proceedings of ACM NSDI 2004, March 2004.

[43] Renata Teixeira and Jennifer Rexford. A measurement framework for pin-pointing routing changes. In Proceedings of the ACM SIGCOMM workshop on Network troubleshooting, 2004.

[44] S. T. Teoh, K.-L. Ma, S. F. Wu, D. Massey, X. Zhao, D. Pei, L. Wang, L. Zhang, and R. Bush. Visual-based anomaly detection for BGP origin AS change (OASC) events. In IFIP/IEEE Distributed Systems: Operations and Management (DSOM), pages 155–168, October 2003.

[45] Soon Tee Teoh and Kwan-Liu Ma. Case study: Interactive visualization for Internet security. In Proceedings of IEEE Visualization, 2002.

[46] Soon Tee Teoh, Kwan-Liu Ma, and S. Felix Wu. A Visual Exploration Process for the Analysis of Internet Routing Data. In Proc. IEEE Visualization, 2003.

[47] Soon Tee Teoh, Ke Zhang, Shih-Ming Tseng, Kwan-Liu Ma, and S. Felix Wu. Combining visual and automated data mining for near-real-time anomaly detection and analysis in BGP. In VizSEC/DMSEC ’04: Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security, pages 35–44, 2004.

[48] L. Wang, X. Zhao, D. Pei, R. Bush, D. Massey, A. Mankin, S. Wu, and L. Zhang. Protecting BGP Routes to Top Level DNS Servers. In Proceedings of the ICDCS 2003, 2003.

[49] Jian Wu, Z. Morley Mao, and Jennifer Rexford. Finding a needle in a haystack: Pinpointing significant BGP routing changes in an IP network. In Proceedings of 2nd symposium on Networked Systems Design and Implementation (NSDI), 2005.

[50] Jianhong Xia and Lixin Gao. On the evaluation of AS relationship inferences. In Proc. of IEEE GLOBECOM, December 2004.

[51] Kuai Xu, Jaideep Chandrashekar, and Zhi-Li Zhang. A First Step Towards Understanding Inter-domain Routing. In Proc. of ACM SIGCOMM Workshop on Mining Network Data, 2005.

[52] Wen Xu and Jennifer Rexford. MIRO: multi-path interdomain routing. In SIGCOMM, pages 171–182, 2006.

[53] Beichuan Zhang, Raymond Liu, Dan Massey, and Lixia Zhang. Internet Topology Project. http://irl.cs.ucla.edu/topology/.

[54] Beichuan Zhang, Raymond Liu, Daniel Massey, and Lixia Zhang. Collecting the internet AS-level topology. ACM SIGCOMM Computer Communications Review (CCR), 35(1):53–62, January 2005.

[55] X. Zhao, D. Pei, L. Wang, D. Massey, A. Mankin, S. Wu, and L. Zhang. An Analysis of BGP Multiple Origin AS (MOAS) Conflicts. In Proceedings of the ACM IMW 2001, Oct 2001.

[56] X. Zhao, D. Pei, L. Wang, D. Massey, A. Mankin, S. Wu, and L. Zhang. Detection of Invalid Routing Announcement in the Internet. In Proceedings of the IEEE DSN 2002, June 2002.
