a fine-grained distance metric for analyzing internet topology · a fine-grained distance metric...
TRANSCRIPT
A Fine-Grained Distance Metric for Analyzing Internet Topology
Mark Crovella joint with
Gonca Gursun, Natali Ruchansky, and Evimaria Terzi
Distance in Graphs with Low Diameter • Do ‘neighborhoods’ still
exist? • Under what distance
metric?
A Motivating Problem
• Internet routing between domains – Autonomous Systems (ASes)
• Simple question: what paths pass through my network? • Important for network planning, traffic management,
security – If someone at BU were to send an email to UM, would it go
through my network?
Surprisingly hard to answer!
• Routing consists of an AS choosing a next hop for each destination – Destinations are ‘prefixes’
• Decisions are only partially communicated to neighbors – In general, decisions made by a remote AS are not known
Observing Traffic
• An AS can observe the traffic passing through it – If BU sends traffic to UM through Sprint, Sprint knows it
• Traffic only provides positive information – Absence of traffic is ambiguous
• If the observer does not see traffic from I to j, it is either – A true zero: the path from i to j does not go through the observer; or – A false zero: the path goes through, but i is not sending anything to j
The Visibility-Inference Problem
• For each observer there is a ground truth matrix T – T(i,j) = 1 è path from i to j passes through observer
• Traffic summarized in observable matrix M – M(i,j) = 1 è traffic was seen flowing from i to j – M(i,j) = 1 è T(i,j) = 1
• Problem: label the zeros in M as either true or false
• Success metric: Detection Rate, False Alarm Rate – Of true zeros
T =
2
40 1 01 1 01 0 1
3
5 M =
2
40 0 01 0 01 0 1
3
5
Intuition
• Amplify knowledge obtained from traffic observation • Empirically we observe that there are groups of
sources, destinations exhibiting `similar routing’ • Observed traffic provides positive knowledge for entire
group
• Requires a metric that captures the notion that d1 is `close’ to d2 while s1 is `far’ from s2
M =s1
s2
2
40 0 01 0 01 0 1
3
5
d1 d2 s1
s2
d1
d2
Seeking a metric for ‘neighborhoods’
• Typical distance used in graphs is hop count • Not suitable in small worlds
• 90% of prefix pairs have hop distance < 5 – Clearly, typical distance metric is inappropriate
• Need a metric that expresses ‘routed similarly in the Internet’ – or other graph
0 200 400 600 800 10001
2
3
4
5
6
7
Hop
Dis
tanc
e
Prefix Pairs
Capturing global routing state • Conceptually, imagine capturing the entire routing
state of the Internet in a matrix N • N(i,j) = next hop on path from i to j • Each row is actually the routing table of a single AS • Now consider the columns
N =
2
66664
3
77775
Routing State Distance • rsd(a,b) = # of entries that differ in columns a and b of N • If rsd is small, most ASes think the two prefixes are
‘in the same direction’ • A metric (obeys triangle inequality)
rsd=3 rsd=5
N =
2
66664
3
77775N =
2
66664
3
77775
RSD in Practice
• Key observation: we don’t need all of H to obtain a useful metric – That’s good – the problem would be solved anyway if we had that
• Many (most?) nodes contribute little information to RSD – Nodes at edges of network have nearly-constant rows in H
• Sufficient to work with a small set of well-chosen rows of H – Empirically we observe this to be the case
• Such a set is obtainable from publicly available BGP measurements – Note that public BGP measurements require some careful handling to use
properly for computing RSD
0 200 400 600 800 10001
2
3
4
5
6
7
Hop
Dis
tanc
e
Prefix Pairs0 200 400 600 800 1000
0
50
100
150
200
250
RSD
Prefix Pairs
Using RSD to amplify observed traffic
• For each zero (i,j) in M – Find , – Compute – If then (i,j) is a false zero; o/w true zero
• Thresholds τ and β are easy to set in an automatic way • Applied to three sets of ASes:
– Core-100 and Core-1000: centrally located ASes – Edge-1000: randomly chosen ASes, mostly near edge of net
Si = {i0 | rsd(i, i0) < ⌧} Di = {j0 | rsd(j, j0) < ⌧}⇡(i, j) =
PM(Si, Dj)
⇡(i, j) > �
Experimental setup
• Ground-truth matrices from BGP data – Collected all active paths from 38 sources to 135,000
destinations – For every AS, construct 38 x 135,000 ground truth matrix T – Data hygiene: discussed at end
• Simulate traffic absence by setting some 1s to zeros – Flipped at random from 1 to 0 – 10%, 30%, 50%, 95%
Quick Look
Performance
• Accuracy on Edge-1000 is poorer – At edge of network, problem is actually easier – RSD not the right idea to use there
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1
False Positive Rate
True
Pos
itive
Rat
e
10%30%50%95%
0 0.05 0.1
0.8
0.9
1
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1
False Positive Rate
True
Pos
itive
Rat
e
10%30%50%95%
0 0.05 0.1
0.8
0.9
1
Core-1000 Core-100
Mean TPR and FPR: Core-1000
Flip Rate TPR FPR
10% 0.98 (0.03) 0.03 (0.04)
30% 0.98 (0.03) 0.03 (0.04)
50% 0.98 (0.03) 0.03 (0.05)
95% 0.98 (0.03) 0.21 (0.18)
Properties of RSD
• Tree: rsd = 1 + length of path in tree
– So rsd in a tree is low, generally O(log n)
• Clique: rsd = n
• Star: rsd = 3
On graphs of size n with shortest-path routing
Comparison to Classical Distance
rsd dStar 3 2
Balanced tree log n log nClique n 1
RSD on small worlds
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
0.0001 0.001 0.01 0.1 1p
Clustering CoefficientHop Distance
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
0.0001 0.001 0.01 0.1 1p
Clustering CoefficientHop Distance
RSD
W&S, Nature, ‘98
RSD remains useful as hop distance breaks down
p = 0 p = 0.01 p = 0.1
d
rsd
RSD as a tool for data analysis and discovery
• RSD applied to Internet prefixes shows low effective rank • Implication: RSD is good for visualization of prefixes
0 5 10 15 200
5
10
15 x 104
20 largest singular values of 1000 x 1000 RSD matrix for Internet prefixes
RSD for clustering – fine scale
−0.04 −0.038 −0.036 −0.034 −0.032 −0.030.02
0.03
0.04
0.05
0.06
0.07
ASN−QWEST
ESNET
GTANET−ASAPPLE
ASN852KREONET−AS−KR
TELE2TELIANET
Rede
ERX−JARINGWISCNET1−ASKANREN
INFOSPHERE
IDC2554
MORENET
SPIRITTEL−AS
XO−AS15
SINET−AS
ASC−NET
HARNET
UIOWA−AS INSINET
LGDACOM
CPCNET−AS4058
ERX−AU−NET
CTM−MO
INET−TH−AS
UNSPECIFIED
ASN−PACIFIC−INTERNET−IX
THAI−GATEWAY
HYUNDAI−KR
IDC
ODN
SAMART−BOARDER−AS
CSLOXINFO−AS−AP
WORLDNET−AS
KIXS−AS−KR
ICONZ−AS
GLOBE−TELECOM−AS
SEEDNET
LINTASARTA−AS−AP
ASN−IINETVOCUS−BACKBONE−AS
IINET−AU
DISC−AS−KR
ICN−AS
GLOBEINTERNET
GT−BELL
BAYAN−TELECOMMUNICATIONSHURRICANEWINDSTREAM
GRANDECOM−AS1
TRUEINTERNET−AS−AP
CSTNET−AS−AP
NETIRD
PI−AU
TPG−INTERNET−AP
HCNSEOCHO−AS−KR
APAN−JP
CTNET
TRANSACT−SDN−AS
VISIONNET
BEZEQ−INTERNATIONAL−AS
CELLCOM−AS
SPEEDCAST−AP
ISP−AS−AP
HCNCHUNGJU−AS−KR
TDNC
NEWTT−IP−AP
CRNET
MULTIMEDIA−AS−AP
DREAMX−AS
SMARTONE−AS−AP
ASN−PLATFORM−AP
AP−AUST−AU−AS
FX−PRIMARY−AS
GENESIS−AP
ZAQ
ASN−OZONLINE−AU
MOPH−TH−AP
FCABLE−AS
TOTNET−TH−AS−AP
CHEONANVITSSEN−AS−KR
DONGBUIT−AS−KRPUBNET1−AS
CMNET−GD
CNNIC−CN−COLNET
ASN−ATHOMEJP
GITS−TH−AS−AP
GNGAS
CAT−AP
COMINDICO−AP
SAERONET−AS−KR
TOKAI PACNET
GAYANET−AS−KREFTEL−AS−APHCNKUMHO−AS−KR
YAHOO−1VISIONARY
AIRSTREAMCOMM−NET
TELWEST−NETWORK−SVCS−STATIC
AFILIAS−NST
TRUVISTA
IPPLANET−AS
ITGATE
TVC−AS1
Diveo
ROCK−HILL−TELEPHONE
VEROXITY
BROADVIEWNET
INFOLINK−MIA−US
Hadara
RIPNET
CAVTEL02
HOMESCCITIZENS
CORPCOLO
NWT−AS−APK−OPTICOM
QALA−SG−AP
CITIC−HK−AP
ADC−BUDDYB−AS
GIGAPASS−AS−KR
ICE−AS−KRCOMCLARK−AS
SPTEL−AP AS−PNAPTOK
GIGAINFRA
CAMNET−AS
ASN−EQUINIX−APDREAMPLUS−AS−KR
GINAMHANVIT−AS−KR
HANVITIAB−AS−KR
ACE−1−WIFI−AS−AP
KBTNETSPEED−AS−AP
WWW−PH−AP
LINKTELECOM−NZ−AP
PLANET−OZI
POSNET
FPT−AS−AP
EGIHOSTING
USCARRIER
TWIN−LAKES
SERVERROOM
O1COMM
AMPATH−AS
CHL
BEN−LOMAND−TEL
C−R−T
HARCOM1
ISPNET−1
ALLIED−TELECOMBARR−XPLR−ASN
SKYBEST
VITSSEN−SUWON−AS−KR
INCHEON−AS−KR
ONLINE−AS
YAHOO
AINS−AS−AP
IPVG−AS−AP
SKYBB−AS−AP
BIGAIR−AP
PIPETRANSIT−AS−AP
KIRZ−AS−TH
BCL−TRANSIT−AS−AP
ABN−PEERING−AS−AP
CMNET−V4SHANGHAI−AS−AP
WICAM−AS
ISATNET−AS−IDDIGISATNET−AS−ID
YAHOO−JP−AS−AP
ENMAX−ENVISION
SANTEECOOPER
2901
CooperaciónUniversidade
FIORD−AS
IPNXng
NASSIST−AS
VOXEL−DOT−NETCOLOGUYS
GIGANEWS
XITTEL−AS EONIX−CORPORATION−AS−PHX01−WWW−INFINITIE−NET
EVERN2
ONS−COS
NEMONT−CELLULAR
CANBYTELEPHONEASSOCIATION
RSN−1
PARC−ASN
VPLSNET
CNNIC−PBSL−AP
HELLONET−AS−KR
EXTREMEBB−AS−MY
SIS−GROUP−SYD−AS−AP
AIS−TH−AP
PACTEL−AS−AP
SUNNYVISION−AS−AP
HCLC−AS−KR
CMBHK−AS−KR
CMBDAEJEON−AS−KR
BB−BROADBAND−TH−AS−AP
MURAMATSU−AS−AP
GOSCOMB−AS
NORTHLAND−CABLE
BUDREMYER−AS
CTC−CORE−AS
EPSILON
IPSERVERONE−AS−AP
JCN−AS−KR
DAELIM
DEDICATED−SERVERS−SYD−AS−AP
SUPERBROADBANDNETWORK−AS−AP
GRNT−JP−AS−AP
JASTEL−NETWORK−TH−AP
LBNI
TRIPLETNET−AS−AP
PFNL−ASN
EVERN−1
LIGHTOWER
IMPACT−AS−APCMNET−GUANGDONG−AP
CMNET−JIANGSU−AP
Visualizing all prefixes
−100 0 100
−100
0
100Smaller cluster is about 25% of all prefixes
Consistent clustering in any sample of prefixes
Understanding Clustering under RSD
In terms of the next-hop matrix N: • A cluster C corresponds to a
set of columns of N, ie, N(:,C) • The columns are close in RSD • So they must be similar in
some positions S
In terms of BGP routing: • For any row in N(S,C) the entries
must be nearly the same • So S is a set of ASes making
similar routing decisions w.r.t. C
We call such a pair (S,C) a local atom
−100 0 100
−100
0
100
Why are Local Atoms Interesting?
• Routing decisions by ASes are made independently • There will often be many next hops for any given prefix • Different ASes will choose different next hops for a prefix
Exposing Unexpected Coordination
−100 0 100
−100
0
100
−100 0 100
−100
0
100 A significant fraction of ASes are all following the same rule.
There is a special AS X in the system.
The rule is: “Always use AS X to get to any customer in its cone.”
The ASes making this coordinated decision are not related in any obvious way
The smaller cluster is a local atom.
Attractivity of AS6939
What is so special about Hurricane Electric?
• A peering policy based only on colocation – Settlement-free peering – No commercial consideration or traffic ratios
• Immensely attractive to small/medium NSPs – “free” access to any customer network of HE – HE backbone has global scope
“Hurricane Electric is willing to peer with networks which are connected to one or more exchange points which we have in common.”
he.net/adm/peering.html
Systematic Discovery of Local Atoms
• Manual inspection uncovers a large-scale local atom – The Hurricane Electric cluster – Includes about 25% of prefixes in the Internet
• Do smaller local atoms exist? How can we detect them? • Need an effective clustering strategy for RSD
• A natural approach: seek clusters such that – Within-cluster RSD is minimized – Between-cluster RSD is maximized
Pivot Clustering
• Minimizing P-Cost is NP-hard, but there is an expected 3-approximation algorithm for it: Pivot clustering
Largest 5 clusters found using Pivot Clustering
• Clusters show clear separation
• Each cluster corresponds to a local atom
33
Number of Prefixes (C)
Number of Source ASes (S)
Destinations
C1 150 16 Ukraine 83% Czech. Rep 10%
C2 170 9 Romania 33% Poland 33%
C3 126 7 India 93% US 2%
C4 484 8 Russia 73% Czech rep. 10%
C5 375 15 US 74% Australia 16%
Interpreting Clusters
Conclusions
• RSD is a metric that captures when two prefixes are “routed similarly in the Internet”
• It has many useful properties – a fine-grained distance metric even in small-world graphs – captures a different notion of distance than shortest-path – summarizes the routing state of the entire network – effective for visualizing Internet prefixes – allows in-depth analysis of BGP – uncovers surprising and interesting patterns in routing
• Local atoms
Thank you!
Gonca Gursun Natali Ruchansky Evimaria Terzi
Related Work • Reported that BGP tables provide an incomplete view of the
AS graph. [Roughan et. al. ‘11]
• Visualization based on AS degree and geo-location. [Huffaker and k. claffy ‘10]
• Small scale visualization through BGPlay and bgpviz
• Clustering on the inferred AS graph. [Gkantsidis et. al. ‘03]
• Clustering prefixes that share the same BGP paths into policy atoms. [Broido and k. claffy ‘01]
• Methods for calculating policy atoms and characteristics. [Afek et. al. ‘02]
36
Data Hygiene Implications
• BGP data is known to favor customer-provider links and miss peer-peer links
• Our restriction to 38 x 135000 known paths means that we are not missing any links in the scope of our experiments
• Hence accuracy for the chosen subsets of M is not affected by missing links
• However, the accuracy of our methods may be different on the full M – Whether better or worse, it’s not clear – There is some reason to believe it would be better…