Shivkumar KalyanaramanRensselaer Polytechnic Institute
1
Exterior Gateway Protocols: EGP, BGP-4, CIDR:
Brief Version
Shivkumar KalyanaramanRensselaer Polytechnic Institute
[email protected] http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part upon slides of Tim Griffin (AT&T), Ion Stoica (UCB), J. Kurose (U Mass), Noel Chiappa(MIT)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
2
Cores, Peers, and the limit of default routesAutonomous systems & EGPBGP4CIDR: reducing router table sizesRefs: Chap 10,14,15. Books: “Routing in Internet” by Huitema, “Interconnections” by Perlman, “BGP4” by Stewart, Sam Halabi, Danny McPherson, Internet Routing Architectures Reading: Geoff Huston, Commentary on Inter-domain Routing in the Internet Reference: BGP-4 Standards Document: In TXT Reading: Norton, Internet Service Providers and PeeringReading: Labovitz et al, Delayed Internet Routing ConvergenceReference: Paxson, End-to-End Routing Behavior in the Internet, Reading: Interdomain Routing: Additional Notes: In PDF | In MS Word Reference Site: Griffin, Interdomain Routing Links
Overview
Shivkumar KalyanaramanRensselaer Polytechnic Institute
3
Intra-AS and Inter-AS routing
inter-AS, intra-ASrouting in
gateway A.c
network layerlink layer
physical layer
a
b
b
aaC
A
Bd
Gateways:•perform inter-ASrouting amongst themselves•perform intra-ASrouters with other routers in their AS
A.cA.a
C.bB.a
cb
c
Shivkumar KalyanaramanRensselaer Polytechnic Institute
4
History of Inter-Domain Routing & EGP
Shivkumar KalyanaramanRensselaer Polytechnic Institute
5
History: Default Routes: limitsDefault routes => partial informationRouters/hosts w/ default routes rely on other routers to complete the picture.In general routing “signposts” should be:
Consistent, I.e., if packet is sent off in one direction then another direction should not be more optimal.Complete, I.e., should be able to reach all destinations
Shivkumar KalyanaramanRensselaer Polytechnic Institute
6
CoreA small set of routers that have consistent & complete information about all destinations.Non-core routers can have partial information provided they point default routes to the core
Partial info allows site administrators to make local routing changes independently.
CORE
. . .S1 S2 Sm
Shivkumar KalyanaramanRensselaer Polytechnic Institute
7
Peer BackbonesInitially NSFNET: only 1 link to ARPANETAddition of multiple links => multiple possible routes => need for dynamic routing
Today there are over 30 backbones!Routing protocol at cores/peers: GGP -> EGP-> BGP-4
NSFNET
S1 S2 Sm. . . s1
Peering LinkARPANet
s2 sn. . .
Shivkumar KalyanaramanRensselaer Polytechnic Institute
8
Exterior Gateway Protocol (EGP)A mechanism that allows non-core routers to learn routes from core (external routes) routers so that they can choose optimal backbone routes
A mechanism for non-core routers to inform core routers about hidden networks (internal routes)
Autonomous System (AS) has the responsibility of advertising reachability info to other ASs.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
9
Purpose of EGP
R border router
internal router
EGPR2
R1
R3
A
AS1
AS2
you can reachnet A via me
traffic to A
table at R1:dest next hopA R2
Share connectivity information across ASes
Shivkumar KalyanaramanRensselaer Polytechnic Institute
10
EGP OperationNeighbor Acquisition: Reliable 2-way handshakeNeighbor Reachability:
Hellos: j out of m hellos OK => Neighbor UPk out of n hellos NOT OK => Neighbor DOWN
Updates/Queries:EGP is an incremental protocol. New info => send updatesEach router can query neighbors as wellReachability advertized; metrics ignoredRequires a tree topology of ASes to avoid loops (see next slide)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
11
Why EGP Requires a Tree Structure..
Shivkumar KalyanaramanRensselaer Polytechnic Institute
12
EGP weaknessesEGP does not interpret the distance metrics in routing update messages => cannot be compute shorter of two routes
As a result it restricts the topology to a tree structure, with the core as the root
Rapid growth => many networks may be temporarily unreachableOnly one path to destination => no load sharing
Need new protocol => BGP-4
Shivkumar KalyanaramanRensselaer Polytechnic Institute
13
The Current Stage for Inter-Domain Routing: ASes & Policy Routing
Scenarios
Shivkumar KalyanaramanRensselaer Polytechnic Institute
14
Today’s Big Picture
Large ISPLarge ISP
Dial-UpISP
AccessNetwork
Small ISP
Stub
Stub Stub
Large number of diverse networks
Shivkumar KalyanaramanRensselaer Polytechnic Institute
15
Autonomous Systems (ASes)An autonomous system is an autonomous routing
domain that has been assigned an Autonomous System Number (ASN).
All parts within an AS remain connected.
RFC 1930: Guidelines for creation, selection, and registration of an Autonomous System
… the administration of an AS appears to other ASes to have a single coherent interior routing plan and presents a consistent picture of what networks are reachable through it.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
16
Autonomous System(AS)
An autonomous system (AS) is a network under a single administrative control
An AS owns an IP prefixEvery AS has a unique AS number
ASes need to inter-network themselves to form a single virtual global network
Need a common protocol for communicationI.e. BGP-4
Shivkumar KalyanaramanRensselaer Polytechnic Institute
17
IP Address Allocation and Assignment: Internet Registries
IANAwww.iana.org
RFC 2050 - Internet Registry IP Allocation Guidelines RFC 1918 - Address Allocation for Private Internets RFC 1518 - An Architecture for IP Address Allocation with CIDR
ARINwww.arin.org
APNICwww.apnic.org
RIPEwww.ripe.org
Allocate to National and local registries and ISPsAddresses assigned to customers by ISPs
Shivkumar KalyanaramanRensselaer Polytechnic Institute
18
AS Numbers (ASNs)
ASNs are 16 bit values.64512 through 65535 are “private”
Currently over 11,000 in use.• Genuity: 1 • MIT: 3• Harvard: 11• UC San Diego: 7377• AT&T: 7018, 6341, 5074, … • UUNET: 701, 702, 284, 12199, …• Sprint: 1239, 1240, 6211, 6242, …• …
ASNs represent units of routing policy
Shivkumar KalyanaramanRensselaer Polytechnic Institute
19
Internet AS Map: caida.org
Shivkumar KalyanaramanRensselaer Polytechnic Institute
20
Which Routers do Inter-AS routing?
R border router internal router
BGPR2
R1
R3AS1
AS2
Two types of routersBorder router(Edge), Internal router(Core)
Two border routers of different ASes will have a BGP session
Shivkumar KalyanaramanRensselaer Polytechnic Institute
21
Requirements for Inter-AS RoutingShould scale for the size of the global Internet.
Focus on reachability, not optimalityUse address aggregation techniques to minimize core routing table sizes and associated control trafficAt the same time, it should allow flexibility in topological structure (eg: don’t restrict to trees etc)
Allow policy-based routing between autonomous systemsPolicy refers to arbitrary preference among a menu of available routes (based upon routes’ attributes)Fully distributed routing (as opposed to a signaled approach) is the only possibility.Extensible to meet the demands for newer policies.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
22
Policy Routing: Nontransit vs Transit ASes
ISP 1ISP 2
NET ATraffic NEVER flows from ISP 1through NET A to ISP 2
Internet Serviceproviders (ISPs)have transitnetworks
Nontransit ASmight be a corporateor campus network.Could be a “content provider”
Shivkumar KalyanaramanRensselaer Polytechnic Institute
23
Policy Routing: Selective Transit
NET BNET C
NET A provides transitbetween NET B and NET Cand between NET D and NET C
NET A
NET D
NET A DOES NOTprovide transitBetween NET D and NET B
Most transit ASes allow only selective transitkey impact of commercialization
Shivkumar KalyanaramanRensselaer Polytechnic Institute
24
Policy Routing: Customers & Providers
provider
customer
IP trafficprovider customer
Customer pays provider for access to the Internet
Shivkumar KalyanaramanRensselaer Polytechnic Institute
25
Policy Routing: Customer-Provider Hierarchy
IP trafficprovider customer
Shivkumar KalyanaramanRensselaer Polytechnic Institute
26
Policy Routing: The Peering Relationship
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers (often) do not exchange $$$
peer peer
customerprovider
trafficallowed
traffic NOTallowed
Shivkumar KalyanaramanRensselaer Polytechnic Institute
27
Peering WarsPeer Don’t Peer
Reduces upstream transit costsCan increase end-to-end performanceMay be the only way to connect your customers to some part of the Internet (“Tier 1”)
You would rather have customersPeers are usually your competitionPeering relationships may require periodic renegotiation
Peering struggles are by far the most contentious issues in the ISP world!Peering agreements are often confidential.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
28
BGP-4 Design
Shivkumar KalyanaramanRensselaer Polytechnic Institute
29
Recall: Distributed Routing Techniques
Link State Vectoring
Topology information is flooded within the routing domainBest end-to-end paths are computed locally at each router. Best end-to-end paths determine next-hops.Based on minimizing some notion of distanceWorks only if policy is sharedand uniformExamples: OSPF, IS-IS
Each router knows little about network topologyOnly best next-hops are chosen by each router for each destination network. Best end-to-end paths result from composition of all next-hop choicesDoes not require any notion of distanceDoes not require uniform policies at all routersExamples: RIP, BGP
Shivkumar KalyanaramanRensselaer Polytechnic Institute
30
BGP-4BGP = Border Gateway Protocol Is a Policy-Based routing protocol Is the de facto EGP of today’s global InternetRelatively simple protocol, but configuration is complex and the entire world can see, and be impacted by, your mistakes.
• 1989 : BGP-1 [RFC 1105]– Replacement for EGP (1984, RFC 904)
• 1990 : BGP-2 [RFC 1163]• 1991 : BGP-3 [RFC 1267]• 1995 : BGP-4 [RFC 1771]
– Support for Classless Interdomain Routing (CIDR)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
31
BGP Operations (Simplified)
Establish session onTCP port 179
Exchange allactive routes
Exchange incrementalupdates
AS1
AS2
BGP session
While connection is ALIVE exchangeroute UPDATE messages
Shivkumar KalyanaramanRensselaer Polytechnic Institute
32
Four Types of BGP Messages
Open : Establish a peering session. Keep Alive : Handshake at regular intervals. Notification : Shuts down a peering session. Update : Announcing new routes or withdrawingpreviously announced routes.
announcement =
prefix + attributes values
Shivkumar KalyanaramanRensselaer Polytechnic Institute
33
Border Gateway Protocol (BGP)Allows arbitrary AS topologiesUses a path-vector concept to help prevent routing loops in complex topologiesFor inter-domain routing: shortest path may not be preferred for policy, security, cost reasons.
Different routers have different preferences (policy) => as packet goes thru network it will encounter different policies=> Bellman-Ford or Dijkstra don’t work!
Soln: BGP allows attributes for AS and paths which could include policies (policy-basedrouting).
Shivkumar KalyanaramanRensselaer Polytechnic Institute
34
BGP (Cont’d)Consistency criterion: When a BGP Speaker A advertises a prefix to its B that it has a path to IP prefix C …
… B can be certain that A is actively using that AS-path to reach that destination
BGP uses TCP between 2 peers (reliability)Exchange entire BGP table first (50K+ routes!)Later exchanges only incremental updatesApplication (BGP)-level keepalive messages
Interior and exterior peers: need to exchangereachability information among interior peers before updating intra-AS forwarding table.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
35
Two Types of BGP Neighbor Relationships
• External Neighbor (eBGP) in a different Autonomous Systems
• Internal Neighbor (iBGP) in the same Autonomous System AS1
AS2
eBGP
iBGP
iBGP is routed (using IGP!)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
36
I-BGP and E-BGP
R border router
internal router
R1
AS1
R4R5
B
AS3
E-BGP
R2R3
AAS2 announce B
IGP: Interior Gateway Protocol.Examples: IS-IS, OSPF
I-BGP
IGP
Shivkumar KalyanaramanRensselaer Polytechnic Institute
37
I-BGP vs IGP Why is IGP (OSPF, ISIS) not used ?
In large ASs full route table is very large (100K routes!)Rate of change of routes is frequentTremendous amount of control trafficNot to mention Dijkstra computation being evoked for any change…BGP policy information may be lost
I-BGP :Within an ASSame protocol/state machines as EBGP But different rules about advertising prefixes
Shivkumar KalyanaramanRensselaer Polytechnic Institute
38
IBGP vs EBGPI-BGP nodes: typically ABRs, or other nodes where default routes terminateI-BGP peering sessions between every pair of routers within an AS: full mesh.
A
B
D
C
Physical link
IBGP session
AS1
Shivkumar KalyanaramanRensselaer Polytechnic Institute
39
IBGP Peers: Fully MeshedeBGP update
iBGP updates
IBGP is needed to avoid routing loops within an ASFull Mesh =>
Independent of physical connectivity.Single link may see same update multiple times!
IBGP neighbors do not announce routes received via iBGP to other iBGPneighbors.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
40
IBGP Scaling: Route ReflectionAdd hierarchy to I-BGP Route reflector: A router whose BGP implementation supports the re-advertisement of routes between I-BGP neighborsRoute reflector client: A router which depends on route reflector to re-advertise its routes to entire AS and learn routes from the route reflector
Shivkumar KalyanaramanRensselaer Polytechnic Institute
41
Route Reflection
RR-C1
RR-C2
RR1
RR2
RR3
RR-C3
RR-C4
AS1
AS2
ER10.0.0.0/24
128.23.0.0/16
EBGP
IBGP
Shivkumar KalyanaramanRensselaer Polytechnic Institute
42
AS ConfederationsDivide and conquer: Divides a large AS into sub-ASs
AS-112
10
14
11
13R1
R2
Sub-AS
Shivkumar KalyanaramanRensselaer Polytechnic Institute
43
BGP-4: Support for Scaling and Address Management
Shivkumar KalyanaramanRensselaer Polytechnic Institute
44
CIDRShortage of class Bs => give out a set of class Cs instead of one class B address
Problem: every class C n/w needs a routing entry !
Solution: Classless Inter-domain Routing (CIDR). Also called “supernetting”Key: allocate addresses such that they can be summarized, I.e., contiguously.
Share same higher order bits (I.e. prefix)Routing tables and protocols must be capable of carrying a subnet mask. Notation: 128.13.0/23
When an IP address matches multiple entries (eg194.0.22.1), choose the one which had the longest mask (“longest-prefix match”)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
45
Inter-domain Routing Without CIDR
ServiceProvider
Global InternetRouting
Mesh
204.71.0.0204.71.1.0204.71.2.0…...…….
204.71.0.0204.71.1.0204.71.2.0
204.71.255.0
…...…….204.71.255.0
Inter-domain Routing With CIDR
ServiceProvider
204.71.0.0204.71.1.0204.71.2.0…...…….
Global InternetRouting
Mesh
204.71.0.0/16
204.71.255.0
Shivkumar KalyanaramanRensselaer Polytechnic Institute
46
RFC 1519: Classless Inter-Domain Routing (CIDR)
Pre-CIDR: Network ID ended on 8-, 16, 24- bit boundaryCIDR: Network ID can end at any bit boundaryIP Address : 12.4.0.0 IP Mask: 255.254.0.0
00001100 00000100 00000000 00000000
11111111 11111110 00000000 00000000
Network Prefix
Address
Mask
for hosts
Usually written as 12.4.0.0/15, a.k.a “supernetting”
Shivkumar KalyanaramanRensselaer Polytechnic Institute
47
Longest Prefix Match (Classless) Forwarding
Destination =12.5.9.16-------------------------------
payload
Prefix Interface Next Hop
12.0.0.0/8 10.14.22.19 ATM 5/0/8
12.4.0.0/15
12.5.8.0/23 attached
Ethernet 0/1/3
Serial 1/0/7
10.1.3.77
IP Forwarding Table
0.0.0.0/0 10.14.11.33 ATM 5/0/9 OK
better
even better
best!
Shivkumar KalyanaramanRensselaer Polytechnic Institute
48
ISP3
AS1128.40/16140.127/16
Link A
Link B
ISP1128.32/11
ISP2140.64/10
128.40/16
140.127/16
ISP2ISP1
ORIGIN AS
Next Hop
Prefix
ISP2140.64/10ISP1128.32/11
Table at ISP3CIDR at Work, No load balancing
Shivkumar KalyanaramanRensselaer Polytechnic Institute
49
CIDR Subverted for Load Balancing
ISP3
AS1128.40/16140.127/16
Link A
Link B
ISP1128.32/11
ISP2140.64/10
140.255.20/24, 128.40/16
128.42.10/24, 140.127/16
AS1AS1ISP2ISP1
ORIGIN AS
Next Hop
Prefix
ISP2128.42.10/24ISP1140.255.20/24ISP2140.64/10ISP1128.32/11
Table at ISP3
Shivkumar KalyanaramanRensselaer Polytechnic Institute
50
Deaggregation + Multihoming
AS 1
customerAS 2
provider
AS 3provider
12.2.0.0/16
If AS 1 doesnot announce themore specific prefix,then most traffic to AS 2 will go through AS 3 because it is a longer match
12.2.0.0/1612.2.0.0/1612.0.0.0/8
AS 2 is “punching a hole” in the CIDR block of AS 1=> subverts CIDR
Shivkumar KalyanaramanRensselaer Polytechnic Institute
51
Policy Routing in BGP-4
Shivkumar KalyanaramanRensselaer Polytechnic Institute
52
What is Routing PolicyPolicy refers to arbitrary preference among a menu of available routes
Public description of the relationship between external BGP peersCan also describe internal BGP peer relationship
BGP Hook: policy routing choice based upon routes’ attributes
Shivkumar KalyanaramanRensselaer Polytechnic Institute
53
How to do policy routing?192.0.2.0/24pick me!
192.0.2.0/24pick me!
192.0.2.0/24pick me!
192.0.2.0/24pick me!
Given multipleroutes to the sameprefix, a BGP speakermust pick at mostone best route based upon routes’ attributes
Shivkumar KalyanaramanRensselaer Polytechnic Institute
54
BGP Policy Knob: Attributes
Value Code Reference----- --------------------------------- ---------
1 ORIGIN [RFC1771]2 AS_PATH [RFC1771]3 NEXT_HOP [RFC1771]4 MULTI_EXIT_DISC [RFC1771]5 LOCAL_PREF [RFC1771]6 ATOMIC_AGGREGATE [RFC1771]7 AGGREGATOR [RFC1771]8 COMMUNITY [RFC1997]9 ORIGINATOR_ID [RFC2796]
10 CLUSTER_LIST [RFC2796]11 DPA [Chen]12 ADVERTISER [RFC1863]13 RCID_PATH / CLUSTER_ID [RFC1863]14 MP_REACH_NLRI [RFC2283] 15 MP_UNREACH_NLRI [RFC2283] 16 EXTENDED COMMUNITIES [Rosen]
...255 reserved for development
We will cover a subset of these attributes
Not all attributesneed to be present inevery announcement
From IANA: http://www.iana.org/assignments/bgp-parameters
Shivkumar KalyanaramanRensselaer Polytechnic Institute
55
Import and Export PoliciesFor inbound traffic
Filter outbound routesTweak attributes on outbound routes in the hope of influencing your neighbor’s best route selection
For outbound trafficFilter inbound routesTweak attributes on inbound routes to influence best route selection
outboundroutes
inboundroutes
inboundtraffic
outboundtraffic
In general, an AS has morecontrol over outbound traffic
Shivkumar KalyanaramanRensselaer Polytechnic Institute
56
BGP Route Processing
Best RouteSelection
Apply ImportPolicies
Best Route Table
Apply ExportPolicies
Install forwardingEntries for bestRoutes.
Apply Policy =filter routes & tweak
Based onAttributeValues
IP Forwarding Table
Apply Policy =filter routes & tweak attributes
BestRoutes
TransmitBGP Updates
ReceiveBGPUpdates attributes
Shivkumar KalyanaramanRensselaer Polytechnic Institute
57
Policy Implementation Flow
MainBGPRIB
AdjRIBOut
Outgo-ing
AdjRIBIn
Incom-ing
MainRIB/FIB
IGPs
Static&
HWInfo
Shivkumar KalyanaramanRensselaer Polytechnic Institute
58
Conceptual Model of BGP OperationRIB : Routing Information BaseAdj-RIB-In: Prefixes learned from neighbors. As many Adj-RIB-In as there are peersLoc-RIB: Prefixes selected for local use after analyzing Adj-RIB-Ins. This RIB is advertised internally.Adj-RIB-Out : Stores prefixes advertised to a particular neighbor. As many Adj-RIB-Out as there are neighbors
Shivkumar KalyanaramanRensselaer Polytechnic Institute
59
BGP-4 Messages and Route Attributes
Shivkumar KalyanaramanRensselaer Polytechnic Institute
60
UPDATE message in BGPPrimary message between two BGP speakers.Used to advertise/withdraw IP prefixes (NLRI)Path attributes field : unique to BGP
Apply to all prefixes specified in NLRI fieldOptional vs Well-known; Transitive vs Non-transitive
2 octets
Withdrawn Routes Length
Withdrawn Routes (variable length)
Total Path Attributes Length
Path Attributes (variable length)
Network Layer Reachability Info. (NLRI: variable length)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
61
Path Attributes: ORIGINORIGIN:
Describes how a prefix came to BGP at the origin ASPrefixes are learned from a source and “injected” into BGP:Directly connected interfaces, manually configured static routes, dynamic IGP or EGP
Values: IGP (EGP): Prefix learnt from IGP (EGP)INCOMPLETE: Static routes
Shivkumar KalyanaramanRensselaer Polytechnic Institute
62
Path Attributes: AS-PATHList of ASs thru which the prefix announcement has passed. AS on path adds ASN to AS-PATHEg: 138.39.0.0/16 originates at AS1 and is advertised to AS3 via AS2.Eg: AS-SEQUENCE: “100 200”Used for loop detection and path selection
AS1(100)
AS2(200)
AS3(15)
138.39.0.0/16
Shivkumar KalyanaramanRensselaer Polytechnic Institute
63
Traffic Often Follows ASPATH
AS 4
135.207.0.0/16ASPATH = 3 2 1
AS 3AS 1135.207.0.0/16
AS 2
IP Packet Dest =135.207.44.66
Shivkumar KalyanaramanRensselaer Polytechnic Institute
64
… But It Might Not
AS 4AS 3AS 2AS 1135.207.0.0/16
135.207.0.0/16ASPATH = 3 2 1
IP Packet Dest =135.207.44.66
AS 5
135.207.44.0/25ASPATH = 5
135.207.44.0/25
AS 2 filters allsubnets with maskslonger than /24
From AS 4, it may look like this packet will take path 3 2 1, but it actually takes path 3 2 5
135.207.0.0/16ASPATH = 1
Shivkumar KalyanaramanRensselaer Polytechnic Institute
65
Shorter AS-PATH Doesn’t Mean Shorter # Hops
AS 4
AS 3
AS 2
AS 1
BGP says that path 4 1 is betterthan path 3 2 1
Duh!
Shivkumar KalyanaramanRensselaer Polytechnic Institute
66
ASPATH Padding: Shed inbound traffic
Padding will (usually) force inbound traffic from AS 1to take primary link
AS 1
192.0.2.0/24ASPATH = 2 2 2
customerAS 2
provider
192.0.2.0/24
backupprimary
192.0.2.0/24ASPATH = 2
Shivkumar KalyanaramanRensselaer Polytechnic Institute
67
Load-Balancing Knobs in BGP
Local Preference
AS1 AS2
MED
LOCAL-PREF: outbound traffic, local preference (box-level knob)MED: Inbound-traffic, typically from the same ISP (link-level knob)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
68
Path Attribute: LOCAL-PREFLocally configured indication about which path is preferred to exit the AS in order to reach a certain network. Default value = 100. Higher is better.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
69
Hot Potato Routing: Closest Egress Point
192.44.78.0/24
15 56 IGP distances
egress 1 egress 2
This Router has two BGP routes to 192.44.78.0/24.
Hot potato: get traffic off of your network as Soon as possible. Go for egress 1!
Shivkumar KalyanaramanRensselaer Polytechnic Institute
70
Getting Burned by the Hot Potato
15 56
172865High bandwidth
Provider backbone
Low b/wcustomer backbone
HeavyContent
Web Farm
SFF NYC
San Diego
Many customers want their provider to carry the bits!
tiny http requesthuge http reply
Shivkumar KalyanaramanRensselaer Polytechnic Institute
71
Attributes: MULTI-EXIT Discriminator
Also called METRIC or MED Attribute. Lower is betterAS1:multihomed customer. AS2 (provider) includes MED to AS1AS1 chooses which link (NEXTHOP) to useEg: traffic to AS3 can go thru Link1, and AS2 thru Link2
AS1 AS2
AS3
AS4
Link A
Link B
Shivkumar KalyanaramanRensselaer Polytechnic Institute
72
MEDs Can Export Internal Instability
15
172865 Heavy
Content Web Farm
192.44.78.0/24
192.44.78.0/24MED = 15
192.44.78.0/24MED = 56 OR 10
5610
FLAP
FLAP
FLAP
FLAP
FLAPFLAP
Shivkumar KalyanaramanRensselaer Polytechnic Institute
73
How Can Routes be Colored?BGP Communities
A community value is 32 bits
By convention, first 16 bits is ASN indicating who is giving itan interpretation
communitynumber
• Used within and betweenASes
• The set of ASes must agree on how to interpret the community value• Very powerful BECAUSE it has no (predefined) meaning
Community Attribute = a list of community values.(So one route can belong to multiple communities)
Two reserved communities
no_advertise 0xFFFFFF02: don’t pass to BGP neighborsno_export = 0xFFFFFF01: don’t export out of AS
RFC 1997 (August 1996)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
74
Communities Example1:100
Customer routes1:200
Peer routes1:300
Provider Routes
To Customers1:100, 1:200, 1:300
To Peers1:100
To Providers1:100
Import Export
AS 1
Shivkumar KalyanaramanRensselaer Polytechnic Institute
75
BGP Route Selection ProcessSeries of tie-breaker decisions...If NEXTHOP is inaccessible do not consider the route. Prefer largest LOCAL-PREFIf same LOCAL-PREF prefer the shortest AS-PATH. If all paths are external prefer the lowest ORIGIN code(IGP<EGP<INCOMPLETE). If ORIGIN codes are the same prefer the lowest MED.If MED is same, prefer min-cost NEXT-HOPIf routes learned from EBGP or IBGP, prefer paths learnt from EBGPFinal tie-break: Prefer the route with I-BGP ID (IP address)
Shivkumar KalyanaramanRensselaer Polytechnic Institute
76
Route Selection Summary
Highest Local Preference Enforce relationships
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
Lowest IGP cost to BGP egress
traffic engineering
Lowest router ID Throw up hands andbreak ties
Shivkumar KalyanaramanRensselaer Polytechnic Institute
77
Caveat
• BGP is not guaranteed to converge on a stable routing. Policy interactions could lead to “livelock” protocol oscillations.
See “Persistent Route Oscillations in Inter-domain Routing” by K. Varadhan, R.
Govindan, and D. Estrin. ISI report, 1996
• Corollary: BGP is not guaranteed to recover from network failures.
Shivkumar KalyanaramanRensselaer Polytechnic Institute
78
BGP Table Growth
Thanks: Geoff Huston. http://www.telstra.net/ops/bgptable.html
Shivkumar KalyanaramanRensselaer Polytechnic Institute
79
ASNs Growth
From: Geoff Huston. http://www.telstra.net/ops
Shivkumar KalyanaramanRensselaer Polytechnic Institute
80
BGP Updates: Mostly Stable…
Typically, 80% ofthe updates are for less than 5% Of the prefixes.
Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated.
Percent of BGP table prefixes
Thanks to Madanlal Musuvathi for this plot. Data source: RIPE NCC
Shivkumar KalyanaramanRensselaer Polytechnic Institute
81
Route Flap Dampening
route dampenedfor nearly 1 hour
penalty for each flap = 1000
Shivkumar KalyanaramanRensselaer Polytechnic Institute
82
BGP Convergence: How Long Does BGP Take to Adapt to Changes?
0
10
20
30
40
50
60
70
80
90
100
0 20 40 60 80 100 120 140 160
Seconds Until Convergence
Cum
ulat
ive
Perc
enta
ge o
f Eve
nts
Tup
Tshort
Tlong
Tdow n
From: Abha Ahuja and Craig Labovitz
Shivkumar KalyanaramanRensselaer Polytechnic Institute
83
Summary
BGP is a fairly simple protocol …… but it is not easy to configureBGP is running on more than 100K routers making it one of world’s largest and most visible distributed systemsGlobal dynamics and scaling principles are still not well understoodTraffic Engineering hacked in as an afterthought…