bgp safety with spurious updates the conditions of bgp convergence
DESCRIPTION
BGP Safety with Spurious Updates The Conditions of BGP Convergence. Martin Suchara in collaboration with: Alex Fabrikant and Jennifer Rexford. Local Control vs. Global Properties. The Internet is a “network of networks” ~35,000 independently administered ASes - PowerPoint PPT PresentationTRANSCRIPT
BGP Safety with Spurious UpdatesThe Conditions of BGP Convergence
Martin Suchara
in collaboration with:Alex Fabrikant and
Jennifer Rexford
January 12, 2011
2
Local Control vs. Global Properties The Internet is a “network of networks”
~35,000 independently administered ASes Competitive cooperation to find routes
Local ControlInterdomain policies, Intradomain routing
Global PropertiesStability, reliability, performance, etc.
3
Interdomain Routing Autonomous systems (AS) have different goals
Different views on which path is best
Interdomain routing: agree on a set of paths
11 22
33
66
44
client
Prefer 5346 to 5326
Has to use : 5 → 3 → 2 → 6
Prefer 326 to 346
55
4
The Border Gateway Protocol (BGP) ASes exchange information about paths
Policy configurations provided by AS operators
Path selection: which path do I choose?
Path export: which neighbors do I tell?
d
2211
Data traffic
“I can reach d via AS 3”
“I can reach d”
33
5
Business Driven Policies of ASes
Peer-Peer Relationship
Export only customer routers to a peer
Export peer routes only to customers
Customer-Provider Relationship
Provider exports its customer’s routes to everybody
Customer exports provider’s routes only to downstream customers
6
BGP Safety Challenges 35,000 ASes and 300,000 address blocks
Flexible AS policies
Routing convergence usually takes minutes
But the system does not always converge…
0
1 2
d
Prefer 120 to 10
Prefer 210 to 20
Use 20Use 10Use 120
Use 210
7
Results on BGP Safety
Necessary or sufficient conditions of safety (Gao and Rexford, 2001), (Gao, Griffin and Rexford, 2001), (Griffin, Jaggard and Ramachandran, 2003), (Feamster, Johari and Balakrishnan, 2005), (Sobrinho, 2005), (Fabrikant and Papadimitriou, 2008), (Cittadini, Battista, Rimondini and Vissicchio, 2009), …
Safety verification important to network operators
Absence of a “dispute wheel” sufficient for safety (Griffin, Shepherd, Wilfong, 2002)
8
Models of BGP Existing models (variants of SPVP)
Widely used to analyze BGP properties Simple but do not capture spurious
behavior of BGP
This work A new model of BGP with spurious updates Spurious updates have major consequences More accurate model makes proofs easier!
9
Overview
I. Classical model of BGP: the SPVP
III. The surprise: networks believed to be safe oscillate!
IV. The consequences: applicability of earlier results
V. Convergence conditions: polynomial time verifiable
VI. Conclusion
II. Spurious BGP updates: what are they?
10
SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)
12010ε
Permitted paths
Network topology
2
0
1
The higher the more preferred
21020ε
The destination
Always includes the empty path
11
SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)
12010ε
0
1
21020ε
Selected paths
2
12
SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)
12010ε
0
21020ε
Activation21
Activation models the processing of BGP update messages sent by neighbors
Vertex or edge activations
13
SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)
12010ε
0
21020ε
Vertex or edge activations
Switch to best available
2 Activation1
Activation models the processing of BGP update messages sent by neighbors
14
SPVP– Traditional Model of BGP (Griffin and Wilfong, 2000)
12010ε
0
21020ε
System is safe if all “fair” activation sequences lead to a stable path assignment
Switch to best available
2 Activation1
15
Overview
I. Classical model of BGP: the SPVP
III. The surprise: networks believed to be safe oscillate!
IV. The consequences: applicability of earlier results
V. Convergence conditions: polynomial time verifiable
VI. Conclusion
II. Spurious BGP updates: what are they?
16
What are Spurious Updates? A phenomenon: router announces a route
other than the highest ranked one
Spurious BGP update 230:
Selected path: 20
Behavior not allowed in SPVP
0
1 2
3
123010
30
21020230
230
17
What Causes Spurious Updates?
1. Limited visibility to improve scalability Internal structure of ASes Cluster-based router architectures
2. Timers and delays to prevent instabilities and reduce overhead Route flap damping MRAI timers Grouping updates to priority classes Finite size message queues in routers
18
Cause 1 – Limited Visibility
r1
r2
r3BA
r3
The internal structure of ASes improves scalability while reducing visibility
After route r1 is withdrawn, router B temporarily announces r3
r3r1
ri
RR
A
Route reflector
Router
Route announcement
Autonomous system (AS)
r1r2
RR
1 2 3 4
r1
r2
r3
5
19
Cause 1 – Limited Visibility
r1
r2
r3BA
r3
The internal structure of ASes improves scalability while reducing visibility
After withdrawals of routes r1 and r3, router B temporarily withdraws the route
ri
A
Route reflector
Router
Route announcement
Autonomous system (AS)
r1r2
ε
RRRR
1 2 3 4
5
20
Cause 2 – Delays Route flap damping temporarily suppresses all
routes learned from a neighbor
After the update r2→r1 the less preferred route r3
is temporarily selected
r1
r2
r3
r2r3
A
r3r2
Autonomous system (AS)
→r1
ri
A Router
Route announcement
1 2 3
r1
r2
r3
4
21
DPVP– A More General Model of BGP DPVP = Dynamic Path Vector Protocol
Generalizes the earlier model (SPVP) Spurious update with a less preferred route
that was recently available
Spurious updates allowed in transient period τ after last route change
Safety is independent of numerical value τ
22
DPVP– A More General Model of BGP
12010ε
The permitted paths and their ranking
2
0
1
20
21020ε
Spurious update
Selected path: 210
Spurious updates are allowed only if current time < StableTime
Spurious updates may include paths that were recently available or the empty path
Remember all recently available paths (e.g. 20, 210)
StableTime = τ after last path change
23
DPVP– A More General Model of BGP Behavior captured irrespective of cause
Simple future-proof model independent of underlying network technologies
For every allowed spurious behavior in DPVP we can find a possible cause Details in our technical report:
TR-881-10, Dept. of Comp. Sci., Princeton, July 2010
www.cs.princeton.edu/~msuchara/publications.html
24
Overview
I. Classical model of BGP: the SPVP
III. The surprise: networks believed to be safe oscillate!
IV. The consequences: applicability of earlier results
V. Convergence conditions: polynomial time verifiable
VI. Conclusion
II. Spurious BGP updates: what are they?
25
Consequences of Spurious Updates Spurious behavior is temporary
Tempting to conclude that it cannot have long term consequences
The surprise: spurious behavior may trigger permanent oscillations!
26
The Surprise: Spurious Announcements Trigger Permanent Oscillations!
Permanent oscillations with spurious behavior
Safe instance in all classical models of routing:
13010
21020
321032031030
1
3
0
2
Stable outcome
27
Example of Oscillation
13010
21020
321032031030
ε
2
1
3
0
B
A
RR
Autonomous system (AS)
1
RR
A
1
Route reflector
Router
AS activation
Route announcement
28
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
Autonomous system (AS)Route reflector
Router
AS activation
Route announcementε
1
A
1
RR
RR
29
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
ε210
210
Autonomous system (AS)Route reflector
Router
AS activation
Route announcement
ε
ε
1
A
1
RR
RR
Did not learn about withdrawal of 210 yet
Does not know any route
30
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
ε210
210
Autonomous system (AS)Route reflector
Router
AS activation
Route announcement
ε
ε
1
A
1
RR
RR
Did not learn about withdrawal of 210 yet
Does not know any route
31
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
ε210
210
Autonomous system (AS)Route reflector
Router
AS activation
Route announcement
ε
ε
1
A
1
RR
RR
Did not learn about withdrawal of 210 yet
Does not know any route
32
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
2020
20
Autonomous system (AS)Route reflector
Router
AS activation
Route announcementε
1
A
1
RR
RR
33
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
Autonomous system (AS)Route reflector
Router
AS activation
Route announcementε
1
A
1
RR
RR
34
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
Autonomous system (AS)Route reflector
Router
AS activation
Route announcementε
1
A
1
RR
RR
35
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
Autonomous system (AS)Route reflector
Router
AS activation
Route announcementε
1
A
1
RR
RR
36
Example of Oscillation
13010
21020
321032031030
2
1
3
0
B
A
Autonomous system (AS)Route reflector
Router
AS activation
Route announcementε
1
A
1
RR
RR
37
Consequences of Spurious Updates Temporary behavior may cause permanent
oscillations
The number of oscillating nodes and / or frequency of oscillations may increase
Which results do not hold in the new model?
38
Overview
I. Classical model of BGP: the SPVP
III. The surprise: networks believed to be safe oscillate!
IV. The consequences: applicability of earlier results
V. Convergence conditions: polynomial time verifiable
VI. Conclusion
II. Spurious BGP updates: what are they?
39
Which SPVP Results Hold in DPVP? Most previous results in SPVP also hold for
DPVP Formal justification later in the talk
Some results cannot be extended Slightly different conditions of convergence Exponentially slower convergence possible
40
Case Study IDifferent Conditions of Convergence Safety under filtering: is instance safe under any
filtering?Filter arbitrary subset of routes
3
Absence of a “dispute reel” necessary and sufficient for safety under filtering in SPVP (Cittadini et al., 2009)
Our result: permanent oscillations in DPVP even without a reel
Subgraph with specific properties
41
Case Study I Different Conditions of Convergence Example of a “safe” topology, Cittadini et al.:
Spurious updates cause oscillations
132010130
321030320
213020210
210
320
130
2
3
0
1
42
Case Study IIExponential Slowdown of Convergence BGP converges in 2l + 2 phases
(Sami et al., 2009)
l: length of longest customer-provider chain Phase: each node processes and sends
updates Assumes standard business relationships
With spurious updates exponential slowdown to (2k + 1)l-2 phases
k: max. # of spurious updates after route change
43
Overview
I. Classical model of BGP: the SPVP
III. The surprise: networks believed to be safe oscillate!
IV. The consequences: applicability of earlier results
V. Convergence conditions: polynomial time verifiable
VI. Conclusion
II. Spurious BGP updates: what are they?
44
Convergence Conditions
Absence of a “dispute wheel” is still sufficient for safety in DPVP Most of the previous results of the past
decade still hold under DPVP!
Absence of a “dispute wheel” sufficient for safety in SPVP (Griffin, Shepherd, Wilfong, 2002)
One of the most cited results
Other stronger results in DPVP next
45
Why are Proofs Easier in DPVP? No need to prove that:
Announced route is the highest ranked one Announced route is the last one learned from
the downstream neighbor
Next the necessary and sufficient conditions
We changed the problem PSPACE complete vs. NP complete
46
Necessary and Sufficient Conditions How can we prove a system may oscillate?
Classify each node as “stable” or “coy” At least one “coy” node exists Prove that “stable” nodes must be stable Prove that “coy” nodes may oscillate
Next: a formal definition of a construction that captures this intuition
Easy in a model with spurious announcements
47
Necessary and Sufficient Conditions
Coy nodes may make spurious announcements
Stable nodes have a permanent path
Theorem: DPVP oscillates if and only if it has a CoyOTE
Definition: CoyOTE is a triple (C, S, Π) satisfying several conditions
One path assigned to each node proves if the node is coy or stable
0
1 2
3
123010
30
21020230
48
Necessary and Sufficient Conditions
Definition: CoyOTE satisfies these conditions:
0
1 2
3
123010
30
21020230
= coy
= stable
1) The best stable path assigned to each stable node
2) Coy node is assigned a coy path:
• more preferred than the best stable path
• consistent with the paths of stable nodes
3) Origin is stable
Verifying the Convergence Conditions = Finding a CoyOTE In general an NP-hard problem
Compact inputs with regular expressions
Can be checked in polynomial time for most “reasonable” network configurations!
49
e.g.
50
DeCoy – Safety Verification Algorithm Goal: verify safety in polynomial time
Main idea: find the maximal stable set S by expanding it in a greedy fashion
If all nodes are stable system is safe
Otherwise system may oscillate
51
DeCoy – Safety Verification Algorithm
0
1 2
3
123010
30
21020230
= coy= stable
Goal: verify safety in polynomial time
Initially the destination is the only stable node; it has a path to itself
52
DeCoy – Safety Verification Algorithm
0
1 2
3
123010
30
21020230
Goal: verify safety in polynomial time
Find a coy node where its most preferred path consistent with the paths of all stable nodes is a stable path
= coy= stable
53
DeCoy – Safety Verification Algorithm
0
1 2
3
123010
30
21020230
Goal: verify safety in polynomial time
Assign this path to the node and mark the node as stable
Keep expanding the stable region until we get stuck
= coy= stable
Find a coy node where its most preferred path consistent with the paths of all stable nodes is a stable path
54
DeCoy – Safety Verification Algorithm Goal: verify safety in polynomial time
Theorem: if all nodes are added to the stable set the system is safe. Otherwise it is not safe.
Open question: find a distributed version of the algorithm that preserves privacy of the nodes
Business relationships are secret
55
Overview
I. Classical model of BGP: the SPVP
III. The surprise: networks believed to be safe oscillate!
IV. The consequences: applicability of earlier results
V. Convergence conditions: polynomial time verifiable
VI. Conclusion
II. Spurious BGP updates: what are they?
56
Conclusion DPVP: best of both worlds
More accurate model of BGP Model simplifies theoretical analysis
Key results
57
Thank You!
58