Internet Routing (COS 598A)
Today: Hot-Potato Routing
Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm
Outline
• Hot-potato routing
  – Selecting closest egress from a set
  – Hot-potato routing changes
• Measuring hot-potato routing
  – BGP and IGP monitoring
  – Inferring causality
• Characterizing hot potatoes
  – Frequency and number of destinations
  – Convergence delays and forwarding loops
• Avoiding hot potatoes
  – Operational practices
  – New egress-selection techniques
Hot-Potato Routing
[Figure: ISP network with egress points in San Francisco, Dallas, and New York toward dest, which has multiple egress points; IGP costs 9 and 10 to the competing egresses]
Hot-potato routing = route to the closest egress point when there is more than one route to the destination
Applies to:
– All traffic from customers to peers
– All traffic to customer prefixes with multiple connections
BGP Decision Process
• Highest local preference
• Lowest AS path length
• Lowest origin type
• Lowest MED (with same next-hop AS)
• Lowest IGP cost to next hop
• Lowest router ID of BGP speaker
Routes that tie through the first four steps are “equally good”; the IGP-cost step then implements hot-potato routing.
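The decision steps above can be sketched in code. This is a minimal illustration, not a router implementation; the `Route` fields and values are simplified assumptions.

```python
# Sketch of the BGP decision process from the slide above.
# Route fields are simplified assumptions, not a real BGP RIB entry.
from dataclasses import dataclass

@dataclass
class Route:
    local_pref: int
    as_path: list      # list of AS numbers
    origin: int        # 0 = IGP, 1 = EGP, 2 = INCOMPLETE (lower preferred)
    med: int
    next_hop_as: int
    igp_cost: int      # IGP cost to reach the BGP next hop (egress)
    router_id: int

def best_route(routes):
    """Apply the decision steps in order, narrowing the candidate set."""
    candidates = list(routes)
    # 1. Highest local preference
    top = max(r.local_pref for r in candidates)
    candidates = [r for r in candidates if r.local_pref == top]
    # 2. Lowest AS-path length
    shortest = min(len(r.as_path) for r in candidates)
    candidates = [r for r in candidates if len(r.as_path) == shortest]
    # 3. Lowest origin type
    best_origin = min(r.origin for r in candidates)
    candidates = [r for r in candidates if r.origin == best_origin]
    # 4. Lowest MED, compared only when the next-hop AS is the same
    if len({r.next_hop_as for r in candidates}) == 1:
        low_med = min(r.med for r in candidates)
        candidates = [r for r in candidates if r.med == low_med]
    # 5. Hot-potato step: lowest IGP cost to the next hop
    low_igp = min(r.igp_cost for r in candidates)
    candidates = [r for r in candidates if r.igp_cost == low_igp]
    # 6. Final tie-break: lowest router ID
    return min(candidates, key=lambda r: r.router_id)
```

When two routes are equally good through step 4, step 5 picks the closer egress, which is exactly the hot-potato behavior.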
Motivations for Hot-Potato Routing
• Simple computation for the routers
  – IGP path costs are already computed
  – Easy to make a direct comparison
• Ensures consistent forwarding paths
  – Next router in the path picks the same egress point
• Reduces resource consumption
  – Get traffic out as early as possible
  – (But, what does IGP distance really mean???)
Hot-Potato Routing Change
[Figure: ISP network with egresses in San Francisco, Dallas, and New York toward dest; a path cost rises from 9 to 11 (failure, planned maintenance, or traffic engineering), making the cost-10 egress the closest]
Routes to thousands of destinations switch egress points!
Consequences:
– Transient forwarding instability
– Traffic shift
– Interdomain routing changes
Why Care about Hot Potatoes?
• Understanding of Internet routing
  – Frequency of hot-potato routing changes
  – Influence on end-to-end performance
• Operational practices
  – Knowing when hot-potato changes happen
  – Avoiding unnecessary hot-potato changes
  – Analyzing externally-caused BGP updates
• Distributed root-cause analysis
  – Each AS can tell which BGP updates it caused
  – Someone should know why each change happens
Measuring Hot Potatoes is Hard
• Cannot collect data from all routers
  – OSPF: flooding gives a complete view of the topology
  – BGP: multi-hop sessions to several vantage points
• A single event may cause multiple messages
  – Group related routing messages in time
• Router implementation affects message timing
  – Analyze timing in the measurement data
  – Controlled experiments with a router in the lab
• Many BGP updates are caused by external events
  – Classify BGP routing changes by possible causes
Measurement Infrastructure
• Measure both protocols
  – BGP and OSPF monitors
• Correlate the two streams
  – Match BGP updates with OSPF events
• Analyze the interaction
[Figure: monitors attached to the ISP backbone collect OSPF messages and BGP updates from routers X, Y, Z, and M]
Algorithm for Matching
• Transform the stream of OSPF messages (link failure, refresh, weight change) into routing changes (cost change, delete)
• Classify BGP updates by possible OSPF causes
• Match BGP updates with OSPF events that happen close in time
[Figure: parallel timelines of the OSPF message stream and the BGP update stream, with matches drawn between events that are close in time]
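The matching step (pairing BGP updates with OSPF events close in time) can be sketched as follows; the window size and event record format are illustrative assumptions, not the paper's exact parameters.

```python
# Sketch of matching BGP routing changes to OSPF routing changes
# that occur within a time window. The window value is an assumption;
# in practice it is chosen by characterizing the measurement data.

MATCH_WINDOW = 180.0  # seconds (assumed)

def match_events(ospf_changes, bgp_changes, window=MATCH_WINDOW):
    """ospf_changes, bgp_changes: lists of (timestamp, description)
    sorted by timestamp. Returns a list of (bgp_event, matched_ospf_event
    or None), pairing each BGP change with the closest OSPF change in time."""
    matches = []
    for b_time, b_desc in bgp_changes:
        candidates = [(abs(b_time - o_time), (o_time, o_desc))
                      for o_time, o_desc in ospf_changes
                      if abs(b_time - o_time) <= window]
        best = min(candidates, default=None)
        matches.append(((b_time, b_desc), best[1] if best else None))
    return matches
```

A BGP change with no OSPF event inside the window is left unmatched, i.e., attributed to an external cause.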
Computing Cost Vectors
• Transform OSPF messages into path cost changes from a router’s perspective
[Figure: router M's view of a topology with egress routers X and Y; an LSA changes a link weight from 1 to 10]
Example OSPF routing changes from M's perspective, starting with cost vector (X: 5, Y: 4):
– LSA weight change (1 → 10) ⇒ CHG Y, 7 ⇒ (X: 5, Y: 7)
– LSA delete ⇒ DEL X ⇒ (Y: 7)
– new LSA ⇒ ADD X, 5 ⇒ (X: 5, Y: 7)
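The per-router cost vector and its ADD/CHG/DEL updates can be sketched directly; the event tuple format is an assumption for illustration.

```python
# Sketch of maintaining one router's cost vector (egress -> IGP path
# cost) from a stream of OSPF routing changes, mirroring the
# ADD/CHG/DEL events in the example above. Event format is assumed.

def apply_ospf_changes(cost_vector, changes):
    """cost_vector: dict mapping egress name -> IGP path cost, from one
    router's perspective. changes: list of ('ADD'|'CHG'|'DEL', egress,
    cost-or-None). Mutates and returns the cost vector."""
    for op, egress, cost in changes:
        if op == 'DEL':
            cost_vector.pop(egress, None)   # egress no longer reachable
        else:                               # ADD and CHG both set the cost
            cost_vector[egress] = cost
    return cost_vector
```

Replaying the slide's example from (X: 5, Y: 4): CHG Y,7 gives (X: 5, Y: 7); DEL X gives (Y: 7); ADD X,5 restores (X: 5, Y: 7).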
Classifying BGP Updates
• Cannot have been caused by a cost change
  – Destination just became (un)available in BGP
  – New BGP route through the same egress point
  – New route better/worse than the old one (e.g., shorter)
• Can have been caused by a cost change
  – New route is equally good as the old route (perhaps X got closer, or Y got farther away)
[Figure: router M choosing between egress routers X and Y toward dst]
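The classification rule above reduces to a small predicate: an egress switch can be hot-potato (IGP-caused) only when the new route ties the old one on the BGP steps before IGP cost. The route fields here are simplified assumptions.

```python
# Sketch of classifying a BGP routing change: could an IGP cost change
# have caused it? Route records are simplified assumptions with keys
# 'egress', 'local_pref', 'as_path_len', 'origin'.

def possibly_igp_caused(old, new):
    """Return True if switching from `old` to `new` is consistent with a
    hot-potato (IGP-driven) egress change."""
    if old is None or new is None:
        return False    # destination just became (un)available in BGP
    if old['egress'] == new['egress']:
        return False    # same egress point: no egress switch to explain
    equally_good = (old['local_pref'] == new['local_pref'] and
                    old['as_path_len'] == new['as_path_len'] and
                    old['origin'] == new['origin'])
    return equally_good  # a strictly better/worse route implies an external cause
```

Routes that differ in local preference, AS-path length, or origin were not tied before the IGP step, so a cost change alone cannot explain the switch.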
The Role of Time
• OSPF link-state advertisements
  – Multiple LSAs from a single physical event
  – Group into a single cost-vector change
• BGP update messages
  – Multiple BGP updates during convergence
  – Group into a single BGP routing change
• Matching IGP to BGP
  – Avoid matching unrelated IGP and BGP changes
  – Match related changes that are close in time
Characterize the measurement data to determine the right time windows (10 sec, 70 sec, 180 sec)
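Grouping a burst of messages into a single event can be sketched with a simple gap rule; the gap values are the kind of windows mentioned above, chosen by characterizing the data, not fixed constants.

```python
# Sketch of grouping a message stream into events: messages whose
# inter-arrival gap stays within `gap` seconds belong to one event.
# The same routine works for LSAs and for BGP updates, with different
# (empirically chosen) gap values.

def group_by_gap(timestamps, gap):
    """Split a sorted list of message timestamps into bursts whose
    consecutive inter-arrival times never exceed `gap` seconds."""
    groups = []
    for t in timestamps:
        if groups and t - groups[-1][-1] <= gap:
            groups[-1].append(t)   # continue the current burst
        else:
            groups.append([t])     # start a new burst / event
    return groups
```

Too small a gap splits one physical event into several; too large a gap merges unrelated events, which is why the windows are calibrated from the measurements.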
Variation Across Routers
[Figure: router A has IGP costs 9 (SF) and 10 (NY) to the two egresses for dest; router B has costs 1 (SF) and 1000 (NY)]
Small cost changes will make router A switch exit points to dst; router B is more robust to intradomain routing changes.
Important factors:
– Location: relative distance to the egresses
– Day: which events happen
Transferring Multiple Prefixes
[Plot: cumulative number of hot-potato changes vs. (time of BGP update − time of LSA), in seconds; transferring all affected prefixes shows up to an 81-second delay]
Data Plane Convergence
[Figure: routers R1 and R2 forward toward dst via egresses E1 and E2; after an IGP cost change, E1 becomes the closer egress]
1 – BGP decision process runs in R2
2 – R2 starts using E1 to reach dst
3 – R1's BGP decision can take up to 60 seconds to run
Packets to dst may be caught in a forwarding loop for up to 60 seconds!
Disastrous for interactive applications (VoIP, gaming, web)
BGP Updates Over Prefixes
[Plot: cumulative % of BGP updates vs. % of prefixes; OSPF-triggered BGP updates affect ~50% of prefixes roughly uniformly; the remainder are prefixes with only one exit point]
Reducing the Impact of Hot Potatoes
• Vendors: better router implementation
  – Avoid timer-driven reaction to IGP changes
  – Move toward an event-driven BGP implementation
• Operators: avoid equal-distance exits
[Figure: with IGP costs 10 and 10 to its two exits, small changes will make Z switch exit points to dst; with costs 1000 and 1, Z is more robust to intradomain routing changes]
Reducing the Impact (Continued)
• Operators: new maintenance practices
  – Careful cost-in/cost-out of links
  – (But, is this problem over-constrained???)
[Figure: example topology with routers X, Y, Z and link costs toward dst, illustrating a gradual cost change on one link before taking it down]
Is Hot-Potato Routing the Wrong Design?
• Too restrictive
  – Egress-selection mechanism dictates a policy
• Too disruptive
  – Small changes inside can lead to big disruptions
• Too convoluted
  – Intradomain metrics shouldn't be so tightly coupled with BGP egress selection
Strawman Solution: Fixed Ranking
• Goal: no disruptions from internal changes
  – Each router has a fixed ranking of egresses
  – Select the highest-ranked egress for each destination
  – Use tunnels from ingress to egress
• Disadvantage
  – Sometimes changing egresses would be useful
  – Harm from disruptions depends on the application
[Figure: example topology with routers A–G and link costs; egresses A and B toward dst]
Egress Selection Mechanisms
For each ingress i, destination dst, and egress e, a metric m ranks the egresses:
– Hot-potato routing: m(i,dst,e) = d(i,e), where d is the intradomain distance
– Fixed ranking: m(i,dst,e) = static rank(i,e)
[Figure: design space plotting automatic adaptation vs. robustness to internal changes; hot-potato routing and fixed ranking sit at opposite extremes]
TIE: Tunable Interdomain Egress Selection
• Flexible policies
  – Tuning α and β allows covering a wide range of egress-selection policies
• Simple computation
  – One multiplication and one addition
  – Information already available in routers
• Easy to optimize
  – Expressive enough for a management system to optimize

m(i,dst,e) = α(i,dst,e) · d(i,e) + β(i,dst,e)
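The TIE metric is simple enough to sketch directly. This toy version fixes one ingress and destination and treats α, β, and the distances as plain dictionaries (an illustrative encoding, not the router data structures).

```python
# Sketch of TIE egress selection for one ingress i and destination dst:
#   m(i,dst,e) = alpha(i,dst,e) * d(i,e) + beta(i,dst,e)
# alpha, beta, and d are given here as dicts keyed by egress (assumed
# encoding for illustration).

def tie_best_egress(egresses, d, alpha, beta):
    """Return the egress minimizing the TIE metric: one multiplication
    and one addition per candidate egress."""
    return min(egresses, key=lambda e: alpha[e] * d[e] + beta[e])
```

With α = 1 and β = 0 everywhere, the metric degenerates to hot-potato routing (pure IGP distance); with α = 0, it is a fixed ranking given by β, which never reacts to internal changes.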
Using TIE
• Decouples egress selection from IGP paths
  – Egress selection is done by tuning α and β
• Requirements
  – Small change in router decision logic
  – Use of tunnels
• Configuring TIE
  – Network designers define a high-level policy
  – A network management system translates the policy into parameters
Example Policy: Minimizing Sensitivity
• Problem definition
  – Minimize sensitivity to equipment failures
  – No delay more than twice the design-time delay
• Simple change to routers
  – If the distance is more than twice the original distance: change to the closest egress
  – Else: keep using the old egress point
• Cannot change routers for all possible goals

Output of the simulation phase at design time: m(C,dst,A) < m(C,dst,B)
Minimizing Sensitivity with TIE
[Figure: ingress C with egresses A and B toward dst; d(C,A) is 9 at design time and rises to 11 or 20 under failures, while d(C,B) = 10]
Constraints on the TIE parameters:
9·α(C,dst,A) + β(C,dst,A) < 10·α(C,dst,B) + β(C,dst,B)
11·α(C,dst,A) + β(C,dst,A) < 10·α(C,dst,B) + β(C,dst,B)
20·α(C,dst,A) + β(C,dst,A) > 10·α(C,dst,B) + β(C,dst,B)
Optimization phase: solve an integer program for α and β
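The constraints admit simple integer solutions. As a worked check, take α = 1 for both egresses, β(C,dst,A) = 0, β(C,dst,B) = 5 (illustrative values, not taken from the slides):

```python
# Checking one illustrative solution to the constraints above.
# alpha = 1 for both egresses, beta_A = 0, beta_B = 5 are assumed
# values chosen to satisfy the three inequalities; the slides do not
# give a concrete solution.

def m(alpha, d, beta):
    """The TIE metric: alpha * distance + beta."""
    return alpha * d + beta

alpha_A, beta_A = 1, 0
alpha_B, beta_B = 1, 5

# Design time and the mild failure: C keeps egress A (a stable choice)
assert m(alpha_A, 9, beta_A) < m(alpha_B, 10, beta_B)    # 9 < 15
assert m(alpha_A, 11, beta_A) < m(alpha_B, 10, beta_B)   # 11 < 15
# Severe failure, delay more than doubles: C switches to egress B
assert m(alpha_A, 20, beta_A) > m(alpha_B, 10, beta_B)   # 20 > 15
```

Any β(C,dst,B) − β(C,dst,A) strictly between 1 and 10 works here (with α = 1): the offset makes egress A sticky through small cost increases but still lets C switch when the path cost jumps to 20.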
Evaluation of TIE on Real Networks
• Topology and egress sets
  – Abilene network (U.S. research network)
  – Set link weights using geographic distance
• Configuration of TIE
  – Considering single-link failures
  – Threshold on the delay ratio: 2
  – α ∈ [1,4], with 93% of α(i,dst,e) = 1
  – β ∈ {0, 1, 3251}, with 90% of β(i,dst,e) = 0
• Evaluation
  – Simulate single-node failures
  – Measure routing sensitivity and delay
Effectiveness of TIE
• Delay
  – Within the 2× target whenever possible (i.e., whenever hot-potato routing could achieve it)
  – Lower delay than the fixed-ranking scheme
• Sensitivity
  – Only slightly more sensitive than a fixed-ranking scheme
  – Much less sensitive than hot-potato routing
Conclusion
• Hot-potato routing
  – Simple, intuitive, distributed mechanism
  – But, large reaction to small changes
• Studying hot-potato routing
  – Measurement of hot-potato routing changes
  – Characterization of hot potatoes in the wild
  – Guidelines for vendors and operators
• Improving the routing architecture
  – Identify egress selection as its own problem
  – Decouple it from the intradomain link weights
Next Time: Root-Cause Analysis
• Two papers
  – “Locating Internet Routing Instabilities”
  – “A Measurement Framework for Pin-Pointing Routing Changes”
• NANOG video
  – “Root Cause Analysis of Internet Routing Dynamics”
• Review just the first paper
  – Summary, why accept, why reject, future work
• Think about your course project
  – One-page written proposal by Thursday, March 24
  – Final written report due Tuesday, May 10