gpf 2.0 2007 failover matrices–cariden 1 peering planning cooperation without revealing...

21
GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes [email protected] www.cariden.com (c) cariden technologies GPF 2.0 March 25-30, 2007 Miami, Cozumel & Belize

Upload: raymond-gordon

Post on 20-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 1

Peering Planning Cooperation without Revealing Confidential Information

Aaron [email protected]

www.cariden.com (c) cariden technologies

GPF 2.0 March 25-30, 2007Miami, Cozumel & Belize

Page 2: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 2

SEA CHI BOS

NYC

WDC

MIA

ATLHST

LAX

SJC KCY

Peer X

700

600

X

SEA CHI BOS

NYC

WDC

MIA

ATLHST

LAX

SJC KCY

Peer X

100

Mbps

1200

(con

geste

d)

X

The Issue

• Multi-Homed Neighbor, 2 or more links > 50%• Example

– 1000Mbps connections to Peer X in 3 locations– SJC-to-Peer = 600Mbps, NYC = 100, WDC = 600– SJC-to-Peer link fails

• Are we in trouble?

? ?... or ...

Page 3: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 3

Capacity Planning Utopia

• Uniform capacity links

• Diverse connections(unlikely double failures at Layer 3)

• Upgrade at 50%(planning objective is to be resilient to single failures)

Page 4: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 4

• Range of capacities• Multiple Layer 3 failures• Upgrade impediments (money, cable plant, ...)

Capacity Planning Reality

Page 5: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 5

IGP Different from BGP

• Data is more accessible

• Failure behavior is predictable

• Established process for within AS planning– Gather Data

• Topology (OSPF, IS-IS, ...)

• Traffic matrix [1]

• Estimate growth

– Simulate for failures

– Perform traffic engineering (optional)[2]

– Upgrade as necessary

• Commercial and free tools

Page 6: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 6

• Data is larger and harder to access

• BGP decision process complicated

• Planning practices not well established

• Failure behavior often depends on someone else’s network!

– e.g., incoming traffic from a peer

The Trouble with BGP

subject of this talk

Page 7: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 7

BGP Path Decision Algorithm[1]

1. Reachable next hop2. Highest Weight3. Highest Local Preference4. Locally originated routes5. Shortest AS-path length6. IGP > EGP > Incomplete7. Lowest MED8. EBGP > IBGP9. Lowest IGP cost to next hop10. Shortest route reflection cluster list11. Lowest BGP router ID12. Lowest peer remote address

[1] Junos algorithm shown here. Cisco IOS uses a slightly different algorithm.

Shortest Exit Routing

Respect MEDs

Page 8: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 8

Common Routing Policies

• Shortest Exit– Often used for sending to peers– Get packet out of network as soon as possible– Local Prefs used to determine which neighbor,

IGP costs used to determine which exit

• Respect MEDs– Often used for customers who buy transit– Deliver packets closest to destination– Neighbor forwards IGP costs as MEDs

(multi-exit discriminators)

Page 9: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 9

Relationship to Remote AS

Routing ToRemote AS

Routing FromRemote AS

PeerShortest Exit in known network

Shortest Exit in unknown network

CustomerRespect MEDs from unknown

Shortest Exit in unknown network

Transit Provider

Shortest Exit in known network Respect our MEDs

Blind Spots

• Cannot predict behavior when routing depends on other network (see 3 cases below).

Page 10: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 10

Failover Matrices

• Solution to peering planning blind spots

• Procedure– Gather data

• Topology, Traffic, Routing Configurations

– Simulate knowable effects • Generate Failover Matrices

– Share Failover Matrices for unknowables• e.g., peer gives failover matrix for traffic it delivers, we

provide peer failover matrix for traffic we deliver

• Both sides benefit from cooperating

• AS-Internal information is kept confidential

Page 11: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 11

Failover Matrix Example

Node:Interface Traffic:no failure

%Traffic:fail_SJC

%Traffic:fail_nyc

%Traffic:fail_wdc

ar1.sjc:Gig3/2 600 - 10% (610) 1% (606)

ar1.nyc:ge-2/1 100 48% (388) - 95% (670)

ar2.wdc:ge-2/2 600 52% (912) 70% (670) -

Note: 388Mbps=100Mbps+(0.48*600Mbps), 912=600+(0.52*600), ...

Page 12: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 12

Failover Example (from real network)

No Scale

Peer Circuit 1: Traffic levels at five minute intervals

No Scale

Peer Circuit 2: Traffic levels at five minute intervals

No Scale

Peer Circuit 3: Traffic levels at five minute intervals

No Scale

Peer Circuit 4: Traffic levels at five minute intervals

• Circuit 2 fails.Traffic shifts to circuit 4.

Page 13: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 13

Failover Example (from real network)

No Scale

Peer Circuit 1: Traffic levels at five minute intervals

No Scale

Peer Circuit 2: Traffic levels at five minute intervals

No Scale

Peer Circuit 3: Traffic levels at five minute intervals

No Scale

Peer Circuit 4: Traffic levels at five minute intervals

• Circuit 1 fails. Some traffic shifts to 2 & 4 • Some “leaks” to other AS’s

Page 14: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 14

Questions

• How do I calculate a failover matrix?

• How do I use a failover matrix from a peer?

• What if my peer does not cooperate?

• What if a substantial amount of traffic “leaks” to another AS?

Page 15: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 15

Calculating Failover Matrices

• Accurate and Detailed[1,2]

– Per prefix routing and traffic statistics– Full BGP simulation

• Simple and Good Enough[3]

– Traffic matrix based on ingress-egress pairs• e.g., Peer1.LAX-AR1.CHI (measure and/or estimate)

instead of 192.12.3.0/24-208.43.0.0/16

– Limited simulation model• Shortest Path, Respect MEDs• “Our” AS plus immediate neighbors

[1] “Modeling the routing of an Autonomous System with C-BGP,” B. Quoitin and S. Uhlig, IEEE Network, Vol 19(6), November 2005.[2] “Network-wide BGP route prediction for traffic engineering,” N. Feamster and J. Rexford, in Proc. Workshop on Scalability and Traffic Control in IP Networks, SPIE ITCOM Conference, August 2002.[3] Cariden MATE, available at http://www.cariden.com.

Page 16: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 16

Using Failover Matrix from Peers

• Peer calculates failover matrix• Peer exports failover matrix

using IP addresses of peering links

• We import failover matrix• We include in a representative

model of peer network• Use Failover Matrix in

simulation

Page 17: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 17

Estimate if Peer not Cooperate

Group own sources based on exit location (4 groups here)

bru1

wsh1

wdc2

tpe3

syd1

snv2

sna1

sfo1

sel2

sea1

sac1

roc1

pty1

phx1

phi1

pao2

osa1

ewr1

nyc2

nrt4

nrt1

nqt1 min1

mia1

mex1

man1

lon3

lin1

lba1

lax1

kcy1

jfk1

hou1

hkg1

gru1

fra2

eze1

det1

den2 dal1

cph1

cle1

chi1

cdg2

bos1

bhx1 bbs1

atl1

ams2

ams1

ham1

sin1

tpa1

kul1

sjc2

bru1

wsh1

wdc2

tpe3

syd1

snv2

sna1

sfo1

sel2

sea1

sac1

roc1

pty1

phx1

phi1

pao2

osa1

ewr1

nyc2

nrt4

nrt1

nqt1 min1

mia1

mex1

man1

lon3

lin1

lba1

lax1

kcy1

jfk1

hou1

hkg1

gru1

fra2

eze1

det1

den2 dal1

cph1

cle1

chi1

cdg2

bos1

bhx1 bbs1

atl1

ams2

ams1

ham1

sin1

tpa1

kul1

sjc2 Quantify shift (to 3 groups) after failure Assume similar for other side

• Valid if topology and traffic distributions are similar

Page 18: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 18

Leaks to Other AS’s

• Simple option– Leaks between peers relatively small

• Ignore

– Shifts between transit providers can be large• Equal AS-path length to most destinations:• Assume complete shift (easy to model)

• Accurate option– Extend model to more than one AS away– Add columns in traffic matrix to designate extra

traffic in case of other network failures

INTERNETA

B

x

Page 19: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 19

Work in Progress

• Evaluating goodness of models– Compare actual failures to models

• Evaluating goodness of failover estimates– Work with both sides of a peering arrangement,

compare failover estimates to simulations– Compare estimated failover matrices to actual

failures

• Streamlining sharing of information

• Looking for more participantsContact me to participate in the above

Page 20: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 20

Summary

• Peering/transit links are some of the most expensive and difficult to provision links

• We can improve capacity planning on such links by modeling the network

• BGP modeling can be much more complex than IGP modeling– Some required information is not even available

• Failover Matrices provide a simple way to share information without giving away details

• Failover Matrices can be estimated using one’s own network details

Page 21: GPF 2.0 2007 Failover Matrices–Cariden 1 Peering Planning Cooperation without Revealing Confidential Information Aaron Hughes ahughes@cariden.com

GPF 2.0 2007 Failover Matrices–Cariden 21

Acknowledgments

• Jon Aufderheide (Global Crossing)• Clarence Filsfils (Cisco)