rcat final version v2 - nanog archive · 2008. 7. 21. · as8452 - tedata (37 prefixes impacted)...


Rcat :: Root Cause Analysis Tool

Anthony Lambert

[email protected]

Mickael Meulle

[email protected]

Marc-Olivier Buob

[email protected]


Orange Labs - Research & Development - Rcat – Friday, March 14, 2008

Today's presentation agenda

• Rcat in a nutshell
  – Getting familiar with the tool and its principles
• Case study 1: January Mediterranean Cable Break
  – Getting some confidence in Rcat results
• Case study 2: A tiny not so tiny event
  – Maybe the most interesting feature of Rcat


Rcat in a nutshell


Rcat in a nutshell

• Rcat is based on the method we presented last year at NANOG 40:
  – Revisiting Interdomain Root Cause Analysis from Multiple Vantage Points, Anthony Lambert, Mickael Meulle, Jean-Luc Lutton, NANOG-40, June 2007
• As promised, Rcat is publicly available at: http://rcat.rd.francetelecom.com/
• Rcat analyzes the BGP announcements sent by route-views eBGP peers to determine which ASs are most likely to have originated the interdomain structure changes that led to the emission and spread of the collected BGP announcements.
• Rcat aims to help NOCs by providing them with:
  – increased reactivity during outages
  – increased proactivity, for instance by detecting small recurrent events


Rcat in a nutshell

• For every router connected to the trace collector, one knows at every moment the AS path it uses to reach any prefix.
• Observing the behavior of these paths over time, it appears that every router has a preferred path to reach any prefix p: the "primary path" to p.
• It also appears that, most of the time, a source router uses the same primary path to reach all the prefixes originated by a given AS.

Figure annotations:
– AS20858 is double-circled, meaning we are considering the set of prefixes originated by AS20858.
– The source is a router in AS6730 that has established an eBGP session with the routeviews trace collector at LINX.
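The notion of a primary path can be illustrated with a minimal sketch. It assumes, as a simplification not stated on the slide, that the primary path of a router towards a prefix is the AS path it held for the longest cumulative time. The record layout, the function name `primary_paths` and the intermediate AS number 64500/64501 (documentation ASNs) are hypothetical.

```python
from collections import defaultdict


def primary_paths(updates, end_time):
    """Return {(router, prefix): as_path} for the path held longest.

    `updates` is an iterable of (timestamp, router, prefix, as_path) tuples
    sorted by timestamp; as_path is a tuple of AS numbers, or None for a
    withdrawal. This record layout is an assumption made for the sketch.
    """
    current = {}                                      # key -> (since, as_path)
    held = defaultdict(lambda: defaultdict(float))    # key -> path -> seconds

    def close(key, until):
        # Credit the time the current path was held; withdrawals earn nothing.
        since, path = current[key]
        if path is not None:
            held[key][path] += until - since

    for ts, router, prefix, as_path in updates:
        key = (router, prefix)
        if key in current:
            close(key, ts)
        current[key] = (ts, as_path)

    for key in current:
        close(key, end_time)

    return {key: max(paths, key=paths.get) for key, paths in held.items()}


updates = [
    (0,   "r1", "192.0.2.0/24", (6730, 64500, 20858)),
    (500, "r1", "192.0.2.0/24", (6730, 64501, 20858)),   # short detour
    (530, "r1", "192.0.2.0/24", (6730, 64500, 20858)),   # back to usual path
]
print(primary_paths(updates, end_time=1000))
# {('r1', '192.0.2.0/24'): (6730, 64500, 20858)}
```

The detour lasts 30 seconds out of 1000, so the first path is reported as primary.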


Rcat in a nutshell

What if this "link" fails?

• After some time, every router connected to the trace collector should announce a new path to reach the prefixes originated by AS20858, or at least withdraw its primary path.
• From our point of view, the primary paths used to reach these prefixes become unavailable; said another way, the origin AS20858 tree is fading.
• Rcat can be seen as a very big state machine that keeps track, for every primary path, of its state (available or unavailable) and correlates primary path unavailabilities to extract the underlying events.
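The state-machine idea can be sketched as follows. This is only an illustration: the class name `PrimaryPathTracker`, the record layout and the fixed time window used to group unavailabilities are assumptions, and the grouping is a crude stand-in for the correlation Rcat actually performs, which the slides do not detail.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryPathTracker:
    primary: dict                      # (router, prefix) -> primary AS path
    window: float = 60.0               # seconds; arbitrary grouping window
    state: dict = field(default_factory=dict)    # key -> state name
    events: list = field(default_factory=list)   # each event: list of (ts, key)

    def on_update(self, ts, router, prefix, as_path):
        """Feed one BGP update; as_path is None for a withdrawal."""
        key = (router, prefix)
        if key not in self.primary:
            return
        new = "available" if as_path == self.primary[key] else "unavailable"
        old = self.state.get(key, "available")
        self.state[key] = new
        if old == "available" and new == "unavailable":
            self._record(ts, key)

    def _record(self, ts, key):
        # Attach to the latest event if it is close enough in time,
        # otherwise open a new event.
        if self.events and ts - self.events[-1][-1][0] <= self.window:
            self.events[-1].append((ts, key))
        else:
            self.events.append([(ts, key)])


primary = {
    ("r1", "p1"): (6730, 64500, 20858),
    ("r2", "p1"): (64501, 64500, 20858),
}
t = PrimaryPathTracker(primary)
t.on_update(100, "r1", "p1", (6730, 64502, 20858))   # leaves primary path
t.on_update(110, "r2", "p1", None)                   # withdrawal
t.on_update(5000, "r1", "p1", (6730, 64500, 20858))  # primary path is back
t.on_update(9000, "r1", "p1", None)                  # lost again, much later
print(len(t.events))   # 2
```

The first two unavailabilities fall within the same window and form one event; the last one opens a second event.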


Rcat in a nutshell

Search form fields:
• Search period
• Time zone value
• Choice of the collector
• Various options about the size or multiplicity of the event you search
• Logical query:
  – events caused by ASx: orix
  – events which have impacted a path used by ASx: asx
  – events which have impacted prefixes originated by ASx: treex
  – events which have impacted prefix pref: p=pref
  – events which have impacted prefixes more specific than pref: p<pref
  – events which have impacted prefixes less specific than pref: p>pref
  – or any logical combination of them, using & (and), | (or) and as many levels of parentheses as desired
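To make the logical query syntax concrete, here is a small sketch of an evaluator for the ori, as and tree terms combined with &, | and parentheses. It is not Rcat's implementation: the event layout (a dict of three sets), the function names, and the choice to let & bind tighter than | are assumptions, and the p=, p< and p> prefix terms are left out for brevity.

```python
import re

TOKEN = re.compile(r"\s*(ori\d+|tree\d+|as\d+|[()&|])")


def tokenize(query):
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def matches(query, event):
    """event: dict with sets 'origins', 'path_ases', 'trees' (hypothetical)."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "&":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        for prefix, name in (("ori", "origins"), ("tree", "trees"), ("as", "path_ases")):
            if tok.startswith(prefix):
                return int(tok[len(prefix):]) in event[name]
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result


event = {"origins": {8452}, "path_ases": {6730, 8452, 15475}, "trees": {15475, 20928}}
q = "ori15475 | (tree15475 & (ori15412 | ori24835 | ori8452))"
print(matches(q, event))   # True
```

The sample event is caused by AS8452 and impacts the AS15475 tree, so the second branch of the query matches.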


• For each event satisfying the options set, Rcat displays a thumbnail, basic information and the list of occurrences of this event during the search period.
• When you click on the occurrence date you are interested in, Rcat displays the details for this occurrence.


• Event picture: graph of all the primary paths that have been impacted by the event
• Information about the size and the multiplicity of the event (number of times the event has occurred in the month)
• Inferred originators of the event
• List of all the origin AS trees impacted by the event


Rcat in a nutshell

• For each impacted origin AS, Rcat displays the primary paths each router was using to reach the different impacted prefixes.
• Rcat also points out at what time these paths became unavailable.
• When you click on a prefix, Rcat displays the path exploration the routers have undergone during the event.


Rcat in a nutshell

For each router connected to the trace collector, one can see the paths it has explored during the event and the unreachability periods it has undergone.


Case study 1: January Mediterranean Cable Break


Case study 1: January Mediterranean Cable Break

• The cable break was very well documented by Renesys on their blog at:
  – http://www.renesys.com/blog/2008/01/mediterranean_cable_break.shtml

Focusing on Egypt, one of the harder-hit countries, here is the challenge:
• "Redrawing" Renesys' unreachability curve using Rcat results
• Explaining the different peaks in the curve

[Figure: Outages over time for Egypt (Renesys sources)]


Case study 1: January Mediterranean Cable Break

• A cable break can be seen as multiple "link failures" between the ASs that peer over transmission links carried by the cable.
• Under our formalism, the cable break should correspond to the fading of many origin AS trees.
• We should then obtain in Rcat different events originated either by the regional providers or by their providers.
• According to Renesys and some other data sources, here are the Egyptian providers and their upstream providers:

Egyptian provider   ASN     Upstream providers (ASN)
Nile Online         15475   RAYA-AS (24835), FLAG (15412), TEDATA (8452)
Internet Egypt      5536    RAYA-AS (24835), TEDATA (8452)
EgyNet              20858   Internet Egypt (5536), OPENTRANSIT (5511), TEDATA (8452)
TEDATA              8452    FLAG (15412), SEABONE (6762), UUNET (701)
LINKdotNET          24863   FLAG (15412), UUNET (701)


Case study 1: January Mediterranean Cable Break

• More precisely, we should observe events either caused by the Egyptian providers, or caused by one of their upstream providers and impacting the prefixes the Egyptian providers originate.
• For instance, for Nile Online the corresponding logical query is:

ori15475 | (tree15475 & (ori15412 | ori24835 | ori8452))

  – ori15475: events caused by Nile Online
  – tree15475 & (ori15412 | ori24835 | ori8452): events caused by a provider of Nile Online and impacting the prefixes Nile Online originates


Case study 1: January Mediterranean Cable Break

• The final Rcat logical query to get all the Egyptian events is therefore:

(ori15475 | (tree15475 & (ori15412 | ori24835 | ori8452))) |

(ori5536 | (tree5536 & (ori24835 | ori8452)))|

(ori20858 | (tree20858 & (ori5536 | ori5511 | ori8452)))|

(ori8452 | (tree8452 & (ori15412 | ori8697 | ori6762 | ori701)))|

(ori24863 | (tree24863 & (ori15412 | ori701)))
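The combined query follows one pattern per provider, so it can be generated rather than typed by hand. The sketch below assumes a hypothetical helper `provider_query`; the upstream lists are copied from the query above (including AS8697 for TEDATA).

```python
def provider_query(asn, upstreams):
    """Events caused by `asn`, or caused by an upstream and impacting its tree."""
    caused_by_upstream = " | ".join(f"ori{u}" for u in upstreams)
    return f"(ori{asn} | (tree{asn} & ({caused_by_upstream})))"


# Upstream lists copied from the query on the slide.
EGYPTIAN_PROVIDERS = {
    15475: [15412, 24835, 8452],        # Nile Online
    5536: [24835, 8452],                # Internet Egypt
    20858: [5536, 5511, 8452],          # EgyNet
    8452: [15412, 8697, 6762, 701],     # TEDATA
    24863: [15412, 701],                # LINKdotNET
}

query = " | ".join(provider_query(a, ups) for a, ups in EGYPTIAN_PROVIDERS.items())
print(query)
```

Adding or removing a provider then only requires editing the dictionary.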


• The search period is set from 2008-01-30 04:30:00 to 2008-01-30 16:00:00.
• We only want events which have impacted many announced paths.
• Rcat has found 6 events satisfying the previous logical query.


Case study 1: January Mediterranean Cable Break

• Two events: date of the occurrence is near the first peak
• One event: date of the occurrence is near the second peak


Case study 1: January Mediterranean Cable Break

• Two events: date of the occurrence is near the third peak
• One event: date of the occurrence is near the fourth peak


Case study 1: January Mediterranean Cable Break

Using Rcat details about this event (impacted prefixes for each origin AS, path exploration for each impacted prefix), we can draw the corresponding unreachability curve and give explanations about it:

[Figure: Unreachability caused by event RV_LINX_2008_01_18643. x-axis: Dates; y-axis: Number of unreachable prefixes]

Annotations on the curve:
• Unreachability period for prefixes originated by:
  – AS20858 - EGYNET (174 prefixes impacted)
  – AS5536 - Internet-Egypt (52 prefixes impacted)
  – AS21152 - SOFICOM (32 prefixes impacted)
  – AS31619 - CITYSTARS (3 prefixes impacted)
  – AS25576 - AFMIC (1 prefix impacted)
• Small unreachability period for prefixes originated by AS5536 - Internet-Egypt (labelled twice on the curve)
• Prefixes originated by AS5536 - Internet-Egypt find a path back
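Drawing such a curve amounts to counting, at each instant, how many prefixes are inside an unreachability period. A minimal sketch, assuming the per-prefix periods have already been extracted as (prefix, start, end) tuples; this layout and the function name are hypothetical, and the sample prefixes are documentation ranges, not data from the event.

```python
from collections import Counter


def unreachability_curve(intervals):
    """intervals: list of (prefix, start, end) unreachability periods.

    Returns a sorted list of (time, number_of_unreachable_prefixes) steps.
    Assumes periods for the same prefix do not overlap.
    """
    delta = Counter()
    for _prefix, start, end in intervals:
        delta[start] += 1      # prefix becomes unreachable
        delta[end] -= 1        # prefix is reachable again
    curve, count = [], 0
    for t in sorted(delta):
        count += delta[t]
        curve.append((t, count))
    return curve


intervals = [
    ("192.0.2.0/24", 10, 50),
    ("198.51.100.0/24", 20, 30),
    ("203.0.113.0/24", 20, 60),
]
print(unreachability_curve(intervals))
# [(10, 1), (20, 3), (30, 2), (50, 1), (60, 0)]
```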


Case study 1: January Mediterranean Cable Break

Using Rcat details about this event (impacted prefixes for each origin AS, path exploration for each impacted prefix), we can draw the corresponding unreachability curve and give explanations about it:

[Figure: Unreachability caused by event RV_LINX_2008_01_18685. x-axis: Dates; y-axis: Number of unreachable prefixes]

Annotation on the curve:
• Long-term unreachability period for prefixes originated by:
  – AS8452 - TEDATA (37 prefixes impacted)
  – AS15706 - Sudatel (29 prefixes impacted)


Case study 1: January Mediterranean Cable Break

Using Rcat details about this event (impacted prefixes for each origin AS, path exploration for each impacted prefix), we can draw the corresponding unreachability curve and give explanations about it:

[Figure: unreachability curve for this event. x-axis: Dates; y-axis: Number of unreachable prefixes]

Annotations on the curve:
• Small unreachability period for prefixes originated by:
  – AS8452 - TEDATA (118 prefixes impacted)
  – AS15475 - Nile Online Giza (61 prefixes impacted)
  – AS20484 - YallaOnline (49 prefixes impacted)
  – AS6879 - EUnet Egypt (15 prefixes impacted)
  – AS20928 - Noor Advanced Technologies ASN Cairo (13 prefixes impacted)
• Another unreachability period for some prefixes originated by AS15475 - NOL


Case study 1: January Mediterranean Cable Break

Using Rcat details about this event (impacted prefixes for each origin AS, path exploration for each impacted prefix), we can draw the corresponding unreachability curve and give explanations about it:

[Figure: unreachability curve for this event. x-axis: Dates; y-axis: Number of unreachable prefixes]

Annotation on the curve:
• Long-term unreachability period for prefixes originated by:
  – AS15412 - FLAG (28 prefixes impacted)
  – AS36870 - IT-WORX (1 prefix impacted)
  – AS24835 - RAYA (33 prefixes impacted)


Case study 1: January Mediterranean Cable Break

Using Rcat details about this event (impacted prefixes for each origin AS, path exploration for each impacted prefix), we can draw the corresponding unreachability curve and give explanations about it:

[Figure: unreachability curve for this event. x-axis: Dates; y-axis: Number of unreachable prefixes]

Annotation on the curve:
• Long-term unreachability period for prefixes originated by:
  – AS24863 - LINKdotNET (344 prefixes impacted, but only a part becomes unreachable)


Case study 1: January Mediterranean Cable Break

Using Rcat details about this event (impacted prefixes for each origin AS, path exploration for each impacted prefix), we can draw the corresponding unreachability curve and give explanations about it:

[Figure: unreachability curve for this event. x-axis: Dates; y-axis: Number of unreachable prefixes]

Annotation on the curve:
• Unreachability period for prefixes originated by:
  – AS15475 - Nile Online Giza (61 prefixes impacted)
  – AS20928 - Noor Advanced Technologies ASN Cairo (13 prefixes impacted)
  – AS15804 - AS of The Way Out Internet Solutions Cairo (9 prefixes impacted)
  – AS25364 - EgyptCyberCenter-AS (2 prefixes impacted)
  – AS31619 - City Stars Egypt (2 prefixes impacted)
  – ...


Case study 1: January Mediterranean Cable Break

• So plotting the total unreachability caused by these events, we get:

[Figure: Total unreachability caused by the "Egyptian" events. x-axis: Dates; y-axis: Number of unreachable prefixes]
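One way to aggregate several events into a total curve is sketched below. It assumes that a prefix hit by two overlapping events should be counted once, which is a choice made for this illustration and not something the slides state; the event identifiers, data layout and function names are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Union of (start, end) intervals, returned sorted and non-overlapping."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(i) for i in merged]


def total_unreachability(events):
    """events: {event_id: [(prefix, start, end), ...]} -> step curve."""
    per_prefix = defaultdict(list)
    for periods in events.values():
        for prefix, start, end in periods:
            per_prefix[prefix].append((start, end))
    delta = defaultdict(int)
    for spans in per_prefix.values():
        for start, end in merge_intervals(spans):
            delta[start] += 1
            delta[end] -= 1
    curve, count = [], 0
    for t in sorted(delta):
        count += delta[t]
        curve.append((t, count))
    return curve


events = {
    "event_a": [("192.0.2.0/24", 0, 40), ("198.51.100.0/24", 10, 20)],
    "event_b": [("192.0.2.0/24", 30, 60)],   # overlaps event_a for this prefix
}
print(total_unreachability(events))
# [(0, 1), (10, 2), (20, 1), (60, 0)]
```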


Case study 1: January Mediterranean Cable Break

• If we compare with the results presented by Renesys, we obtain:

[Figure: Total unreachability caused by the "Egyptian" events. x-axis: Dates; y-axis: Number of unreachable prefixes]

• So we indeed succeeded in rebuilding Renesys' curve using Rcat results, and in explaining what the different peaks correspond to.
• Now, let's see the details Rcat provides for the last event (for instance).


Rcat pointed out the link between TEDATA and SEABONE as the location of the event.

[Figure: event graph. Labelled ASs: SEABONE, TEDATA, Nile Online Giza, Internet-Egypt, EGYNET-AS]


Impact of the event for the Nile Online Giza's prefixes

For every prefix, Rcat displays all the impacted primary paths and specifies when they became unavailable


Clicking on a prefix we see the path exploration undergone by the routers

Pointing out the unreachability period


Case study 2: A tiny not so tiny event


Case study 2: a tiny not so tiny event

A small event at first sight:
• just one prefix impacted ...
• just 4 routers losing their primary paths ...


Case study 2: a tiny not so tiny event

• Fast reconvergence: the event lasts from 1 to 24 seconds depending on the router observed
• Negligible unreachability: at most 2 seconds

Yes, but ...


Case study 2: a tiny not so tiny event

• This event occurred 10757 times in February 2008, so on average one occurrence every 4 minutes.
• Consequently, every 4 minutes, the routers lose their primary paths, run their decision process, elect a new path and reconverge to their primary path a few seconds later.
• A pretty useless stress which should not exist.
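The "every 4 minutes" figure can be checked with a short calculation; the only inputs are the occurrence count from the slide and the length of February 2008.

```python
from calendar import monthrange

occurrences = 10757
days = monthrange(2008, 2)[1]          # 29 days: 2008 is a leap year
minutes_in_month = days * 24 * 60      # 41760
print(round(minutes_in_month / occurrences, 2))   # 3.88
```

So the average spacing is slightly under 4 minutes.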


• 20% of the total number of occurrences are caused by events occurring once.
• 50% of the total number of occurrences are caused by events occurring more than 30 times.
• 20% of the total number of occurrences are caused by events occurring more than 700 times.
• Fixing events which occur so many times would tremendously reduce the rate of BGP updates and thus the stress on routers.
• Rcat can point out such events, allowing network operators to find out why they occur and to fix them.

[Figure: Distribution of event occurrences given their multiplicity from 2008-02-01 to 2008-02-29 (CCDF). x-axis: Number of occurrences (log scale); y-axis: Total number of event occurrences (percentage)]
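A distribution of this kind can be computed from the list of per-event multiplicities. The sketch below is one possible reading of the plot: for each multiplicity m, the share of all occurrences that belong to events occurring more than m times. The function name and the sample data are made up for illustration.

```python
def occurrence_ccdf(multiplicities):
    """For each distinct multiplicity m, share (in %) of all occurrences
    that belong to events occurring more than m times."""
    total = sum(multiplicities)
    result = []
    for m in sorted(set(multiplicities)):
        above = sum(x for x in multiplicities if x > m)
        result.append((m, 100.0 * above / total))
    return result


# Made-up multiplicities: three events seen once, one seen 5 times, one 12 times.
sample = [1, 1, 1, 5, 12]
for m, pct in occurrence_ccdf(sample):
    print(m, pct)
# 1 85.0
# 5 60.0
# 12 0.0
```

In this toy sample, the three single-occurrence events account for only 15% of the occurrences, while the two recurrent events account for the remaining 85%.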


thank you