Case Studies in Intra-Domain Routing Instability
Zhang Shu
National Institute of Information and Communications Technology, Japan
NANOG31, San Francisco, 2004/5/25
Overview
Intra-domain routing instability
Measurements of intra-domain routing instability
• WIDE Internet and APAN Tokyo-XP network
Dealing with intra-domain routing instability
• Detection and troubleshooting
Conclusions
Intra-Domain Routing Instability
Intra-domain routing instability
• Unexpected routing changes within an IGP routing domain
• Causes packet loss, increased router load, and wasted bandwidth
Why focus on intra-domain routing?
• Compared with inter-domain routing, research on IGP behavior is still limited
• Help operators better understand intra-domain routing instability and learn how to deal with it
Measurement Methodology
Data collection
• OSPF
• Tcpdump
[Figure: diagram with the labels Ethernet, OSPF cloud, and Data collector]
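
The slides do not show how the tcpdump captures were decoded. As a minimal sketch of that step, assuming OSPFv2 only, the fixed 20-byte LSA header defined in RFC 2328 can be unpacked as follows. The names `LsaHeader` and `parse_lsa_header` are illustrative and are not taken from the talk or from Rtanaly.

```python
import struct
from typing import NamedTuple

# OSPFv2 LSA header (RFC 2328, A.4.1): 20 bytes, network byte order.
# age(2) options(1) type(1) link-state-id(4) adv-router(4) seq(4, signed)
# checksum(2) length(2)
LSA_HEADER = struct.Struct("!HBBIIiHH")


class LsaHeader(NamedTuple):
    age: int            # seconds since origination
    options: int
    ls_type: int        # 1 = Router-LSA, 2 = Network-LSA, ...
    link_state_id: int
    adv_router: int     # router ID of the originator
    seq: int            # signed 32-bit sequence number
    checksum: int
    length: int         # total LSA length, header included


def parse_lsa_header(data: bytes) -> LsaHeader:
    """Decode the first 20 bytes of an OSPFv2 LSA."""
    if len(data) < LSA_HEADER.size:
        raise ValueError("an LSA header needs at least 20 bytes")
    return LsaHeader(*LSA_HEADER.unpack_from(data))


# Hand-built sample header (not captured data): a Router-LSA from 10.0.0.1
sample = struct.pack("!HBBIIiHH", 1, 0x02, 1, 0x0A000001, 0x0A000001,
                     -0x7FFFFFFF, 0xABCD, 36)
header = parse_lsa_header(sample)
```

OSPFv3 uses a different LSA header layout, so this sketch would not apply to the OSPFv3 data mentioned later.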
Measurement Methodology (Cont’d)
Data analysis
• Counting routing changes
  Changes in the content of an LSA
  LSA flush
  Changes in AS-External LSAs were excluded
• Refreshing LSAs were not counted
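
The counting rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis code used for the measurements: an LSA is identified by (LS type, Link State ID, Advertising Router), a flush is recognized by LS age reaching MaxAge (3600 s), a refresh is an instance whose body is unchanged, and the first sighting of an LSA is treated as a baseline rather than a change. `LsaInstance` and `count_routing_changes` are hypothetical names.

```python
from dataclasses import dataclass

MAX_AGE = 3600     # OSPF MaxAge in seconds; an LSA at MaxAge is being flushed
AS_EXTERNAL = 5    # OSPFv2 LS type of AS-External-LSAs (excluded here)


@dataclass(frozen=True)
class LsaInstance:
    ls_type: int
    link_state_id: str
    adv_router: str
    age: int
    body: bytes    # LSA contents after the 20-byte header


def count_routing_changes(instances):
    """Count content changes and flushes, skipping refreshes and type 5."""
    last_body = {}                       # LSA key -> last seen body
    counts = {"content_change": 0, "flush": 0}
    for lsa in instances:
        if lsa.ls_type == AS_EXTERNAL:
            continue                     # AS-External LSAs are excluded
        key = (lsa.ls_type, lsa.link_state_id, lsa.adv_router)
        if lsa.age >= MAX_AGE:
            # Count a flush once, even if the MaxAge copy is seen repeatedly.
            if key in last_body:
                counts["flush"] += 1
                del last_body[key]
            continue
        previous = last_body.get(key)
        if previous is not None and previous != lsa.body:
            counts["content_change"] += 1
        # Same body as before is a refresh: recorded but not counted.
        last_body[key] = lsa.body
    return counts
```

A design note: treating the first sighting as a baseline means a re-originated LSA after a flush is not counted as a change; a different analysis could reasonably count it.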
Case Study 1/2: WIDE Internet
WIDE Internet
• WIDE Project (http://www.wide.ad.jp)
• Connects hundreds of academic organizations
• About 50 routers in the OSPF backbone area
Data collected at NARA-NOC
• Located in Nara, Japan
• Both OSPFv2 and OSPFv3 data collected
Measurement of the WIDE Internet
[Figure panel: Router-LSA]
Period: August 2000 – May 2004
Measurement of the WIDE Internet (Cont’d)
[Figure panels: Network-LSA, Network-Summary-LSA, ASBR-Summary-LSA]
Period: August 2000 – May 2004
Example of a Typical LSA Oscillation
Relatively frequent changes in short term
• A router in Fukuoka (WIDE), 5/7/2004, lasted for about 4 hours
Usually caused by congestion
Example of Serious Oscillation
Frequent changes in short term
• An L3 switch, 6/12/03-6/13/03, lasted for about 18 hours
Observed several times
• Most of them were caused by problems with p2p links or by a misconfiguration in which two routers used the same router ID
Long-Term Changes
Relatively frequent changes
• A router in SF, lasted for 5 months (10/23/03-4/1/04)
Considered due to a switch problem
Long-Term Changes (Cont’d)
Slow changes
• A router in Kyoto; the changes have persisted since this March
Some of them were caused by interface problems
The Case of OSPFv3
Period: July 2003 – January 2004
Case Study 2/2: APAN Tokyo-XP
APAN Tokyo-XP network
• A transit network located in Tokyo
• Relatively small in scale, with no more than ten routers in the backbone area
Measurement of APAN Tokyo-XP Network (OSPFv2, Router-LSA)
[Figure annotations: Problem of ATM link; Switch problem; Misconfiguration]
Period: August 2003 – May 2004
Causes of Instability
Identified causes
• Congestion
  DDoS
• Link failure
• Software/Hardware bug
• Misconfiguration
Most instability is due to causes other than problems in the routing protocols themselves
Analysis Results
Observed Routing Instability
• Instability observed on both the WIDE Internet and the APAN Tokyo-XP network
• The most typical changes are relatively frequent short-term ones
  Happen at intervals of 10-200 s
• Frequent short-term changes
• Long-term changes
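
One way to separate short-term from long-term instability in a change log is to group change timestamps into episodes by the gap between consecutive changes. The sketch below assumes a 200 s threshold, borrowed from the upper end of the interval range quoted above. That threshold and the function names are illustrative choices, not the method used in the talk.

```python
def group_into_episodes(timestamps, max_gap=200.0):
    """Group change times (seconds) so that gaps within an episode <= max_gap."""
    episodes = []
    for t in sorted(timestamps):
        if episodes and t - episodes[-1][-1] <= max_gap:
            episodes[-1].append(t)       # continues the current episode
        else:
            episodes.append([t])         # gap too large: start a new episode
    return episodes


def summarize(episodes):
    """Return (start, duration, number_of_changes) for each episode."""
    return [(ep[0], ep[-1] - ep[0], len(ep)) for ep in episodes]


# Made-up timestamps for illustration only
changes = [0, 50, 120, 1000, 1100, 5000]
episodes = group_into_episodes(changes)
summary = summarize(episodes)
```

An episode with many changes and a duration of hours would correspond to the oscillation examples, while one lasting months would fall under long-term changes.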
Analysis Results (Cont’d)
The number of changes is decreasing
• Changes in router implementations
• Less network congestion because of the increased bandwidth in recent years
The causes of many changes are unknown
Rtanaly: A Tool to Detect and Visualize Intra-Domain Routing Instability
Functions
• Detects IGP changes in real time and alerts operators
  Can also be used for offline data analysis
• Visualization
• Accessible through the WWW interface
Currently only supports OSPF
• IS-IS support will be completed soon
Troubleshooting Routing Instability
Why is routing instability troubleshooting difficult?
• Problems occur intermittently, so it is difficult to get useful data for troubleshooting
Event-driven data collection
• Automatically obtain data for troubleshooting when detecting routing changes
Troubleshooting Routing Instability (Cont’d)
Data that should be collected
• Traffic volume
• Interface status
• Information on the routing protocols
From where?
• The router that originated the changing LSA
• Network equipment connected to the router
  Switch
How to collect the data?
• SNMP
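
The event-driven idea above can be sketched as a small collector that is called whenever a routing change is detected and that queries the originating router plus its attached equipment. Everything here is an assumption for illustration: `EventDrivenCollector` is a hypothetical name, `fetch` is a caller-supplied placeholder for the real SNMP queries (for example interface status and octet counters), and the per-target rate limit is added so that an oscillating LSA does not flood the devices with polls.

```python
import time


class EventDrivenCollector:
    """Collect troubleshooting data when a routing change is detected."""

    def __init__(self, fetch, min_interval=60.0, clock=time.monotonic):
        self.fetch = fetch                # hypothetical: target -> data (e.g. via SNMP)
        self.min_interval = min_interval  # assumed rate limit per target, in seconds
        self.clock = clock
        self._last_poll = {}              # target -> time of last poll
        self.snapshots = []               # collected records, oldest first

    def on_routing_change(self, adv_router, neighbors=()):
        """Poll the originating router and its neighbors; return targets polled."""
        now = self.clock()
        polled = []
        for target in (adv_router, *neighbors):
            last = self._last_poll.get(target)
            if last is not None and now - last < self.min_interval:
                continue                  # polled recently: skip during oscillation
            self._last_poll[target] = now
            self.snapshots.append(
                {"time": now, "target": target, "data": self.fetch(target)}
            )
            polled.append(target)
        return polled
```

In use, the detector would call `on_routing_change` with the router ID taken from the changing LSA, and `neighbors` would list the switches known to be connected to that router.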
Conclusions
Routing instability measurements
• Intra-domain routing instability can occur frequently and persistently
• Similar phenomena may occur on other networks
  It is important to deploy a monitoring system on your own network
Rtanaly
Troubleshooting
• Event-driven data collection
Acknowledgements
My thanks to
• WIDE Project and Nara Institute of Science and Technology
• Operators of APAN Tokyo-XP network
• Prof. Youki Kadobayashi for the idea on troubleshooting
Intra-domain routing stability measurement project
• http://pe0.koganei.wide.ad.jp/rtanaly
Please contact us if you are interested in conducting an IGP measurement on your network
• [email protected]@koganei.wide.ad.jp
Thank you!