bgp, the good bad, and uglymissing - manginthomas.mangin.com/data/pdf/netmcr - oct 12 - bgp good bad...
BGP, The Good, The Bad, and The Ugly Missing
Ideas to improve BGP
Thomas Mangin, NetMCR – Oct 2017
TL;DR
1. Show how BGP was compact on the wire and memory friendly
2. Point out some minor weirdness/quirkiness, and explain how successive RFCs ruined BGP and/or did not improve things
3. Try to look forward at ways this could be fixed
4. Explain why this is very unlikely to happen at the IETF
Ultimately, argue that BGP needs “fixing” (or that a new protocol is needed) by the industry, in the hope that someone with money, time and skills is listening somewhere and decides to help.
The Protocol
I know BGP Fu
By the end of the day… you will be able to read BGP… without using Wireshark (or perhaps not)
• TCP port 179
• Easy to code
• Works through NAT!!
Good
• TCP session failure detection is very, very, LONG… RST?
• Hence a “convoluted” protocol heartbeat mechanism
Bad
• Tricks
• There are quite a few “undocumented” behaviours, like Alcatel using a TCP window size of zero to tell speakers that no CPU time is available and that peers should not send UPDATEs anymore.
Bad
“Layer 2” Connection
• Simple binary TLV..
• Binary, compact & OO friendly
• Old school
Good
• Many TLV, or LVT, or LV, or..
• Every draft re-invented a TLV variant
• No chance in hell to get that fixed
Bad
Framing
• Maximum message
• 4k should be enough for everyone..
• Designed for RAM-constrained systems
• 4k is a UNIX page size (easy allocation)
• Hardcoded in the draft, not in the packet
Good then, Bad now
One draft has been lingering for years trying to raise the limit to 65k.. (finally seriously considered).
Missing, but there is hope...
Mandatory Sci-Fi reference (A Dalek from Star Trek)
This is a BGP Header.
Introduced with BGPv3 (like v6 comes after v4) in October 1991
To erase BGP “v1” headers, not changed/fixed since
Bad
Framing
Length first,
It allows the packet content to be put in memory with one read
Good
No simple way to upgrade it to 32 bits by changing the MESSAGE Type
Bad
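To make the "length first" point concrete, here is a minimal sketch of reading one message: a fixed 19-byte header (16-byte marker, 2-byte length, 1-byte type, per RFC 4271), then a single read for whatever the length announces. The names `parse_header` and `read_message` are illustrative only and not taken from any implementation.

```python
import io
import struct

MARKER = b"\xff" * 16   # 16 bytes set to ones (RFC 4271)
HEADER_LEN = 19         # marker (16) + length (2) + type (1)
MAX_LEN = 4096          # the hardcoded 4k limit

def parse_header(header: bytes) -> tuple:
    """Return (total_length, message_type) from a 19-byte header."""
    if len(header) != HEADER_LEN:
        raise ValueError("a BGP header is exactly 19 bytes")
    marker, length, msg_type = struct.unpack("!16sHB", header)
    if marker != MARKER:
        raise ValueError("bad marker")
    if not HEADER_LEN <= length <= MAX_LEN:
        raise ValueError("length outside 19..4096")
    return length, msg_type

def read_message(stream) -> tuple:
    """Read the header, then the rest of the message with one more read."""
    length, msg_type = parse_header(stream.read(HEADER_LEN))
    body = stream.read(length - HEADER_LEN)
    return msg_type, body

# A KEEPALIVE is just a header: length 19, type 4.
KEEPALIVE = MARKER + struct.pack("!HB", HEADER_LEN, 4)
message = read_message(io.BytesIO(KEEPALIVE))
```

Because the length field is only 16 bits wide and sits before the type, a larger length cannot be signalled just by introducing a new message type, which is the "Bad" noted above.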
Framing
• Message Type Code
1. OPEN
2. UPDATE
3. NOTIFICATION
4. KEEPALIVE
• Message Type Order
1. OPEN
2. KEEPALIVE(s)
3. UPDATE
4. NOTIFICATION
Cannot see any logic in the numbering… It does not matter unless you have very acute OCD
This slide is here to tickle your OCD..
BGP Identifier aka “Router ID”
Not an IPv4 address: an AS-wide unique ID (“linked” to the OSPF Router ID)
Not IPv6-only network friendly
Hard to foresee 20 years ago
But a pain for v6-only networks
Huawei tried to change this and failed.
OPEN
Minimum HoldTime is 3 (or 0 for disabling)
“KEEPALIVE” heartbeat messages every HoldTime/3 (should be the timer value here)
Best time for failure detection is 3 seconds… a bit slow nowadays
Bad
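The arithmetic behind those numbers, as a small sketch. The function name is made up for illustration; the rule (hold time of 0, or at least 3 seconds, with keepalives at one third of it) follows RFC 4271.

```python
def keepalive_interval(hold_time: int) -> float:
    """Seconds between KEEPALIVEs for a negotiated hold time."""
    if hold_time == 0:
        return 0.0          # 0 disables the heartbeat entirely
    if hold_time < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold_time / 3

# With the minimum hold time, a KEEPALIVE goes out every second and
# the peer is declared dead after 3 seconds of silence.
fastest = keepalive_interval(3)
common = keepalive_interval(90)
```

So even at the protocol's limit, failure detection cannot go below 3 seconds without help from something outside BGP.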
OPEN
Open –> Negotiation –>
• ASNs are not 16 bits anymore
• Caused “transitive sessions drop”
Bad
• All fixed so “ok”..
Good
• Explicit version in header
• Every implementation checks it
• Wonder why, we have the marker
Good
• “Capabilities” negotiation
• It is what allowed BGP to evolve
• And have partial feature implementation
Good
• Size constraint slowly showing
Bad
• Anything recent is “negotiated”
• 32-bit ASN
• Family (IPv6, VPN, FlowSpec, EVPN, …)
• Add-Path
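As an illustration of how "anything recent is negotiated", the sketch below builds an OPEN carrying the 4-octet ASN capability (code 65, RFC 6793) inside a Capabilities optional parameter (type 2, RFC 5492). The 16-bit "My AS" field can no longer hold a large ASN, so it carries AS_TRANS (23456) instead. `build_open` is a hypothetical helper, and a real speaker would advertise more capabilities than this one.

```python
import socket
import struct

MARKER = b"\xff" * 16
AS_TRANS = 23456        # placeholder ASN for the 16-bit field (RFC 6793)
PARAM_CAPABILITIES = 2  # optional parameter type (RFC 5492)
CAP_ASN4 = 65           # 4-octet AS number capability code

def build_open(asn: int, hold_time: int, router_id: str) -> bytes:
    """Build an OPEN message advertising only the 4-octet ASN capability."""
    capability = struct.pack("!BBI", CAP_ASN4, 4, asn)
    parameter = struct.pack("!BB", PARAM_CAPABILITIES, len(capability)) + capability
    my_as = asn if asn < 65536 else AS_TRANS
    body = struct.pack(
        "!BHH4sB",
        4,                              # BGP version
        my_as,                          # 16-bit "My AS" field
        hold_time,
        socket.inet_aton(router_id),    # BGP Identifier, 4 bytes
        len(parameter),                 # optional parameters length (1 byte)
    ) + parameter
    return MARKER + struct.pack("!HB", 19 + len(body), 1) + body

open_message = build_open(4200000000, 90, "192.0.2.1")
```

The optional parameters length is a single byte, so every capability has to fit in 255 bytes, which is presumably the size constraint noted above.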
UPDATE
• NLRI encoding
• IPv4 is VERY space efficient
Good
• Multiprotocol afterthought (ie: IPv6)
• An IPv6 NLRI is an attribute! What!
• ONE announcement & withdraw
Bad
• The packing is now VERY wasteful!
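A rough sketch of the difference in packing. A native IPv4 NLRI is one length byte plus only the significant prefix bytes, whereas an IPv6 prefix has to travel inside an MP_REACH_NLRI attribute (type 14, RFC 4760) that repeats AFI, SAFI and next hop. The helper names are illustrative, and the attribute builder only handles the one-byte attribute length.

```python
import ipaddress
import struct

def encode_nlri(prefix: str) -> bytes:
    """Prefix length in bits, followed by just enough bytes to hold it."""
    network = ipaddress.ip_network(prefix)
    nbytes = (network.prefixlen + 7) // 8
    return bytes([network.prefixlen]) + network.network_address.packed[:nbytes]

def mp_reach_ipv6(next_hop: str, prefixes: list) -> bytes:
    """Wrap IPv6 unicast prefixes in an MP_REACH_NLRI path attribute."""
    hop = ipaddress.IPv6Address(next_hop).packed
    value = struct.pack("!HBB", 2, 1, len(hop)) + hop + b"\x00"  # AFI 2, SAFI 1, reserved byte
    value += b"".join(encode_nlri(p) for p in prefixes)
    if len(value) > 255:
        raise ValueError("sketch only handles the 1-byte attribute length")
    return struct.pack("!BBB", 0x80, 14, len(value)) + value     # optional, non-transitive

ipv4_announce = encode_nlri("192.0.2.0/24")
ipv6_announce = mp_reach_ipv6("2001:db8::1", ["2001:db8::/32"])
```

In this example the IPv4 prefix costs 4 bytes, while the single IPv6 prefix costs 29 bytes, of which only 5 are the prefix itself.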
This is a BGP UPDATE
We could speak at length about UPDATE “attributes”, but they are “ok”
Let’s skip their weird encoding (7 or 15 bits). AS transitivity still scares some people.
They are hard to explain in a quick talk. But fundamental to BGP design
UPDATE
Nice, Simple, Compact! Just simplified a “bit” here for clarity! (no Path Attribute)
Lovely packing, now feeling nostalgic about other “good old” binary formats such as IFF, later PNG
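The simplified UPDATE layout can be walked in a few lines: a 2-byte withdrawn-routes length, the withdrawn prefixes, a 2-byte path-attribute length, the attributes (left unparsed here, mirroring the simplification above), and then the announced IPv4 prefixes up to the end of the message. This is a sketch with made-up function names, and it takes the message body after the 19-byte header.

```python
import ipaddress
import struct

def read_prefixes(data: bytes) -> list:
    """Decode a run of (length-in-bits, prefix bytes) IPv4 entries."""
    prefixes = []
    offset = 0
    while offset < len(data):
        bits = data[offset]
        if bits > 32:
            raise ValueError("IPv4 prefix length above 32")
        nbytes = (bits + 7) // 8
        raw = data[offset + 1:offset + 1 + nbytes]
        if len(raw) < nbytes:
            raise ValueError("truncated prefix")
        address = ipaddress.IPv4Address(raw + bytes(4 - nbytes))  # pad to 4 bytes
        prefixes.append(f"{address}/{bits}")
        offset += 1 + nbytes
    return prefixes

def parse_update(body: bytes) -> tuple:
    """Return (withdrawn prefixes, raw attributes, announced prefixes)."""
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)
    withdrawn = body[2:2 + withdrawn_len]
    (attr_len,) = struct.unpack_from("!H", body, 2 + withdrawn_len)
    start = 4 + withdrawn_len
    attributes = body[start:start + attr_len]
    nlri = body[start + attr_len:]       # no length field: runs to the end
    return read_prefixes(withdrawn), attributes, read_prefixes(nlri)

# Withdraw 10.0.0.0/8, no attributes, announce 192.0.2.0/24.
sample = b"\x00\x02\x08\x0a" + b"\x00\x00" + b"\x18\xc0\x00\x02"
parsed = parse_update(sample)
```

A real announcement would carry mandatory attributes; the sample leaves them out only to match the simplified picture.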
UPDATE
• Attributes are a kitchen sink
• Every new BGP feature is an attribute
Very very bad
• Easier code to change by vendors
• UPDATE generation code is COMPLEX
• Have to break every 4k
Bad
• Many issues fixed in recent RFCs
• ordering, reliability, …
Good
Mandatory “cute” kitten
Where is the LATENCY used with BGP…
Missing
Attribute MESSAGE, ideas!
• Separation of Attributes and NLRI parsing
• Dissociate Attributes and Updates
• Same attributes are parsed and parsed again
• Most of the BGP parsing is attributes
Terribly Bad for IPv6 – Just very Bad for IPv4
• New MESSAGE for attributes information?
• CPU + bandwidth vs Memory/Caching
• Memory is not the weakest link to achieve good convergence
• Remove the definition from the UPDATE, create a new MESSAGE
• Reference “Attributes” MESSAGE in UPDATE (saves a LOT of parsing)
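One way to picture the idea, purely as a thought experiment: the receiver parses an attribute set once when it arrives in its own message, stores it under an identifier, and later UPDATEs only carry that identifier. Nothing below exists in BGP today. The `AttributeTable` class, its `learn` and `resolve` methods and the numeric identifiers are all assumptions made for this sketch; only the attribute wire format (flags, type, one- or two-byte length) is the current one.

```python
import struct

def parse_attributes(raw: bytes) -> dict:
    """Parse current-format path attributes into {type code: value bytes}."""
    attributes = {}
    offset = 0
    while offset < len(raw):
        flags, code = raw[offset], raw[offset + 1]
        if flags & 0x10:                                   # Extended Length bit
            (length,) = struct.unpack_from("!H", raw, offset + 2)
            offset += 4
        else:
            length = raw[offset + 2]
            offset += 3
        attributes[code] = raw[offset:offset + length]
        offset += length
    return attributes

class AttributeTable:
    """Hypothetical per-session cache: attribute-set id -> parsed attributes."""

    def __init__(self):
        self._sets = {}
        self.parse_count = 0

    def learn(self, set_id: int, raw: bytes) -> None:
        """What receiving a hypothetical attributes MESSAGE might do."""
        self._sets[set_id] = parse_attributes(raw)
        self.parse_count += 1

    def resolve(self, set_id: int) -> dict:
        """What an UPDATE referencing the set would do: a lookup, no parsing."""
        return self._sets[set_id]

# ORIGIN (IGP) followed by LOCAL_PREF 100.
raw_set = b"\x40\x01\x01\x00" + b"\x40\x05\x04\x00\x00\x00\x64"
table = AttributeTable()
table.learn(1, raw_set)
first = table.resolve(1)
second = table.resolve(1)
```

The trade is the one listed above: a little more memory for the table in exchange for parsing each attribute set once instead of once per UPDATE.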
Attribute MESSAGE, ideas!
• Also allow attribute composition?
• This is how router configurations are built on modern CLI
• Many communities are used:
To set high/low local-pref, to remove RFC1918, to drop traffic, to slice bread, …
• Around 95% of routes in the DFZ have a unique AS_PATH
• Having the AS_PATH part of the grouping is sub-optimal
• It may make sense to move AS_PATH with NLRI
• No real personal research on attribute grouping
UPDATE MESSAGE, ideas
• A “route” is really an NLRI & a next-hop
• Attributes are for route selection
• Grouping next-hop with other attribute data is inefficient
Bad
• It does make sense to group by next-hop
• But next-hop is not really an “attribute”
Split next-hop from the other attributes and group NLRI per next-hop
• None of the ideas presented change the route selection process
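Grouping NLRI per next-hop is simple to express. The sketch below assumes routes are available as (prefix, next-hop) pairs, which is an assumption of this example and not a structure taken from any implementation.

```python
from collections import defaultdict

def group_by_next_hop(routes: list) -> dict:
    """Collect prefixes under their next-hop, keeping first-seen order."""
    groups = defaultdict(list)
    for prefix, next_hop in routes:
        groups[next_hop].append(prefix)
    return dict(groups)

routes = [
    ("192.0.2.0/24", "10.0.0.1"),
    ("198.51.100.0/24", "10.0.0.2"),
    ("203.0.113.0/24", "10.0.0.1"),
]
grouped = group_by_next_hop(routes)
```

Each group could then be sent as one next-hop followed by its set of prefixes, with the selection attributes carried separately.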
UPDATE (2) MESSAGE, more ideas
• Why not create a new MESSAGE type for Multiprotocol
• Keeping the same format for attributes (improved or not)
• Just different NLRI encoding (not considering AS_PATH)
• AFI/SAFI
• MP withdraw
• Attributes (current format with proposed idea)
• Next-hop + set of MP announce,
• Next-hop + set of MP announce, …
(Or have an attribute and/or capability to signal a change of NLRI parsing)
Disclaimer: the chance of seeing these ideas happen is (near) zero. But please feel free to prove me wrong!
Finally, an agreement was reached on a standard change
This is/was an opinionated talk… I am right and everyone else is wrong
BGP, means IETF
• Vendors are very influential
• They pay people who code the thing
• They listen to $$$ clients
• BGP is made of 10s and 10s of RFCs
• Useful drafts in limbo for years
• Lots of politics (like everywhere)
• By “spec writers” not “programmers”
can lead to some weird stuff
• Very few operators
• Mostly only large networks
• Not enough operator feedback
• Not enough operational feedback
Bad
No one interested in fixing BGP, like HTTP/Bis fixed HTTP
https://waitbutwhy.com/2016/03/doing-a-ted-talk-the-full-story.html
You are here.. YES YOU ARE.
And I am looking forward to sitting down.. But I may have spoken too fast
25 slides for 15 minutes should be about right
Emergency extra slides? Want more? Questions?
Extra slides ??
BGP / State Machine
State Machine
• Should make things clear in RFC 4271.. Should..
• Very hard to “get” (putting code ideas in words is hard)
• Most diagrams of it are wrong, in one way or another
• No other RFC really updates the state machine (when they sometimes should)
• Most implementations do not implement it fully/correctly
• Tries to “suggest” an implementation(s) of the BGP reactor (try/except can achieve the same without it)
Good / Bad … Pick one!
BGP Other
• KEEPALIVE
• 3 need to go missing to consider the peer dead
• NOTIFICATION
• Notification of issues / session going away
• Job worked on this :-)
• Empty UPDATE
• Known as EOR (ie: you can now sync the RIB to the FIB)
• MultiProtocol IPv4 vs IPv4 “native” – interop issues in the past (resolved)
• 2x KeepAlive
• Same but it is a trick.. Not documented anywhere
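For reference, the two End-of-RIB encodings mentioned above can be sketched as follows, based on RFC 4724: the "native" IPv4 form is an UPDATE with nothing in it (23 bytes in total), while the multiprotocol form is an UPDATE whose only content is an MP_UNREACH_NLRI attribute (type 15) naming the family and withdrawing nothing. The function names are illustrative only.

```python
import struct

MARKER = b"\xff" * 16
UPDATE = 2

def eor_ipv4_native() -> bytes:
    """Empty UPDATE: no withdrawn routes, no attributes, no NLRI."""
    body = b"\x00\x00" + b"\x00\x00"
    return MARKER + struct.pack("!HB", 19 + len(body), UPDATE) + body

def eor_multiprotocol(afi: int, safi: int) -> bytes:
    """UPDATE carrying only an MP_UNREACH_NLRI with an empty withdraw list."""
    attribute = struct.pack("!BBBHB", 0x80, 15, 3, afi, safi)
    body = b"\x00\x00" + struct.pack("!H", len(attribute)) + attribute
    return MARKER + struct.pack("!HB", 19 + len(body), UPDATE) + body

native = eor_ipv4_native()
ipv6_unicast = eor_multiprotocol(2, 1)
ipv4_as_multiprotocol = eor_multiprotocol(1, 1)
```

IPv4 unicast can therefore be marked in two different ways depending on how a speaker treats the family, which is one plausible reading of the interop issues noted above.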
Be careful, Googling BGP can be surprising…