1
Explicit Marking and Prioritized Treatment of Specific OSPF Packets for Faster Convergence and Improved
Network Scalability and Stability (draft-ietf-ospf-scalability-02.txt)
Gagan Choudhury, AT&T
Vishwas Manral, NetPlane Systems
Anurag Maunder, Sanera Systems
Vera Sapozhnikova, AT&T
2
The Basic Issue
• In Large Operational Networks Running Link-State Protocols We Have Often Observed Sustained CPU Congestion (and Often Memory Congestion as Well) Caused by LSA Storms Triggered By
– Link/Node Failures
– Synchronization of Refreshes
– Software Bugs or Procedural Errors
• Congestion Is Reinforced by a Positive Feedback Loop
– LSA Retransmissions, Possible Packet Drops, Possible Link Failures due to Missed Hellos, and Eventual Recoveries All Generate More LSAs
• On Rare Occasions the Congestion Spreads to Many Nodes and Causes Significant Failures
• We Propose Prioritizing Hello and LSA Acknowledgment Packets to Improve Network Stability and Scalability
• Prioritized Treatment May Be Facilitated by Special Marking
• “Smart” Proprietary Implementations Are Perhaps Already Doing This, but We Propose It as a Best Current Practice So That All Implementations Benefit
3
Simulation Study
• Three Priority Scenarios
– 1. Incoming LSUs, Hellos, LSA Acks at the Same Priority
– 2. Hellos have Priority over LSUs and LSA Acks
– 3. Hellos and LSA Acks have Priority over LSUs
• Network Scenarios
– Network 1: 100 Nodes, 1200 Links, Max Node Adjacency 50
– Network 2: 50 Nodes, 600 Links, Max Node Adjacency 48
• LSA Scenarios
– 1 Router LSA per Node, 1 TE LSA per Link
– 1 Router LSA per Node, 10 ASE LSAs per Every Other Node
• LSA Retransmission Timer Value: 5 Seconds or 10 Seconds
• LSU Processing Time: ~1 ms or ~0.5 ms
• Hello/Router-Dead Interval: 10 Sec/40 Sec or 2 Sec/8 Sec
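The three priority scenarios above can be sketched as a mapping from packet type to service class. This is only an illustration and not the simulator used in the study: the names `Pkt`, `SCENARIOS`, and `service_order` are hypothetical, and a stable sort stands in for a strict-priority queue that stays FIFO within each class.

```python
from enum import Enum


class Pkt(Enum):
    HELLO = "hello"
    LS_ACK = "ls_ack"
    LSU = "lsu"


# Lower number = served first. One table per priority scenario from the slide.
SCENARIOS = {
    1: {Pkt.HELLO: 0, Pkt.LS_ACK: 0, Pkt.LSU: 0},  # everything at the same priority
    2: {Pkt.HELLO: 0, Pkt.LS_ACK: 1, Pkt.LSU: 1},  # Hellos ahead of LSUs and Acks
    3: {Pkt.HELLO: 0, Pkt.LS_ACK: 0, Pkt.LSU: 1},  # Hellos and Acks ahead of LSUs
}


def service_order(scenario, arrivals):
    """Return the order in which a backlog of packets would be processed.

    sorted() is stable, so packets of equal priority keep arrival order.
    """
    prio = SCENARIOS[scenario]
    return sorted(arrivals, key=lambda pkt: prio[pkt])


if __name__ == "__main__":
    backlog = [Pkt.LSU, Pkt.HELLO, Pkt.LSU, Pkt.LS_ACK]
    for s in (1, 2, 3):
        print(s, [p.value for p in service_order(s, backlog)])
```

Under Scenario 1 the backlog is served in arrival order, so a Hello or Ack queued behind a burst of LSUs waits for the whole burst.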
4
Six Simulation Cases
• Case 1: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 2: Network 1, ASE LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 3: Network 1, Link LSAs, Retransmission Timer = 5 Sec, Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 4: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 0.5 ms, Hello/Router-Dead-Interval = 10/40 Sec.
• Case 5: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 2/8 Sec.
• Case 6: Network 2, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.
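Each of Cases 2 through 6 differs from Case 1 in exactly one setting. A minimal sketch that encodes the six cases as overrides of a Case 1 baseline makes this explicit; the `SimCase` type and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class SimCase:
    network: int        # 1 = 100 nodes / 1200 links, 2 = 50 nodes / 600 links
    lsa_type: str       # "link" or "ase"
    retransmit_s: int   # LSA retransmission timer, seconds
    proc_ms: float      # approximate LSU processing time, milliseconds
    hello_s: int        # Hello interval, seconds
    dead_s: int         # Router-Dead interval, seconds


CASE1 = SimCase(network=1, lsa_type="link", retransmit_s=10,
                proc_ms=1.0, hello_s=10, dead_s=40)

CASES = {
    1: CASE1,
    2: replace(CASE1, lsa_type="ase"),
    3: replace(CASE1, retransmit_s=5),
    4: replace(CASE1, proc_ms=0.5),
    5: replace(CASE1, hello_s=2, dead_s=8),
    6: replace(CASE1, network=2),
}


def diff_from_baseline(case):
    """Return only the fields where a case departs from Case 1."""
    base = asdict(CASE1)
    return {k: v for k, v in asdict(case).items() if v != base[k]}


if __name__ == "__main__":
    for number, case in CASES.items():
        print(number, diff_from_baseline(case))
```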
5
Number of Non-Converged LSAs Vs. LSA Storm - Case 1, No Priority to Hello, Ack - LSA Storm Starts Between 20 and 30 Seconds
[Figure: Number of Nonconverged LSUs in Network (vertical axis, 0 to 100) vs. Time in Seconds (horizontal axis, 10 to 100), with one curve each for LSA Storm Size = 100, 140, and 160]
6
LSA Storm Threshold for Sustained CPU Congestion
Maximum Allowable LSA Storm Size

Case Number   No Priority to    Priority to     Priority to
              Hello or Ack*     Hello Only**    Hello and Ack**
Case 1        150               190             250
Case 2        185               215             285
Case 3        115               127             170
Case 4        320               375             580
Case 5        120               175             225
Case 6        185               224             285

* Congestion Due to Retransmissions and Adjacency Loss Due to Missed Hellos
** Congestion Due to Retransmissions Only (Adjacency Stays Up)
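To read the table as relative gains, the thresholds can be compared against the no-priority column. The sketch below uses only the numbers from the table; the helper name `gain_pct` and the dictionary layout are illustrative choices.

```python
# (no priority, priority to Hello only, priority to Hello and Ack), from the table
THRESHOLDS = {
    1: (150, 190, 250),
    2: (185, 215, 285),
    3: (115, 127, 170),
    4: (320, 375, 580),
    5: (120, 175, 225),
    6: (185, 224, 285),
}


def gain_pct(case):
    """Percentage increase in tolerable storm size over the no-priority column."""
    base, hello_only, hello_and_ack = THRESHOLDS[case]
    return (
        round(100 * (hello_only - base) / base),
        round(100 * (hello_and_ack - base) / base),
    )


if __name__ == "__main__":
    for case in THRESHOLDS:
        hello, both = gain_pct(case)
        print(f"Case {case}: Hello only +{hello}%, Hello and Ack +{both}%")
```

For Case 1, for example, prioritizing Hellos alone raises the threshold from 150 to 190 (about 27%), and prioritizing both Hellos and Acks raises it to 250 (about 67%).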
7
Proposal
• Process Critical OSPF Packets (Hello, LSA Ack) at Higher Priority than Other OSPF Packets
– This May Be Facilitated by Special Marking (e.g., Use Two Diffserv Codepoints for OSPF Packets, One for the Higher and the Other for the Lower Priority Class)
• During Congestion, Use Any Packet Received over an Interface as a Surrogate for Hello in Order to Keep the Link Alive (Same Impact as Prioritized Hello)
• Other Potential OSPF Packets to Get High Priority
– LSAs Carrying Topology Change Information
– Database Description Packets from the Slave That Are Used as Acks
• These or Similar Mechanisms Are Perhaps Already Being Used in Smart Proprietary Implementations
– Proposing Them as a BCP Would Benefit All Implementations
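The two mechanisms in the proposal can be sketched as follows. This is a rough illustration under stated assumptions, not the draft's specification: the slides say only "two Diffserv codepoints", so the use of CS7 and CS6 here is an assumption, and `dscp_for`, `tos_byte`, and `Neighbor` are hypothetical names.

```python
# ASSUMPTION: the slides do not name the codepoints; CS7 and CS6 are placeholders.
DSCP_HIGH = 56  # CS7, assumed here for Hello and LSA Ack
DSCP_LOW = 48   # CS6, assumed here for all other OSPF packets

HIGH_PRIORITY_TYPES = {"hello", "ls_ack"}


def dscp_for(packet_type):
    """Pick the marking for an outgoing OSPF packet."""
    return DSCP_HIGH if packet_type in HIGH_PRIORITY_TYPES else DSCP_LOW


def tos_byte(dscp):
    """The DSCP occupies the upper six bits of the IP TOS byte."""
    return dscp << 2


class Neighbor:
    """Tracks adjacency liveness against the Router-Dead interval."""

    def __init__(self, dead_interval_s, now=0.0):
        self.dead_interval_s = dead_interval_s
        self.last_alive = now

    def on_packet(self, now, is_hello, congested):
        # Normally only a Hello refreshes the timer. Under congestion, any
        # packet received from the neighbor is treated as a surrogate Hello.
        if is_hello or congested:
            self.last_alive = now

    def is_up(self, now):
        return now - self.last_alive < self.dead_interval_s


if __name__ == "__main__":
    n = Neighbor(dead_interval_s=40)
    n.on_packet(now=10, is_hello=True, congested=False)
    n.on_packet(now=45, is_hello=False, congested=True)  # an LSU counts as Hello
    print(n.is_up(now=60), hex(tos_byte(dscp_for("hello"))))
```

With the surrogate rule, an adjacency stays up as long as any traffic arrives from the neighbor within the Router-Dead interval, which is why the slide describes it as having the same impact as a prioritized Hello.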