Explicit Marking and Prioritized Treatment of Specific OSPF Packets for Faster Convergence and Improved Network Scalability and Stability (draft-ietf-ospf-scalability-02.txt)

Gagan Choudhury, AT&T, [email protected]

Vishwas Manral, NetPlane Systems, [email protected]

Anurag Maunder, Sanera Systems, [email protected]

Vera Sapozhnikova, AT&T, [email protected]

The Basic Issue

• In Large Operational Networks Running Link-State Protocols, We Have Often Observed Sustained CPU Congestion (and Often Memory Congestion as Well) Caused by LSA Storms Triggered By

– Link/Node Failures

– Synchronization of Refreshes

– Software Bugs or Procedural Errors

• Congestion Is Reinforced by a Positive Feedback Loop

– LSA Retransmissions, Possible Packet Drops, and Possible Link Failures Due to Missed Hellos (and Their Eventual Recoveries) All Generate More LSAs

• On Rare Occasions the Congestion Spreads to Many Nodes and Causes Significant Failures

• We Propose Prioritization of Hello and LSA Acknowledgment Packets to Improve Network Stability and Scalability

• Prioritized Treatment may be facilitated by Special Marking

• “Smart” Proprietary Implementations Are Perhaps Already Doing This, but We Propose It as a Best Current Practice so That All Implementations Benefit

Simulation Study

• Three Priority Scenarios

– 1. Incoming LSUs, Hellos, LSA Acks at the Same Priority

– 2. Hellos have Priority over LSUs and LSA Acks

– 3. Hellos and LSA Acks have Priority over LSUs

• Network Scenarios

– Network 1: 100 Nodes, 1200 Links, Max Node Adjacency 50

– Network 2: 50 Nodes, 600 Links, Max Node Adjacency 48

• LSA Scenarios

– 1 Router LSA per Node, 1 TE LSA per Link

– 1 Router LSA per Node, 10 ASE LSAs per Every Other Node

• LSA Retransmission Timer Value: 5 Seconds or 10 Seconds

• LSU Processing Time: ~1 ms or ~0.5 ms

• Hello/Router-Dead Interval: 10 Sec/40 Sec or 2 Sec/8 Sec
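
The three priority scenarios listed above can be pictured as a mapping from packet type to queue priority. The following is a minimal sketch only, not the simulator used in the study: the names `PRIORITY_SCENARIOS` and `drain_order` are hypothetical, and a simple heap stands in for whatever queueing the real implementation uses.

```python
import heapq
from itertools import count

# Lower number = served first. Scenario numbers follow the slide.
PRIORITY_SCENARIOS = {
    1: {"hello": 0, "ack": 0, "lsu": 0},  # everything at the same priority
    2: {"hello": 0, "ack": 1, "lsu": 1},  # Hellos ahead of LSUs and LSA Acks
    3: {"hello": 0, "ack": 0, "lsu": 1},  # Hellos and LSA Acks ahead of LSUs
}


def drain_order(arrivals, scenario):
    """Return the order in which already-queued packets would be processed."""
    prio = PRIORITY_SCENARIOS[scenario]
    seq = count()  # tie-breaker: keeps arrival (FIFO) order within a class
    heap = [(prio[kind], next(seq), kind) for kind in arrivals]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


arrivals = ["lsu", "lsu", "ack", "hello", "lsu"]
for scenario in (1, 2, 3):
    print(scenario, drain_order(arrivals, scenario))
```

In scenario 1 the packets leave in arrival order, so a Hello or Ack queued behind a burst of LSUs waits for the whole burst; in scenarios 2 and 3 the favored packet types jump ahead of the LSUs.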

Six Simulation Cases

• Case 1: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.

• Case 2: Network 1, ASE LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 1 ms, Hello/Router-Dead-Interval = 10/40 Sec.


• Case 4: Network 1, Link LSAs, Retransmission Timer = 10 Sec, Proc. Time ~ 0.5 ms, Hello/Router-Dead-Interval = 10/40 Sec.



Number of Non-Converged LSAs Vs. LSA Storm - Case 1, No Priority to Hello, Ack - LSA Storm Starts Between 20 and 30 Seconds

[Figure: Number of Non-Converged LSUs in Network (y-axis, 0 to 100) vs. Time in Seconds (x-axis), with one curve each for LSA Storm Size = 100, 140, and 160]

LSA Storm Threshold for Sustained CPU Congestion

Maximum Allowable LSA Storm Size

Case Number   No Priority to Hello or Ack*   Priority to Hello Only**   Priority to Hello and Ack**
Case 1        150                            190                        250
Case 2        185                            215                        285
Case 3        115                            127                        170
Case 4        320                            375                        580
Case 5        120                            175                        225
Case 6        185                            224                        285

* Congestion Due to Retransmissions and Adjacency Loss Due to Missed Hello

** Congestion Due to Retransmissions Only (Adjacency Stays Up)
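
To make the size of the improvement easier to read, the thresholds can be expressed as ratios against the no-priority column. The short sketch below only redoes that arithmetic on the table values; the names `THRESHOLDS` and `gain` are hypothetical.

```python
# Thresholds copied from the table:
# (no priority, priority to Hello only, priority to Hello and Ack)
THRESHOLDS = {
    1: (150, 190, 250),
    2: (185, 215, 285),
    3: (115, 127, 170),
    4: (320, 375, 580),
    5: (120, 175, 225),
    6: (185, 224, 285),
}


def gain(case):
    """Return the threshold ratios relative to the no-priority baseline."""
    base, hello_only, hello_and_ack = THRESHOLDS[case]
    return hello_only / base, hello_and_ack / base


for case in sorted(THRESHOLDS):
    hello_gain, both_gain = gain(case)
    print(f"Case {case}: Hello only x{hello_gain:.2f}, Hello and Ack x{both_gain:.2f}")
```

Across the six cases, prioritizing both Hello and LSA Ack raises the tolerable storm size by a factor of roughly 1.5 to 1.9 over the no-priority baseline.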

Proposal

• Process Critical OSPF Packets (Hello, LSA Ack) at Higher Priority Compared to Other OSPF Packets

– This May Be Facilitated by Special Marking (e.g., Use Two Diffserv Codepoints for OSPF Packets, One for the Higher and the Other for the Lower Priority Class)

• During Congestion use Any Packet Received over an Interface as a Surrogate for Hello in order to Keep Link Alive (Same Impact as Prioritized Hello)

• Other Potential OSPF Packets to Get High Priority

– LSA Carrying Topology Change Information

– Database Description Packet from Slave That is Used as Ack

• These or Similar Mechanisms Are Perhaps Already Being Used in Smart Proprietary Implementations

– Proposal as BCP Would Benefit All Implementations
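
The first two proposals can be sketched in a few lines. This is an illustration under stated assumptions, not an implementation from the draft: the class labels `HIGH` and `LOW`, the `classify` helper, and the `Neighbor` class are hypothetical, and no specific Diffserv codepoints are shown because the slides do not name any. The packet type codes are the standard OSPF ones, and the 40-second Router-Dead Interval is one of the values from the simulation study.

```python
from dataclasses import dataclass

# OSPF packet type codes as defined by the OSPF specification.
HELLO, DB_DESCRIPTION, LS_REQUEST, LS_UPDATE, LS_ACK = 1, 2, 3, 4, 5

# Hypothetical class labels; the slides do not name specific codepoints.
HIGH, LOW = "high", "low"

ROUTER_DEAD_INTERVAL = 40.0  # seconds; one of the values in the simulation study


def classify(packet_type):
    """Place Hello and LSA Ack packets in the high-priority class."""
    return HIGH if packet_type in (HELLO, LS_ACK) else LOW


@dataclass
class Neighbor:
    last_heard: float  # time at which the inactivity timer was last reset

    def on_packet(self, packet_type, now, congested):
        # Ordinarily only a Hello resets the inactivity timer. Under
        # congestion, any received packet acts as a surrogate Hello.
        if packet_type == HELLO or congested:
            self.last_heard = now

    def is_alive(self, now):
        return now - self.last_heard < ROUTER_DEAD_INTERVAL
```

For example, a neighbor last heard at time 0 that delivers only an LSU at time 30 would be declared dead at time 45 under the ordinary rule, but stays up when the congested flag lets that LSU count as a Hello.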
