050111 - kuechler, schapranow, - congestion control


Congestion Control

Alexander Kuchler, Matthieu-Patrick Schapranow


CONGESTION describes a situation of excessive resource use in which demand exceeds capacity. It was hoped that this phenomenon would disappear with the introduction of more high-end data links, but it is still omnipresent in modern data networks. The transported data volume increases rapidly, and users suffer from bottlenecks on the routes taken by their data packets.

Therefore, it becomes more and more important to minimize the risk of congestion and to find ways to eliminate it if it occurs. Hence, this paper classifies elementary types of congestion and points out how to control it by implementing well-known algorithms on either the client's or the server's side. On the one hand, these include ways of preventing congestion from occurring; on the other hand, they offer fast mechanisms to clear congested network nodes without network-wide starvation.

A paper associated with the seminar

COMMUNICATION NETWORKS

Dr.-Ing. Thi-Thanh-Mai Hoang and Dr.-Ing. Andreas Willig

Winter semester 2004/2005

Seminar Communication Networks 9-1

Contents

1 Introduction
  1.1 Definition
  1.2 Possible Solutions
    1.2.1 Increase of Resources
    1.2.2 Decrease of Load
  1.3 Congestion Control vs. Flow Control
  1.4 Classification of Congestion Control Algorithms

2 Host Centric Algorithms
  2.1 Open Loop
    2.1.1 Open Loop and Source Driven
      2.1.1.1 Traffic Shaping
      2.1.1.2 Leaky Bucket
      2.1.1.3 Token Bucket
    2.1.2 Open Loop and Destination Driven
  2.2 Closed Loop
    2.2.1 Closed Loop and Implicit Feedback
      2.2.1.1 Slow Start
      2.2.1.2 Congestion Avoidance
    2.2.2 Closed Loop and Explicit Feedback
      2.2.2.1 Choke Packets
      2.2.2.2 Fast Retransmit
      2.2.2.3 Fast Recovery
      2.2.2.4 Fast Retransmit combined with Fast Recovery

3 Router Centric Algorithms
  3.1 Congestion Collapse
  3.2 Small packet problem
  3.3 Router Processed
    3.3.1 Weighted Fair Queuing
    3.3.2 Load Shedding
    3.3.3 Random Early Detection (RED)
  3.4 Router Indicated
    3.4.1 ICMP Source Quench
    3.4.2 Explicit Congestion Notification (ECN)

4 Conclusion

5 Glossary




List of Figures

1 Classification of Congestion Control Algorithms
2 Leaky Bucket Algorithm
3 Token Bucket Algorithm
4 TCP's Slow Start / Congestion Avoidance combination
5 Time-cwnd-diagram: Slow Start / Congestion Avoidance
6 Usual implementation of Choke Packets
7 Choke Packets hop-by-hop scenario
8 Time-cwnd-diagram: Fast Retransmit
9 Time-cwnd-diagram: Fast Recovery
10 Time-cwnd-diagram: Fast Retransmit combined with Fast Recovery
11 Packet Flow: Fast Retransmit combined with Fast Recovery
12 Circuit diagram: Weighted Fair Queuing
13 Load Shedding Algorithm: Dropping packet seven of twelve
14 Load Shedding Algorithm: Dropping packet ten of twelve
15 Petri net: Random Early Detection




1 Introduction

1.1 Definition

A situation is called congestion if performance degrades in a subnet because too many data packets are present, i.e. the traffic load temporarily exceeds the offered resources.

Under light load, the number of packets delivered is proportional to the number of packets sent. But if traffic increases too much, routers are no longer able to handle all of it and packets get lost. If traffic grows further, the subnet collapses and no more packets are delivered.

Obviously, two naive solutions are possible: increasing the resources or decreasing the load.

1.2 Possible Solutions

1.2.1 Increase of Resources

Resources can be increased by adding router memory, so that packets from all input lines can be queued for one output line. Furthermore, a faster processor can be installed, one that handles background tasks at least as fast as the router can produce output. Last but not least, higher bandwidth can also help to avoid congestion.

But upgrading only some components merely shifts the bottleneck. That is why all components need to be balanced. Because of technical limits it is not possible to increase the resources infinitely (and even if it were possible, it would not avoid congestion), so it is necessary to decrease the load.

1.2.2 Decrease of Load

A real decrease of load in a subnet is only possible by encouraging hosts to reduce their outgoing traffic. This idea is not practicable, so it is necessary to decrease the load at selected single points. To decrease a router's load, it is useful to tell other routers to forward packets along another route that avoids the heavily loaded router.

1.3 Congestion Control vs. Flow Control

Ensuring that the subnet is able to carry all offered traffic is called CONGESTION CONTROL, whereas controlling the point-to-point traffic between a sender and a receiver is called FLOW CONTROL.




Congestion Control involves all hosts, routers, store-and-forward processes and other factors that affect the subnet's capacity. Flow Control should slow down a sender that is trying to send more than the receiver can deal with. Some Congestion Control Algorithms also implement some kind of slow-down message, so Flow Control and Congestion Control are unfortunately often confused.

1.4 Classification of Congestion Control Algorithms

It is practical to divide Congestion Control Algorithms into two main classes according to the place where they influence the network's behavior, as described in [Zha86] and shown in figure 1. This is either on the hosts' side, establishing end-to-end Congestion Control, called HOST CENTRIC, or on the routers' side, affecting transferred data packets, called ROUTER CENTRIC.

Figure 1: Classification of Congestion Control Algorithms

The Host Centric class is characterized by a high level of abstraction in the network model, so that intermediate network nodes are considered a static and transparent connection channel without any influence on the network's behavior.

In contrast, algorithms of the Router Centric class involve each network actor as an active part of the Congestion Control process. Thus more dynamic control can be achieved by implementing algorithms up to the Network Layer of the ISO/OSI model.

Four classes of Host Centric Congestion Control Algorithms are discussed by Yang and Reddy in [YR95].

On the one hand, there is the simple static solution called OPEN LOOP, which prevents congestion by understating the possible bandwidth on the sender's side. That means any network client uses only a part of the available network bandwidth instead of bursting the whole data through the network as fast as possible. Certainly, this reduces the throughput, but it is a simple way to prevent congestion from developing. Algorithms of this type can be implemented on either the source's or the destination's side, so this class is subdivided into source driven and destination driven approaches.

On the other hand, there is the more dynamic method to prevent congestion called CLOSED LOOP. These algorithms adjust system parameters depending on the individual network state by gathering facts about congestion that has already been detected or is likely to appear soon. This status information can be collected through explicit receiver messages as well as through implicit polling initiated by the sender to check whether a route is congested or not.

Additionally, Router Centric Congestion Control Algorithms can be divided into two subclasses, which specify the reaction in case of congestion: Router Processed Algorithms handle congestion actively, e.g. by dropping packets, whereas Router Indicated Algorithms only signal the congestion state and do not influence it directly.




2 Host Centric Algorithms

2.1 Open Loop

Open Loop Algorithms try to avoid congestion in the first place, without making any corrections once the system is up. Essential points for Open Loop solutions are, e.g., deciding when to accept new traffic, when to discard which packets, and how to make scheduling decisions. All of these decisions are based on a sensible system design, so they do not depend on the current network state.

A sender has to determine how many packets can be sent without provoking congestion. The receiver has to decide carefully which packets to discard, because dropping any packet can cause considerable data retransmission, which results in additional network load.

2.1.1 Open Loop and Source Driven

2.1.1.1 Traffic Shaping

TRAFFIC SHAPING is a generic term for a group of algorithms that avoid congestion on the sender's side without feedback messages. To this end, an essential parameter, the data rate, is either negotiated at connection set-up or statically built into the implementation.

Afterwards, this negotiated data rate is maintained and variations are negligible. This method can be found especially in ATM telecommunication networks, in the form of Leaky Bucket (2.1.1.2) or Token Bucket (2.1.1.3) implementations. However, it potentially creates latency, which is problematic for some applications such as real-time audio and video.

2.1.1.2 Leaky Bucket

The LEAKY BUCKET Algorithm generates a constant output flow. The name describes the way it works: like a bucket of water with a leak at the bottom, as shown in figure 2.

How much water runs into the bucket does not matter. As long as there is any water left in the bucket, it runs out at the same constant rate, defined by the leak's size. Obviously, if there is no water in the bucket, there is no output. If the bucket is completely filled, additional incoming water is lost.

This metaphor reflects typical network behavior, where drops of water are data packets and the bucket is a finite internal queue sending one packet per clock tick.
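The bucket metaphor can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not part of the original text: the class name LeakyBucket, the packet-count capacity and the explicit tick() call are hypothetical choices made for the example.

```python
from collections import deque


class LeakyBucket:
    """Finite queue that releases at most one packet per clock tick."""

    def __init__(self, capacity):
        self.capacity = capacity   # bucket size in packets (assumed unit)
        self.queue = deque()
        self.dropped = 0           # packets lost because the bucket was full

    def arrive(self, packet):
        """Queue a packet, or drop it if the bucket is already full."""
        if len(self.queue) < self.capacity:
            self.queue.append(packet)
            return True
        self.dropped += 1
        return False

    def tick(self):
        """Leak one packet per tick; return None if the bucket is empty."""
        return self.queue.popleft() if self.queue else None


bucket = LeakyBucket(capacity=3)
for packet in range(5):            # a burst of five packets within one tick
    bucket.arrive(packet)
sent = [bucket.tick() for _ in range(4)]
```

With a bucket size of three, the burst of five packets loses its last two packets, and the remaining three leave the bucket one per tick regardless of how fast they arrived.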




Figure 2: Leaky Bucket Algorithm

2.1.1.3 Token Bucket

The TOKEN BUCKET Algorithm is a variation of the aforementioned LEAKY BUCKET Algorithm (2.1.1.2).

The intention is to allow temporary high output bursts if the origin normally does not generate much traffic. One possible implementation uses credit points or tokens, which are provided at a fixed time interval. These credit points can be accumulated in the bucket up to a limited number (the bucket size). When data is submitted, credits have to be taken from the bucket, i.e. one credit is consumed per data entity (e.g. one byte or one frame) that is injected into the network. If the credit points are used up (the bucket is empty), the sender has to wait until it gathers new tokens in the next time interval.

This is illustrated in figure 3 by trying to inject five data entities into the network (a) with three available credit points. After three of the five data entities have been transmitted in this time tick, no more credits are available, so no more data entities are injected into the network (b) until new credits are accumulated with the next time tick.

This algorithm provides a relative priority system. On the one hand, it allows small data bursts, which typically do not congest networks, to be sent immediately. On the other hand, unlike LEAKY BUCKET (2.1.1.2), this algorithm does not drop any packets on the sender's side: if no further tokens are available in the bucket, any sending attempt is blocked until a new token becomes available.
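The scenario of figure 3 (five data entities, three credits) can be replayed with a small sketch. The class name TokenBucket, the parameter tokens_per_tick and the choice of three new tokens per tick are assumptions made for this illustration; the text only fixes the bucket size concept and the one-credit-per-entity rule.

```python
class TokenBucket:
    """Credits accumulate per tick up to bucket_size; sending consumes them."""

    def __init__(self, bucket_size, tokens_per_tick, initial_tokens=0):
        self.bucket_size = bucket_size
        self.tokens_per_tick = tokens_per_tick   # assumed refill amount
        self.tokens = min(initial_tokens, bucket_size)

    def tick(self):
        """Add new credits, but never exceed the bucket size."""
        self.tokens = min(self.bucket_size, self.tokens + self.tokens_per_tick)

    def send(self, pending):
        """Inject as many entities as credits allow; the rest must wait."""
        n = min(self.tokens, len(pending))
        self.tokens -= n
        return pending[:n], pending[n:]


bucket = TokenBucket(bucket_size=3, tokens_per_tick=3, initial_tokens=3)
sent_a, waiting = bucket.send([1, 2, 3, 4, 5])   # situation (a) in figure 3
bucket.tick()                                    # next time tick: new credits
sent_b, waiting_b = bucket.send(waiting)         # the blocked entities follow
```

Note that nothing is dropped: the two entities that find no credit are simply held back until the next tick, which is the difference to the Leaky Bucket described above.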




Figure 3: Token Bucket Algorithm

2.1.2 Open Loop and Destination Driven

Algorithms belonging to this group can be identified by their static behavior: once these implementations are running, they work regardless of how the network's state changes. That means congestion is avoided on the receiver's side by means of a well-formulated specification. The question is what those algorithms may look like. They use the receiver's capabilities to influence the sender's behavior without any explicit indication.

For instance, one possible implementation could send a smaller advertised window (awnd) size in TCP headers than is really possible in order to throttle the sender's output. Another idea is to delay the sending of ACK packets by a fixed time, which must be clearly below the sender's timeout, including the required network latency. But it is really difficult to determine this delay statically in more or less dynamic network topologies such as the Internet. Furthermore, the receiver's influence on the sender is only advice, which could be ignored. Therefore, algorithms belonging to this group are no longer important for development and research, and no explicit examples are given in this paper.





2.2 Closed Loop

Closed Loop solutions are the network implementation of a typical control circuit. Algorithms of this class depend on a feedback loop with three parts:

1. monitor the system to detect congestion,

2. pass this information to an action point, and

3. adjust system operations to deal with the congestion.

To detect congestion, it is useful to monitor network values such as the percentage of packets discarded for lack of memory, the number of timed-out and therefore retransmitted packets, average queue lengths, and packet delays such as round trip times.

The gathered information has to be sent from the nearly congested point to the responsible party. These messages themselves increase the traffic further, which promotes congestion.

The main goal of Closed Loop solutions is to slow down the routers' sending of packets by collecting packets in their own queues, in order to reduce and even resolve congestion.

2.2.1 Closed Loop and Implicit Feedback

2.2.1.1 Slow Start

The Slow Start Algorithm, as described in [Ste97, section 2] and [APS99, section 3.1], tries to avoid congestion by sending data packets defensively. For this purpose, two special variables named congestion window (cwnd) and Slow Start threshold (ssthresh) are stored on the sender's side.

Initially, cwnd is set to one packet: the sender injects one packet into the network and waits for the acknowledgment (ACK) from the receiver. Normally, this packet gets through the network and reaches the recipient in time, so it is answered with an ACK.

Each acknowledgment received by the sender increments cwnd, so with each sending cycle the number of injected data packets doubles until the network's capacity is reached and the required ACKs can no longer get through. If packets get lost because the network capacity is reached, the sender does not increase the number of packets any further. More precisely, in TCP the minimum of cwnd and TCP's advertised window size specifies the number of data packets to be injected. If the required ACK packets do not reach the sender within a specified timeout, the sender interprets this as evidence of congestion. The sender then resets cwnd to its initial value and restarts data transmission as described above.
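The doubling and the reset on timeout can be made concrete with a short simulation. It is a simplified sketch, not TCP itself: the function name slow_start and the parameter capacity (the number of packets per cycle the path is assumed to carry before ACKs go missing) are hypothetical, and a whole sending cycle is treated as one step.

```python
def slow_start(rounds, capacity, awnd, initial_cwnd=1):
    """Return the number of packets injected in each sending cycle."""
    cwnd = initial_cwnd
    trace = []
    for _ in range(rounds):
        window = min(cwnd, awnd)      # TCP injects min(cwnd, awnd) packets
        trace.append(window)
        if window > capacity:         # ACKs time out: evidence of congestion
            cwnd = initial_cwnd       # restart from the initial value
        else:
            cwnd *= 2                 # every ACK arrived: window doubles
    return trace


trace = slow_start(rounds=8, capacity=8, awnd=64)
```

Under these assumptions the window grows 1, 2, 4, 8, 16, overshoots the assumed capacity of eight packets, and falls back to one, which is the fluctuation that motivates combining Slow Start with Congestion Avoidance.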




2.2.1.2 Congestion Avoidance

This algorithm, defined in [Ste97, section 2] and [APS99, section 3.2], is used in combination with Slow Start, because the exclusive use of Slow Start produces fluctuating data rates and puts additional load on the network. Figure 4 shows that the Slow Start threshold (ssthresh) is set to $\frac{1}{2}\,cwnd_{max}$ if the cwnd size exceeds the network's capabilities, and that the congestion window is set back to its initial value [Jac88]. Afterwards, CONGESTION AVOIDANCE comes into play: Slow Start increases the packet sending rate exponentially until the Slow Start threshold (ssthresh) is reached. From this threshold on, the size of the congestion window grows linearly, i.e. the sending rate increases as slowly as necessary and as fast as possible until the maximum network capabilities are reached, which means that no further ACKs get through the network to the sender.

Figure 4: TCP's Slow Start / Congestion Avoidance combination

Figure 5 reflects a typical connection set-up process generated by the algorithm pair Slow Start / Congestion Avoidance in TCP. The graph is divided into three segments: from time tick zero to four, from four to eight, and from eight to the end. In segment one, the initial cwnd size is one and the value doubles with each clock tick until timeouts caused by congestion occur at clock tick four, where cwnd reaches the size of sixteen packets. Hence, the optimal Slow Start threshold is calculated as $ssthresh = \frac{1}{2}\,cwnd = 8$ and the actual cwnd size is set back to its initial value of one. During segment two, Slow Start works again until cwnd reaches the threshold of eight at time tick eight. At this point the algorithm changes, and in segment three Congestion Avoidance continues by increasing cwnd linearly by one (or any other specified value) per tick. On reaching the maximum possible cwnd, here sixteen, Congestion Avoidance stops increasing cwnd and transmits steadily at this rate until further timeouts occur and the described algorithm pair starts again.
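The three segments described above can be reproduced with a small simulation. The sketch rests on assumptions that go beyond the text: the function name cwnd_trace is hypothetical, the timeout is injected at a given tick instead of being detected, and cwnd_max stands for the maximum possible window of sixteen packets.

```python
def cwnd_trace(ticks, timeout_ticks, cwnd_max, initial_cwnd=1):
    """Return cwnd per time tick for Slow Start plus Congestion Avoidance."""
    cwnd = initial_cwnd
    ssthresh = None                       # no threshold learned yet
    trace = []
    for t in range(ticks):
        trace.append(cwnd)
        if t in timeout_ticks:            # congestion: remember half the window
            ssthresh = max(cwnd // 2, 1)
            cwnd = initial_cwnd           # and fall back to the initial value
        elif ssthresh is None or cwnd < ssthresh:
            cwnd *= 2                     # Slow Start: exponential growth
            if ssthresh is not None:
                cwnd = min(cwnd, ssthresh)
        else:
            cwnd = min(cwnd + 1, cwnd_max)   # Congestion Avoidance: linear
    return trace


trace = cwnd_trace(ticks=17, timeout_ticks={4}, cwnd_max=16)
```

With a timeout at tick four, the trace shows the doubling up to sixteen, the restart at one, the second Slow Start up to the threshold of eight at tick eight, and the linear increase to sixteen at tick sixteen.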

[Plot: cwnd (vertical axis, 0 to 16) over time (horizontal axis, 0 to 16), with the ssthresh level marked; labels: Slow Start, Congestion Avoidance]

Figure 5: Time-cwnd-diagram: Slow Start / Congestion Avoidance.




2.2.2 Closed Loop and Explicit Feedback

2.2.2.1 Choke Packets

Figure 6: Usual implementation of Choke Packets

The Choke Packets approach treats the whole network as an active part of Flow Control. Each network actor has its own maximum throughput rate, and if this rate is exceeded, so-called CHOKE PACKETS are sent to the origin. These packets are specially marked so that further network nodes do not generate equivalent Choke Packets, which prevents duplicate feedback. Furthermore, once a Choke Packet reaches the original sender, the sender throttles its output rate to an adequate level for as long as congestion is reported by further Choke Packets.

The best-known Choke Packet is the Source Quench message of the Internet Control Message Protocol (ICMP), generated by busy routers and described in 3.4.1.

Moreover, extensive networks such as the Internet or large LANs nowadays contain numerous nodes between sender and recipient, and the latency is proportional to the number of nodes between them. It is not unusual to have twelve or sixteen nodes between sender and recipient, and if the last node starts to suffer from congestion, the feedback information in the form of Choke Packets travels $n - 1$ hops back to the source, as illustrated in figure 6. Rectangles are network nodes, the dashed one on the right side is congested, and the circles in between are communication channels. Dark rectangles carry a high data rate and light ones a choked data rate; filled circles contain Choke Packets.

Meanwhile, the sender keeps injecting data into the network unaffected until the Choke Packet arrives and the output is choked drastically. But the packets already sent still travel to the congested node, so that $2n - 1$ packets arrive and contribute to congestion at the loaded node, with $n$ indicating the position of the congested router.

A possible solution to minimize this latency is to let each network node buffer the data arriving from the sender once a Choke Packet has passed, as illustrated in figure 7. This affects the data flow directly behind each node that the Choke Packet passes, so the throughput is decreased immediately, and the point of congestion moves hop by hop towards the sender and finally disappears there completely.

Figure 7: Choke Packets hop-by-hop scenario

At this point, a typical congestion scenario with standard Internet routers in homogeneous networks is described. A small LAN containing three clients is connected through a Network Address Translator (NAT) router to the Internet Service Provider (ISP). Two of the three clients sporadically request some websites, but the third one permanently transfers large amounts of data. If one of the harmless clients tries to transfer some data, it receives Choke Packets in response. This is caused by the third, bursting client and is obviously unfair, because the affected clients are not responsible for the congestion.

Therefore, it is possible to use a specialized version of this algorithm characterized by one queue per client, so that only the third client would receive Choke Packets and the two others would be able to transfer small amounts of data unaffected. Such Router Centric Algorithms are summarized as ACTIVE QUEUE MANAGEMENT and are partially described under Weighted Fair Queuing (cf. 3.3.1).




2.2.2.2 Fast Retransmit

The FAST RETRANSMIT Algorithm uses explicit feedback to avoid long timeout periods spent waiting before a lost packet is retransmitted.

Such problems are inherent in packet-switched data networks, because every data packet can travel individually through the network and can take its own route from the sender to the recipient. Consequently, the transmitted data packets do not necessarily reach the recipient in order or without gaps.

Therefore, after detecting a missing packet, the recipient sends duplicate ACK packets for the last correctly received packet until the missing packet arrives. Unfortunately, TCP may also use duplicate ACK packets to indicate out-of-order packets, so two identical ACK packets do not necessarily indicate a lost packet. If, however, a sender receives multiple ACK packets with the same sequence number, normally at least three of them, these packets indicate the last successfully delivered packet. Furthermore, the presence of these ACK packets suggests the absence of severe congestion; otherwise they could not have been received either. Thus, the sender restarts the transmission with the packet specified by the multiple ACK packets. This results in fast retransmission of outstanding data without waiting for timers to expire.
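The duplicate ACK rule can be sketched as a simple counter on the sender's side. The sketch follows the convention of this section that an ACK carries the number of the last correctly received packet; the function name fast_retransmit and the packet-number model (instead of TCP byte sequence numbers) are assumptions made for the illustration.

```python
DUP_ACK_THRESHOLD = 3   # "normally at least three" duplicate ACKs


def fast_retransmit(acks):
    """Return the packet numbers the sender retransmits for an ACK stream."""
    last_ack = None
    duplicates = 0
    retransmits = []
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == DUP_ACK_THRESHOLD:
                # The ACK names the last packet received in order,
                # so the missing one is assumed to be its successor.
                retransmits.append(ack + 1)
        else:
            last_ack, duplicates = ack, 0
    return retransmits


lost = fast_retransmit([1, 2, 3, 4, 4, 4, 4])   # packet 5 went missing
reordered = fast_retransmit([1, 2, 2, 3])       # one duplicate only
```

In the first stream, the third duplicate of ACK 4 triggers the retransmission of packet 5 without any timer; in the second, a single duplicate is treated as reordering and nothing is retransmitted.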

[Plot: cwnd (vertical axis, 0 to 16) over time (horizontal axis, 0 to 16), with the ssthresh level marked; labels: Slow Start, Congestion Avoidance, static timeout, dupl. ACKs, Slow Start]

Figure 8: Time-cwnd-diagram: Fast Retransmit

Figure 8 illustrates that Fast Retransmit does not change the data output characteristics. After starting with Slow Start at time tick zero, one packet is not acknowledged. At time ticks five, six and seven, the corresponding duplicate ACKs (marked red) arrive at the sender. The third duplicate ACK triggers Fast Retransmit, so half a Slow Start cycle followed by Congestion Avoidance is executed. This behavior is identical to Slow Start / Congestion Avoidance as described in 2.2.1.2. The benefit is that it is not necessary to wait for the static timeout, which would occur at time tick twelve.

Nevertheless, due to the network's latency, this approach becomes problematic with increasing network diameter and numerous intermediate network nodes. This problem is discussed later in the context of Weighted Fair Queuing (cf. 3.3.1).

2.2.2.3 Fast Recovery

The FAST RECOVERY Algorithm is a special Congestion Avoidance Algorithm that is often combined with Fast Retransmit (cf. 2.2.2.2) in order to restart transmission at a higher throughput rate than Slow Start (cf. 2.2.1.1) does.

Fast Recovery starts when Fast Retransmit stops working. If no further duplicate ACK packets are received for the Fast Retransmit Algorithm, the sender tries to return to its normal sending state. But instead of Slow Start, Congestion Avoidance (additive increase) is used, because the returned duplicate ACK packets traveled successfully through the network. So no congestion is present on this route at the moment, and the sender can begin transmitting at a relatively high output rate specified by ssthresh.

[Plot: cwnd (vertical axis, 0 to 16) over time (horizontal axis, 0 to 13), with the ssthresh level marked; labels: Slow Start, Congestion Avoidance]

Figure 9: Time-cwnd-diagram: Fast Recovery

Figure 9 points out how Fast Recovery speeds up transmission in contrast to the classical Slow Start Algorithm (cf. 2.2.1.1). After the first missing ACK packet at time tick four, the additional half Slow Start cycle is skipped; instead, Congestion Avoidance is started at an output rate of eight, as specified by the optimal Slow Start threshold.




2.2.2.4 Fast Retransmit combined with Fast Recovery

As mentioned above, Fast Retransmit and Fast Recovery form a so-called algorithm pair, because they are rarely used alone. As discussed in [Ste97, section 4] and [APS99, section 3.2], on arrival of the third consecutive duplicate ACK packet, cwnd and ssthresh on the sender's side are set to:

$$ssthresh = \max\left(\frac{cwnd}{2},\, 2\right), \qquad cwnd = ssthresh + 3.$$

Fast Recovery is triggered by at least three duplicate ACK packets; this implies the successful receipt of at least three packets after the missing one. Each further duplicate ACK packet arriving at the sender increments cwnd by one, because another packet has left the network and is cached in the receiver's input buffer. The receiver may define an advertised window (awnd) size, which indicates the number of additional packets that can be cached on the receiver's side; this is a destination driven indicator. If the cwnd size is below the awnd size, i.e. $cwnd = \min(cwnd, awnd)$, the sender may send at least one more data packet, because the receiver is able to process it. Otherwise, although the sender is able to send further packets, it is not advisable to send them immediately, because the receiver would discard them for lack of resources.

When the next non-duplicate ACK packet arrives, which should be the one for the retransmitted packet, the sender sets cwnd back to ssthresh. Furthermore, this ACK packet acknowledges the outstanding packets that were sent after the lost one and before the three duplicate ACK packets reached the sender. At this point, Congestion Avoidance starts working and increases the output rate linearly as described in 2.2.1.2.

Figure 10 and figure 11 illustrate this algorithm pair. After a typical Slow Start beginning at time tick zero, at least one data packet, here packet five, is not acknowledged, and duplicate ACK packets arrive at time ticks three, four and five. With the third duplicate ACK packet, a variation of Congestion Avoidance is triggered at time tick six. The threshold is calculated as $ssthresh = \max\left(\frac{4}{2}, 2\right) = 2$, cwnd is set to $cwnd = ssthresh + 3 = 5$, and linear increase starts. On the one hand, it is not necessary to wait for the retransmission timer to expire, which would happen at time tick ten. On the other hand, Congestion Avoidance at time tick six sets the data output rate to a relatively high level, here the best possible.
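The window updates of the algorithm pair can be written down as four small functions. This is a hedged sketch of the rules stated in this section only: the function names are hypothetical, windows are counted in whole packets, and integer division stands in for cwnd / 2.

```python
def on_third_duplicate_ack(cwnd):
    """Enter Fast Recovery: return the new (ssthresh, cwnd) pair."""
    ssthresh = max(cwnd // 2, 2)
    return ssthresh, ssthresh + 3     # three packets have left the network


def on_further_duplicate_ack(cwnd):
    """Each additional duplicate ACK means one more packet left the network."""
    return cwnd + 1


def on_new_ack(ssthresh):
    """The retransmission was acknowledged: deflate cwnd to ssthresh."""
    return ssthresh


def usable_window(cwnd, awnd):
    """The sender never injects more than the receiver advertises."""
    return min(cwnd, awnd)


ssthresh, cwnd = on_third_duplicate_ack(4)    # the example of figure 10
cwnd_after_dup = on_further_duplicate_ack(cwnd)
cwnd_after_ack = on_new_ack(ssthresh)
```

For the example above, a window of four yields ssthresh = 2 and cwnd = 5; a further duplicate ACK raises cwnd to six, and the first non-duplicate ACK sets it back to ssthresh, from where Congestion Avoidance continues linearly.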



[Plot: cwnd (vertical axis, 0 to 7) over time (horizontal axis, 0 to 10), with the ssthresh level marked; labels: Slow Start, Congestion Avoidance, static timeout, dupl. ACKs]

Figure 10: Time-cwnd-diagram: Fast Retransmit combined with Fast Recovery

Figure 11: Packet Flow: Fast Retransmit combined with Fast Recovery




3 Router Centric Algorithms

Up to this point, the described algorithms rarely involve the network's intermediate nodes, such as routers, switches and hubs, as active participants. Consequently, research and development moved on to more dynamic schemes for avoiding progressive network congestion. This concept, called ROUTER CENTRIC, reduces the communication normally needed between the point of congestion and the origin, as well as the risk of timeouts and of high latency before a reaction can be initiated, as described in [Zha86].

3.1 Congestion Collapse

The phenomenon of Congestion Collapse, occurring in datagram networks using the telnet application, was first described by John Nagle in 1984 [Nag84]. It only exists in datagram networks with a retransmission policy, such as the Transmission Control Protocol (TCP) on top of the Internet Protocol (IP). It is inherent in the system's design: with the occurrence of a bandwidth bottleneck, the number of packets in transit increases. This is the expected consequence of longer round trip times (RTTs) and does not in itself indicate any problem. If an expected ACK packet from the receiver does not arrive in time, the sender automatically starts retransmission, and with an adaptive host retransmission algorithm the RTT average threshold is increased. This behavior, too, is intended, but when the RTT rises sharply, even adaptive host retransmission cannot cope. More and more copies of the same packet are injected into the network on the sender's side and contribute to serious network congestion. A possible solution, discussed in 3.4.1, is the ICMP Source Quench packet sent by routers and gateways, so that senders can decrease their output level individually.

3.2 Small packet problem

TCP segments are encapsulated in IP packets for network transmission, so small amounts of data in particular carry a large overhead from the associated TCP and IP headers. For instance, transmitting a single character results in a 41-byte data packet containing only one byte of payload and 40 bytes of header overhead. Transmitting many small data packets with large headers puts additional stress on congested networks. To avoid these packets, a static delay mechanism can be implemented that delays packets by some hundred milliseconds. This helps to keep congestion from occurring, but it does not decrease the network load in the congested state. Therefore, a dynamic approach is given in [Nag84, page 3], which buffers data to be sent until the outstanding ACK packets arrive (after an idle connection, the first packet is sent without waiting for any ACK packet). This results in potentially larger data packets, which reduces the overall overhead, and it stops the flooding of already congested networks, because there no ACK packets get through.
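The buffering rule of the dynamic approach can be sketched as follows. The sketch is deliberately reduced to the rule stated above: the class name NagleSender is hypothetical, at most one packet is assumed to be outstanding, and segment size and window limits are ignored.

```python
class NagleSender:
    """Buffer small writes while an ACK is outstanding (simplified model)."""

    def __init__(self):
        self.buffer = bytearray()
        self.waiting_for_ack = False
        self.sent = []                 # payloads actually put on the wire

    def write(self, data):
        """Called by the application for every small piece of data."""
        self.buffer += data
        self._try_send()

    def on_ack(self):
        """Called when the outstanding packet has been acknowledged."""
        self.waiting_for_ack = False
        self._try_send()

    def _try_send(self):
        # On an idle connection data leaves at once; otherwise it
        # accumulates until the outstanding ACK arrives.
        if self.buffer and not self.waiting_for_ack:
            self.sent.append(bytes(self.buffer))
            self.buffer.clear()
            self.waiting_for_ack = True


sender = NagleSender()
for char in b"hello":                  # five single-character writes
    sender.write(bytes([char]))
sender.on_ack()
```

Instead of five 41-byte packets, the sender emits one packet with the first character and, after the ACK, one packet with the remaining four characters. If no ACK ever arrives, nothing further is injected, which is the self-limiting effect described above.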




Assume a file transfer over a network path with a five-second RTT, a window size of 2 kB and an application writing data to TCP in blocks of 512 bytes. The first packet, containing 512 bytes of data and an additional 40 bytes of header, is sent to the receiver. During the relatively long RTT, TCP buffers the data arriving from the application, so after five seconds, when the ACK packet arrives, the next packet can be sent. From this packet on, the amount of data per packet is a constant 2 kB. Although the second packet (and every further one) is sent only after a five-second delay, it carries the maximum available amount of data, and the header-to-data ratio decreases.
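The buffering behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation from [Nag84]: at most one packet is outstanding at a time, and the class `NagleSender` with its methods `write` and `on_ack` is hypothetical.

```python
class NagleSender:
    """Simplified sender: buffer small writes while an ACK is outstanding."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size  # maximum payload bytes per packet
        self.buffer = bytearray()       # data waiting for the next packet
        self.unacked = False            # True while a packet awaits its ACK
        self.sent = []                  # payloads handed to the network

    def write(self, data: bytes) -> None:
        """Called by the application; sends at once only on an idle connection."""
        self.buffer.extend(data)
        if not self.unacked:
            self._flush()

    def on_ack(self) -> None:
        """Called when the outstanding packet is acknowledged."""
        self.unacked = False
        if self.buffer:
            self._flush()

    def _flush(self) -> None:
        # Send as much buffered data as fits into one window-sized packet.
        payload = bytes(self.buffer[: self.window_size])
        del self.buffer[: self.window_size]
        self.sent.append(payload)
        self.unacked = True


# The example from the text: 2 kB window, 512-byte application writes.
sender = NagleSender(window_size=2048)
sender.write(b"a" * 512)          # idle connection: sent immediately
for _ in range(8):                # written during the five-second RTT
    sender.write(b"b" * 512)
sender.on_ack()
sender.on_ack()
print([len(p) for p in sender.sent])
```

With 40 bytes of header per packet, the overhead falls from 40/552 for the first packet to 40/2088 for every later one.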

3.3 Router Processed

3.3.1 Weight Fair Queuing

Obviously, a fundamental problem on many intermediate network nodes is the fairness of Congestion Control. This involves the question of which packets from which sender are dropped from the input buffer when congestion occurs. The Drop Tail algorithm, which cuts the last i received packets off the queue when the queue length exceeds a fixed limit, and the Load Shedding algorithm described in 3.3.2 are simple but rarely fair.

Therefore, the logical continuation of these ideas is a fair queuing algorithm that avoids indiscriminate packet dropping, because every sender loads the network individually and thus contributes to congestion according to its own data volume.

Figure 12: Circuit diagram: Weight Fair Queuing

A possible solution is an extension of the simple queuing algorithm: Weight Fair Queuing. This algorithm introduces round-robin scheduling to the queue, that is, every sender gets its own associated queue, so bursting senders can be treated separately, as shown in figure 12. Slow or periodically transmitting senders are thus not affected by Congestion Control, and packets from explicitly flooding senders are dropped.

Nevertheless, this kind of fair queuing does not handle a scenario in which a non-bursting sender transmits many small data packets to multiple different recipients in a network. The paths to these recipients may include congested as well as non-congested network parts, the latter being able to carry more data than the congested part. A fairer solution would therefore use one queue per source-destination pair, but this would exceed the limits of contemporary hardware.
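The per-sender queues with round-robin service described in this section can be sketched as follows. The class `FairQueue` and the parameter `per_sender_limit` are assumptions made for illustration; the sketch serves one packet per sender per round and does not model weights or packet sizes.

```python
from collections import OrderedDict, deque


class FairQueue:
    """One bounded queue per sender, served in round-robin order."""

    def __init__(self, per_sender_limit: int) -> None:
        self.per_sender_limit = per_sender_limit
        self.queues = OrderedDict()  # sender -> deque of waiting packets

    def enqueue(self, sender, packet) -> bool:
        """Accept the packet unless this sender's own queue is full."""
        queue = self.queues.setdefault(sender, deque())
        if len(queue) >= self.per_sender_limit:
            return False  # only the flooding sender loses a packet
        queue.append(packet)
        return True

    def dequeue(self):
        """Return (sender, packet) of the next sender in turn, or None."""
        if not self.queues:
            return None
        sender, queue = next(iter(self.queues.items()))
        packet = queue.popleft()
        # Move the sender to the back of the round if it has more to send.
        del self.queues[sender]
        if queue:
            self.queues[sender] = queue
        return sender, packet


fq = FairQueue(per_sender_limit=2)
for name in ("a1", "a2", "a3"):
    fq.enqueue("A", name)        # the third packet of burster A is dropped
fq.enqueue("B", "b1")            # slow sender B is unaffected
print(fq.dequeue(), fq.dequeue(), fq.dequeue())
```

Keying the dictionary by a (source, destination) pair instead of the sender alone would give the per-pair variant mentioned above, at the cost of many more queues.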

3.3.2 Load Shedding

Load Shedding, as discussed in [Tan03] and [YR95], is one way to deal with congestion that does not disappear by itself: packets that cannot be handled are discarded by the router. To give a router the intelligence to decide which packets to drop, applications must mark their packets with priority classes. This marking should be done with care, because it makes no sense to mark all packets with a high priority.

Two examples make clear why the classification into priority classes is useful. The first example describes a discarding and resending problem. If a router drops packet seven of twelve, the sender will send packets seven to twelve again, as illustrated in figure 13. So, discarding one packet provokes the retransmission of six packets. Alternatively, the router can discard packet ten, so that only three packets have to be resent (figure 14). The second example concerns multimedia compression: video compression algorithms divide videos into sequences of entire frames and subsequent frames. After an entire frame has been transmitted, only subsequent frames containing the differences from the last entire frame are transmitted. So, if an entire frame is discarded, every subsequent frame becomes useless. In contrast, if a subsequent frame is discarded, the receiver still shows the entire, but old, frame. Obviously, it is important to choose the right packet so that the number of retransmitted packets is minimized; that is the task of Load Shedding.

Figure 13: Load Shedding Algorithm: Dropping packet seven of twelve

Figure 14: Load Shedding Algorithm: Dropping packet ten of twelve
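The first example can be checked with a short sketch. It assumes that the sender resends the dropped packet and everything after it, as in the figures, and that a larger number means a higher priority; the helper names `retransmitted_after_drop` and `pick_victim` are hypothetical.

```python
def retransmitted_after_drop(dropped: int, total: int) -> int:
    """Packets resent when packet `dropped` of `total` is discarded and the
    sender repeats the dropped packet and all later ones."""
    if not 1 <= dropped <= total:
        raise ValueError("dropped must lie between 1 and total")
    return total - dropped + 1


def pick_victim(queue):
    """Choose the (priority, sequence_number) entry to discard.

    Assumption: the lowest priority goes first; among equal priorities the
    newest packet is chosen, because it causes the fewest retransmissions.
    """
    return min(queue, key=lambda entry: (entry[0], -entry[1]))


print(retransmitted_after_drop(7, 12))   # six packets, as in figure 13
print(retransmitted_after_drop(10, 12))  # three packets, as in figure 14
print(pick_victim([(2, 7), (1, 9), (1, 10), (2, 11)]))
```

In the video example, entire frames would receive the higher priority class, so that subsequent frames are discarded first.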

3.3.3 Random Early Detection (RED)

Random Early Detection belongs to the class of Active Queue Management algorithms and is implemented in routers. This approach tries to prevent completely full router queues by either marking or discarding packets before the situation becomes hopeless and congestion occurs. In combination with Explicit Congestion Notification, as described in 3.4.2, it is not essential to drop packets; instead, they are marked with a congestion flag and senders throttle their output individually. The well-known implementation illustrated in figure 15 can be subdivided into two main parts, as described in [FJ93]: Average Queue Length Calculation and Packet Drop Decision.

Each incoming packet triggers a new calculation of the average queue length (avgQueueLen). Afterwards, the packet drop decision is made on the basis of two fixed thresholds, a lower and an upper one, so the following three cases can be distinguished:

- The average queue length is below the lower threshold, i.e. no congestion is present.

- The average queue length is above the upper threshold, i.e. packets have to be discarded or marked as a result of a serious congestion state.

- The average queue length is above the lower and below the upper threshold, i.e. congestion is possibly building up and the marking/discarding probability has to be calculated.




Figure 15: Petri net: Random Early Detection

Average Queue Length Calculation uses this function:

$$\mathrm{avgQueueLen}(\mathrm{queueLen}) = (1 - \mathrm{queueWeight}) \cdot \mathrm{avgQueueLen} + \mathrm{queueWeight} \cdot \mathrm{queueLen},$$

with the current queue length queueLen and an individual queue weight parameter queueWeight (approximately between 0.002 and 0.003), which balances the reaction to temporary incoming bursts.

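Packet Drop Decision is calculated when the average queue length avgQueueLen lies between the lower and the upper threshold, using the following pair of functions:

$$p_{\mathrm{mark}}(\mathrm{avgQueueLen}) = \frac{\mathrm{avgQueueLen} - \mathrm{threshold}_{\mathrm{lower}}}{\mathrm{threshold}_{\mathrm{upper}} - \mathrm{threshold}_{\mathrm{lower}}}$$

and

$$p_{\mathrm{final}}(\mathrm{count}) = p_{\mathrm{impact}} \cdot \frac{p_{\mathrm{mark}}}{1 - p_{\mathrm{mark}} \cdot \mathrm{count}},$$

where count is the number of packets processed successfully since the last marked packet and $p_{\mathrm{impact}}$ is an individual impact parameter that specifies the strength of the algorithm's reaction (approximately 0.1). For count = 0, this results in $p_{\mathrm{final}} \in [0, p_{\mathrm{impact}}]$, and $p_{\mathrm{final}}$ increases with increasing avgQueueLen.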





With the help of the resulting final probability, some packets are marked or discarded. When an additional algorithm such as Explicit Congestion Notification (cf. 3.4.2) is used, a specific header bit is set and the sender throttles its output rate. Dropping packets forces the same result, because missing ACK packets lead to the retransmission of the discarded packets at half the rate, as described in the context of Congestion Avoidance (2.2.1.2).

The idea of random discarding is to desynchronize the network clients that use this intermediate node, so that their data bursts no longer arrive simultaneously. The intermediate node is thus able to handle the incoming traffic, because its peaks are spread over different points in time.
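The two parts of Random Early Detection, as given by the formulas of this section, can be sketched as follows. The class `RedQueue` and its method names are assumptions made for illustration. The sketch follows the formulas as written here; [FJ93] applies the scaling factor before the count correction. Clamping the result to 1 when the denominator is no longer positive is an additional assumption of this sketch.

```python
import random


class RedQueue:
    """Sketch of the RED marking decision for a single router queue."""

    def __init__(self, lower, upper, queue_weight=0.002, p_impact=0.1, rng=None):
        if not 0 <= lower < upper:
            raise ValueError("thresholds must satisfy 0 <= lower < upper")
        self.lower = lower
        self.upper = upper
        self.queue_weight = queue_weight
        self.p_impact = p_impact
        self.avg_queue_len = 0.0
        self.count = 0  # packets processed since the last marked packet
        self.rng = rng if rng is not None else random.Random()

    def update_average(self, queue_len):
        """Average Queue Length Calculation (weighted moving average)."""
        w = self.queue_weight
        self.avg_queue_len = (1 - w) * self.avg_queue_len + w * queue_len
        return self.avg_queue_len

    def final_probability(self):
        """Packet Drop Decision: probability of marking the current packet."""
        avg = self.avg_queue_len
        if avg < self.lower:
            return 0.0  # no congestion present
        if avg >= self.upper:
            return 1.0  # serious congestion: always mark or discard
        p_mark = (avg - self.lower) / (self.upper - self.lower)
        denominator = 1 - p_mark * self.count
        if denominator <= 0:
            return 1.0  # assumption: clamp instead of dividing
        return min(1.0, self.p_impact * p_mark / denominator)

    def on_packet(self, queue_len) -> bool:
        """Return True if the arriving packet should be marked or discarded."""
        self.update_average(queue_len)
        if self.rng.random() < self.final_probability():
            self.count = 0
            return True
        self.count += 1
        return False
```

Because the decision is random per packet, different senders are hit at different times, which is what produces the desynchronization described above.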




3.4 Router Indicated

3.4.1 ICMP Source Quench

The TCP/IP suite contains special Internet Control Message Protocol (ICMP) messages for indicating special network states such as congestion. On the one hand, receivers should send these messages when packets have been dropped; on the other hand, intermediate network nodes should send these indications shortly before they have to drop any packets. Nagle (cf. 3.1) chose to send these messages when a router's buffer is half full.

If a host receives a Source Quench message, it should react by decreasing its output data rate, so that the number of pending packets is reduced. This results in communication at a more moderate level but does not end in starvation. However, this is only a recommendation, so not every host has to honor these messages. Therefore, it is important to protect intermediate nodes such as routers and gateways from excessive traffic generated by malicious hosts.

The only way intermediate nodes can react to this overload situation is to discard packets. But how can these packets be selected fairly? A simple approach is to drop the most recently arrived packets. This decreases the network load, but every host using this gateway is affected and suffers from the malicious host's behavior. Alternatively, newly arriving packets are checked for duplicates that are already in the processing queue. This can be done with the help of hash functions, so that a decision is reached quickly and with a minimum of computational resources. This tactic reduces the network load generated by hosts using bad static retransmission techniques.
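The duplicate check with hash functions can be sketched as follows. The class `DedupQueue` is hypothetical, and hashing the complete packet with SHA-256 is an assumption made for simplicity; a real router would more likely use a cheaper hash over selected header fields.

```python
import hashlib
from collections import deque


class DedupQueue:
    """Processing queue that rejects packets already waiting in it."""

    def __init__(self) -> None:
        self.queue = deque()
        self.digests = set()  # digests of the packets currently queued

    @staticmethod
    def _digest(packet: bytes) -> bytes:
        return hashlib.sha256(packet).digest()

    def enqueue(self, packet: bytes) -> bool:
        """Queue the packet unless an identical copy is still waiting."""
        digest = self._digest(packet)
        if digest in self.digests:
            return False  # duplicate from an overeager retransmission
        self.digests.add(digest)
        self.queue.append(packet)
        return True

    def dequeue(self):
        """Remove and return the oldest packet, or None if the queue is empty."""
        if not self.queue:
            return None
        packet = self.queue.popleft()
        self.digests.discard(self._digest(packet))
        return packet
```

The set lookup takes constant time on average, which matches the goal of reaching a decision with minimal computational effort.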

Many other solutions are conceivable, so these algorithms are the subject of current research and development.

3.4.2 Explicit Congestion Notification (ECN)

This algorithm is designed for homogeneous networks based on TCP and involves all intermediate routers as active parts of congestion control. Therefore, each router in this scenario has to support ECN in order to reduce the number of packets potentially dropped as a result of overflowing router buffers.

ECN uses two specific header bits of the IP protocol, called ECN-Capable Transport (ECT) and Congestion Experienced (CE), as defined in [RFB01, section 5]. To indicate ECN support, exactly one of the two bits is set to one and the other to zero (10 or 01), because missing ECN support is indicated by setting neither of these bits (00).
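The four possible values of the two bits can be sketched as follows. The codepoint values follow [RFB01], where the ECN field occupies the two least significant bits of the former IPv4 type-of-service octet; the function names `ecn_field` and `mark_congestion` are assumptions made for illustration.

```python
# ECN codepoints as defined in [RFB01].
NOT_ECT = 0b00  # sender does not support ECN
ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
CE = 0b11       # congestion experienced

ECN_MASK = 0b11


def ecn_field(tos: int) -> int:
    """Extract the two ECN bits from the type-of-service octet."""
    return tos & ECN_MASK


def mark_congestion(tos: int) -> int:
    """Router side: set both bits if the sender indicated ECN support.

    A packet without ECN support is returned unchanged; the router then
    has to fall back to dropping it.
    """
    if ecn_field(tos) == NOT_ECT:
        return tos
    return tos | CE


print(bin(mark_congestion(0b10)), bin(mark_congestion(0b00)))
```

The remaining six bits of the octet are left untouched by the marking.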

ECN-capable routers must notice rising congestion and modify the packets they forward by setting both of these bits to one, so that the original sender of the packets can decrease its output rate immediately. This implies a problem similar to the one mentioned in 2.2.2.1: if there are numerous hops between the original sender and the point where congestion occurs, the system-wide latency prevents an immediate reaction, so data that has already been injected can still contribute to the network's critical load.

For instance, consider sending a data packet from host A to host B with fifteen routers in between, where the last router notices occurring congestion. The data packet passes router fifteen, where both ECN bits are set to one, and travels on to host B. Host B responds with an ACK, which passes router fifteen, where both ECN bits are set to one again, and travels back to host A. So, by the time the first marked packet arrives at the original sender, an additional 2(n + 1) superfluous packets have already been injected, where n is the number of routers in between.

With only a few routers, the result is an immediately lowered transmission rate on the sender's side, and this host produces no more congestion. This results in no or only minimal data retransmission, because no data packets are discarded on the router's side, so superfluous network load is prevented. The given implementation is nevertheless questionable, because the notification itself could get caught in the congestion or get lost, so that the indication never reaches the sender and no data flow can be choked.

Nowadays, the best-known ECN implementation is the ICMP Source Quench choke packet (cf. 3.4.1) generated by routers. In fact, however, it is only rarely supported and heavily criticized, because these messages consume bandwidth and could increase congestion as well.

A similar solution is the so-called DECbit, used in association with the Random Early Detection algorithm (cf. 3.3.3) in the DECnet protocol. It indicates that a specific router queue length has been exceeded. If such modified ACK packets reach the original sender of the data packets, the sender has to throttle its output rate, e.g. to half of its prior output rate.




4 Conclusion

The chosen classification is only one way to classify the algorithms; others start from policies on the different layers of the ISO/OSI model to explain how the algorithms work. A much more detailed look at the implementation of the given algorithms shows that the chosen classification is very close to a classification based on the layers of the ISO/OSI model.

The specified algorithms underline the fact that all Host Centric Algorithms work on the Transport Layer or above. Most of the implementations use special (TCP) header fields to transmit additional information. Most Router Centric Algorithms, however, work on the Network Layer, which is a direct consequence of the structure of the TCP/IP stack.

The variety of available Congestion Control Algorithms cannot fix the problem completely, and congestion will never be solved by one algorithm alone. Therefore, it is necessary to know the causes of congestion in a specific network or subnetwork in order to choose well-working algorithms or combinations of algorithms that reduce congestion under these specific network circumstances.

In continuously growing wide area networks (WANs) such as the Internet, congestion cannot be eliminated because of the growing complexity of these virtual heterogeneous networks. Large data networks suffer from the mathematical problem that network bursts and traffic rates cannot be determined by calculation, so the best approximation can only be made stochastically.

This leads to the philosophical question of whether those WANs have reached a level of self-sufficiency at which they are no longer controlled by individuals but rather by themselves, because all components contribute to an increasingly intelligent way of recovering from temporary overload states such as congestion.




5 Glossary

ACK Acknowledgment Packets such as used in TCP.

ATM Asynchronous Transfer Mode using fixed-length cells of 53 bytes (48 bytes of payload and 5 bytes of header).

awnd Advertised Window: indicates the maximum available buffer on the receiver's side.

cwnd Congestion Window: special buffer used in TCP's Slow Start.

ECN Explicit Congestion Notification: A Router Centric Congestion Control Algorithm.

ICMP Internet Control Message Protocol used for special network control messages.

IP Internet Protocol: unreliable datagram service offered by networks using the most common TCP/IP protocol suite.

ISP Internet Service Provider: offers access to the Internet.

LAN Local Area Network.

NAT Network Address Translation: used by routers to rewrite the addresses of data packets passing between different subnets.

RED Random Early Detection: A Router Centric Congestion Control Algorithm.

RTT Round Trip Time: the time a packet needs to travel through the network to the receiver plus the time until the response returns.

ssthresh defines the Slow Start threshold.

TCP Transmission Control Protocol used in the TCP/IP protocol suite to offer a reliable, connection-oriented byte-stream service on top of the unreliable IP.

WAN Wide Area Network.




References

[APS99] Mark Allman, Vern Paxson, and W. Richard Stevens. Request For Comments 2581: TCP Congestion Control. Technical report, April 1999. http://www.ietf.org/rfc/rfc2581.txt

[FJ93] S. Floyd and V. Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking, V.1 N.4, pages 397-413, August 1993. http://www.icir.org/floyd/papers/red/red.html

[Jac88] V. Jacobson. Congestion Avoidance and Control. August 1988.

[Nag84] John Nagle. Request For Comments 896: Congestion Control in IP/TCP Internetworks. Technical report, January 1984. http://www.ietf.org/rfc/rfc896.txt

[RFB01] K. Ramakrishnan, S. Floyd, and D. Black. Request For Comments 3168: The Addition of Explicit Congestion Notification (ECN) to IP. Technical report, September 2001. http://www.ietf.org/rfc/rfc3168.txt

[Ste97] W. Richard Stevens. Request For Comments 2001: TCP Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery Algorithms. Technical report, January 1997. http://www.ietf.org/rfc/rfc2001.txt

[Tan03] Andrew S. Tanenbaum. Computer Networks - 4th Edition. Prentice Hall PTR, 2003.

[YR95] C.-Q. Yang and A.V.S. Reddy. A Taxonomy for Congestion Control Algorithms in Packet Switching Networks. IEEE Network Magazine, Vol. 9, pages 34-45, July/August 1995.

[Zha86] Lixia Zhang. Why TCP Timers Don't Work Well. Communications Architectures and Protocols, pages 397-405, August 1986.
