
FAST TCP: Motivation, Architecture, Algorithms, Performance

Cheng Jin, David X. Wei, Steven H. Low
Engineering & Applied Science, Caltech

http://netlab.caltech.edu

Abstract— We describe FAST TCP, a new TCP congestion control algorithm for high-speed long-latency networks, from design to implementation. We highlight the approach taken by FAST TCP to address the four difficulties, at both packet and flow levels, which the current TCP implementation has at large windows. We describe the architecture and characterize the equilibrium and stability properties of FAST TCP. We present experimental results comparing our first Linux prototype with TCP Reno, HSTCP, and STCP in terms of throughput, fairness, stability, and responsiveness. FAST TCP aims to rapidly stabilize high-speed long-latency networks into steady, efficient and fair operating points, in dynamic sharing environments, and the preliminary results are promising.

I. INTRODUCTION

Congestion control is a distributed algorithm to share network resources among competing users. It is important in situations where the availability of resources and the set of competing users vary over time unpredictably, yet efficient sharing is desired. These constraints, unpredictable supply and demand and efficient operation, necessarily lead to feedback control as the preferred approach, where traffic sources dynamically adapt their rates to congestion in their paths. On the Internet, this is performed by the Transmission Control Protocol (TCP) in source and destination computers involved in data transfers.

The congestion control algorithm in the current TCP, which we refer to as Reno, was developed in 1988 [1] and has gone through several enhancements since, e.g., [2], [3], [4]. It has performed remarkably well and is generally believed to have prevented severe congestion as the Internet scaled up by six orders of magnitude in size, speed, load, and connectivity. It is also well known, however, that as the bandwidth-delay product continues to grow, TCP Reno will eventually become a performance bottleneck itself. The following four difficulties contribute to the poor performance of TCP Reno in networks with large bandwidth-delay products:

1) At the packet level, linear increase by one packet per Round-Trip Time (RTT) is too slow, and multiplicative decrease per loss event is too drastic.

2) At the flow level, maintaining large average congestion windows requires an extremely small equilibrium loss probability.

3) At the packet level, oscillation is unavoidable because TCP uses a binary congestion signal (packet loss).

4) At the flow level, the dynamics is unstable, leading to severe oscillations that can only be reduced by the accurate estimation of packet loss probability and a stable design of the flow dynamics.

We explain these difficulties in detail in Section II. In [5], we described HSTCP [6] and STCP [7], two loss-based solutions to these problems. In this paper, we propose a delay-based solution. See [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18] for other proposals.

In Section III, we motivate the delay-based approach. Delay-based congestion control has been proposed, e.g., in [19], [20], [8]. Its advantage over the loss-based approach is small at low speed, but decisive at high speed, as we will argue below. As pointed out in [21], delay can be a poor or untimely predictor of packet loss, and therefore using a delay-based algorithm to augment the basic AIMD (Additive Increase Multiplicative Decrease) algorithm of TCP Reno is the wrong approach to address the above difficulties at large windows. Instead, a new approach that fully exploits delay as a congestion measure, augmented with loss information, is needed. FAST TCP uses this approach. Using queueing delay as the congestion measure has two advantages.

First, queueing delay can be more accurately estimated than loss probability, both because packet losses in networks with large bandwidth-delay products are rare events (probability on the order of 10⁻⁸ or smaller), and because loss samples provide coarser information than queueing delay samples. Indeed, measurements of delay are noisy, just as those of loss probability. Each measurement of packet loss (whether a packet is lost) provides one bit of information for the filtering of noise, whereas each measurement of queueing delay provides multi-bit information. This makes it easier for an equation-based implementation to stabilize a network into a steady state with a target fairness and high utilization. Second, the dynamics of queueing delay seems to have the right scaling with respect to network capacity. This helps maintain stability as a network scales up in capacity [22], [23], [24]. In Section III, we explain how we exploit these advantages to address the four difficulties of TCP Reno.

In Section IV, we lay out an architecture to implement our design. Even though the discussion is in the context of FAST TCP, the architecture can also serve as a general framework to guide the design of other congestion control mechanisms, not necessarily limited to TCP, for high-speed networks. The main components in the architecture can be designed separately and upgraded asynchronously. Unlike the conventional design, FAST TCP can use the same window and burstiness control algorithms regardless of whether a source is in the normal state or the loss recovery state. This leads to a clean separation of components in both functionality and code structure. We then present an overview of some of the algorithms implemented in our current prototype.

In Section V, we present a mathematical model of the window control algorithm. We prove that FAST TCP has the same equilibrium properties as TCP Vegas [25], [26]. In particular, it does not penalize flows with large propagation delays, and it achieves weighted proportional fairness [27]. For the special case of a single bottleneck link with heterogeneous flows, we prove that the window control algorithm of FAST is globally stable, in the absence of


feedback delay. Moreover, starting from any initial state, a network converges exponentially to a unique equilibrium.

In Section VI, we present preliminary experimental results to illustrate the throughput, fairness, stability, and responsiveness of FAST TCP, in the presence of delay and in heterogeneous and dynamic environments where flows of different delays join and depart asynchronously. We compare the performance of FAST TCP with Reno, HSTCP (HighSpeed TCP [6]), and STCP (Scalable TCP [7]), using their default parameters. In these experiments, FAST TCP achieved the best performance under each criterion, while HSTCP and STCP improved throughput and responsiveness over Reno at the cost of fairness and stability. We conclude in Section VII.

II. PROBLEMS AT LARGE WINDOWS

A congestion control algorithm can be designed at two levels. The flow-level (macroscopic) design aims to achieve high utilization, low queueing delay and loss, fairness, and stability. The packet-level design implements these flow-level goals within the constraints imposed by end-to-end control. Historically for TCP Reno, the packet-level implementation was introduced first. The resulting flow-level properties, such as fairness, stability, and the relationship between equilibrium window and loss probability, were then understood as an afterthought. In contrast, the packet-level designs of HSTCP [6], STCP [7], and FAST TCP are explicitly guided by flow-level goals.

We elaborate in this section on the four difficulties of TCP Reno listed in Section I. It is important to distinguish between packet-level and flow-level difficulties because they must be addressed by different means.

A. Packet and flow level modeling

The congestion avoidance algorithm of TCP Reno and its variants have the form of AIMD [1]. The pseudocode for window adjustment is:

    Ack:  w ← w + 1/w
    Loss: w ← w − 0.5w

This is a packet-level model, but it induces certain flow-level properties such as throughput, fairness, and stability.
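For concreteness, a minimal runnable sketch of the per-ack/per-loss update above (Python; function names are ours):

    def on_ack(w):
        # Additive increase: +1/w per ACK, i.e. about 1 packet per RTT.
        return w + 1.0 / w

    def on_loss(w):
        # Multiplicative decrease: the window is halved on a loss event.
        return w - 0.5 * w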

These properties can be understood with a flow-level model of the AIMD algorithm, e.g., [28], [29], [30]. The window w_i(t) of source i increases by 1 packet per RTT,¹ and decreases per unit time by

    x_i(t) p_i(t) · (1/2) · (4/3) w_i(t)  packets

where

    x_i(t) := w_i(t)/T_i(t)  pkts/sec

T_i(t) is the round-trip time, and p_i(t) is the (delayed) end-to-end loss probability, in period t.² Here, 4w_i(t)/3 is the peak

¹It should be (1 − p_i(t)) packets, where p_i(t) is the end-to-end loss probability. This is roughly 1 when p_i(t) is small.

²This model assumes that the window is halved on each packet loss. It can be modified to model the case where the window is halved at most once in each RTT. This does not qualitatively change the following discussion.

window size that gives the "average" window of w_i(t). Hence, a flow-level model of AIMD is:

    ẇ_i(t) = 1/T_i(t) − (2/3) x_i(t) p_i(t) w_i(t)    (1)

Setting ẇ_i(t) = 0 in (1) yields the well-known 1/√p formula for TCP Reno discovered in [31], [32], which relates loss probability to window size in equilibrium:

    p_i* = 3 / (2 w_i*²)    (2)
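To see how demanding (2) is, a small sketch (Python; the example window is the one used in Section II-B below):

    def reno_equilibrium_loss(w_star):
        # Equation (2): p* = 3 / (2 w*^2).
        return 3.0 / (2.0 * w_star ** 2)

    # An "average" window of 60,000 packets requires a loss probability
    # of about 4.2e-10 -- far too small to sustain or estimate in practice.
    print(reno_equilibrium_loss(60000))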

In summary, (1) and (2) describe the flow-level dynamics and the equilibrium, respectively, for TCP Reno. It turns out that different variants of TCP all have the same dynamic structure at the flow level (see [5], [33]). By defining

    κ_i(w_i, T_i) = 1/T_i    and    u_i(w_i, T_i) = 1.5/w_i²

and noting that w_i = x_i T_i, we can express (1) as:

    ẇ_i(t) = κ_i(t) (1 − p_i(t)/u_i(t))    (3)

where we have used the shorthand κ_i(t) = κ_i(w_i(t), T_i(t)) and u_i(t) = u_i(w_i(t), T_i(t)). Equation (3) can be used to describe all known TCP variants; different variants differ in their choices of the gain function κ_i and the marginal utility function u_i, and in whether the congestion measure p_i is loss probability or queueing delay.

Next, we illustrate the equilibrium and dynamics problems of TCP Reno, at both the packet and flow levels, as the bandwidth-delay product increases.

B. Equilibrium problem

The equilibrium problem at the flow level is expressed in (2): the end-to-end loss probability must be exceedingly small to sustain a large window size, making the equilibrium difficult to maintain in practice as the bandwidth-delay product increases.

Even though equilibrium is a flow-level notion, this problem manifests itself at the packet level, where a source increments its window too slowly and decrements it too drastically. When the peak window is 80,000 packets (corresponding to an "average" window of 60,000 packets), which is necessary to sustain 7.2 Gbps using 1,500-byte packets with an RTT of 100 ms, it takes 40,000 RTTs, or almost 70 minutes, to recover from a single packet loss. This is illustrated in Figure 1(a), where the size of the window increment per RTT and decrement per loss, 1 and 0.5w_i, respectively, are plotted as functions of w_i. The increment function for Reno (and for HSTCP) is almost indistinguishable from the x-axis. Moreover, the gap between the increment and decrement functions grows rapidly as w_i increases. Since the average increment and decrement must be equal in equilibrium, the required loss probability can be exceedingly small at large w_i. This picture is thus simply a visualization of (2).
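The arithmetic behind these numbers, as a quick sketch:

    # 60,000 packets of 1,500 bytes in flight per 100 ms RTT:
    print(60000 * 1500 * 8 / 0.1 / 1e9)  # = 7.2 (Gbps)

    # After a loss halves the 80,000-packet peak window, linear increase
    # of 1 packet/RTT needs 40,000 RTTs to recover:
    print(40000 * 0.1 / 60)              # ~ 66.7 minutes ("almost 70")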

To address the difficulties of TCP Reno at large window sizes, HSTCP and STCP increase more aggressively and decrease more gently, as discussed in [5], [33].


[Figure 1 appears here. (a): window adjustment (pkts) vs. window (pkts), showing the increment-per-RTT and decrement-per-loss curves for Reno, S-TCP, and HSTCP, with the equilibrium window w* marked. (b): window adjustment (pkts) vs. distance from equilibrium p_i/u_i − 1 for FAST, crossing zero at w = w*.]

Fig. 1. Packet-level implementation: (a) Window increment per RTT and decrement per loss, as functions of the current window. The increment functions for TCP Reno and HSTCP are almost identical at this scale. (b) Window update as a function of distance from equilibrium for FAST.

C. Dynamic problems

The causes of the oscillatory behavior of TCP Reno lie in its design at both the packet and flow levels. At the packet level, the choice of a binary congestion signal necessarily leads to oscillation, and the parameter setting in Reno worsens the situation as the bandwidth-delay product increases. At the flow level, the system dynamics given by (1) is unstable at large bandwidth-delay products [29], [30]. These must be addressed by different means, as we now elaborate.

Figure 2(a) illustrates the operating points chosen by various TCP congestion control algorithms, using the single-link single-flow scenario. It shows queueing delay as a function of window size. Queueing delay starts to build up after point C, where the window equals the bandwidth-propagation-delay product, until point R, where the queue overflows. Since Reno oscillates around point R, the peak window size goes beyond point R. The minimum window in steady state is half of the peak window. This is the basis for the rule of thumb that the bottleneck buffer should be at least one bandwidth-delay product: the minimum window will then be above point C, and the buffer will not empty in steady-state operation, yielding full utilization.

In the loss-based approach, full utilization, even if achievable, comes at the cost of severe oscillations and potentially large queueing delay. The DUAL scheme in [20] proposes to oscillate around point D, the midpoint between C and R, where the buffer is half-full. DUAL increases the congestion window linearly by one packet per RTT as long as queueing delay is less than half of the maximum value, and decreases it multiplicatively by a factor of 1/8 when queueing delay exceeds half of the maximum value. The scheme CARD (Congestion Avoidance using Round-trip Delay) of [19] proposes to oscillate around point C through AIMD with the same parameters (1, 1/8) as DUAL, based on the ratio of round-trip delay and delay gradient, to maximize power. In all these schemes, the congestion signal is used as a binary signal, and hence the congestion window must oscillate.

The congestion window can be stabilized only if multi-bit feedback is used. This is the approach taken by the equation-based algorithm in [34], where the congestion window is adjusted based on the estimated loss probability in an attempt to stabilize around a target value given by (2). Its operating point is T in Figure 2(b), near the overflowing point. This approach eliminates the oscillation due to packet-level AIMD, but two difficulties remain at the flow level.

First, equation-based control requires the explicit estimation of end-to-end loss probability. This is difficult when the loss probability is small. Second, even if loss probability can be perfectly estimated, Reno's flow dynamics, described by equation (1), leads to a feedback system that becomes unstable as feedback delay increases, and more strikingly, as network capacity increases [29], [30]. The instability at the flow level can lead to severe oscillations that can be reduced only by stabilizing the flow-level dynamics. We will return to both points in Section III.

III. DELAY-BASED APPROACH

In this section, we motivate the delay-based approach to address the four difficulties at large window sizes.

A. Motivation

Although improved loss-based protocols such as HSTCP and STCP have been proposed as replacements to TCP Reno, we showed in [5] that they do not address all four problems (Section I) of TCP Reno. To illustrate this, we plot the increment and decrement functions of HSTCP and STCP in Figure 1(a) alongside TCP Reno. Both protocols upper bound TCP Reno: each increases more aggressively and decreases less drastically, so that the gap between the increment and decrement functions is narrowed. This means that, in equilibrium, both HSTCP and STCP can tolerate larger loss probabilities than TCP Reno, thus achieving larger equilibrium windows. However, neither solves the dynamics problems at the packet and flow levels.

In [5], we show that the congestion windows in Reno, HSTCP, and STCP all evolve according to:

    ẇ_i(t) = κ_i(t) · (1 − p_i(t)/u_i(t))    (4)

where κ_i(t) := κ_i(w_i(t), T_i(t)) and u_i(t) := u_i(w_i(t), T_i(t)). Moreover, the dynamics of FAST TCP also takes the same form; see below. They differ only in the choice of the gain function κ_i(w_i, T_i), the marginal utility function u_i(w_i, T_i), and the end-to-end congestion measure p_i. Hence, at the flow level, there are only three design decisions:

• κ_i(w_i, T_i): the choice of the gain function κ_i determines dynamic properties such as stability and responsiveness, but does not affect the equilibrium properties.


[Figure 2 appears here: queueing delay as a function of window size, with the delay-based and loss-based regions marked. (a) Binary signal: oscillatory, with operating points C, D, and R. (b) Multi-bit signal: stabilizable, with operating points F, T, and R.]

Fig. 2. Operating points of TCP algorithms: R: Reno [1], HSTCP [6], STCP [7]; D: DUAL [20]; C: CARD [19]; T: TFRC [34]; F: Vegas [8], FAST.

• u_i(w_i, T_i): the choice of the marginal utility function u_i mainly determines equilibrium properties such as the equilibrium rate allocation and its fairness.

• p_i: in the absence of explicit feedback, the choice of congestion measure p_i is limited to loss probability or queueing delay. The dynamics of p_i(t) is determined at the links.

The design choices in Reno, HSTCP, STCP, and FAST are shown in Table I.

             κ_i(w_i, T_i)                              u_i(w_i, T_i)    p_i
    Reno     1/T_i                                      1.5/w_i²         loss probability
    HSTCP    0.16 b(w_i) w_i^0.80 / ((2 − b(w_i)) T_i)  0.08/w_i^1.20    loss probability
    STCP     a w_i / T_i                                ρ/w_i            loss probability
    FAST     γ α_i                                      α_i/x_i          queueing delay

TABLE I. Common dynamic structure: w_i is source i's window size, T_i is its round-trip time, p_i is the congestion measure, x_i = w_i/T_i; a, b(w_i), ρ, γ, α_i are protocol parameters.

These choices produce the equilibrium characterizations shown in Table II.

    Reno     x_i = (1/T_i) · α_i/p_i^0.50
    HSTCP    x_i = (1/T_i) · α_i/p_i^0.84
    STCP     x_i = (1/T_i) · α_i/p_i
    FAST     x_i = α_i/p_i

TABLE II. Common equilibrium structure.

This common model (4) can be interpreted as follows: the goal at the flow level is to equalize the marginal utility u_i(t) with the end-to-end measure of congestion, p_i(t). This interpretation immediately suggests an equation-based packet-level implementation where both the direction and the size of the window adjustment ẇ_i(t) are based on the difference between the ratio p_i(t)/u_i(t) and the target of 1. Unlike the approach taken by Reno, HSTCP, and STCP, this approach eliminates packet-level oscillations due to the binary nature of the congestion signal. It however requires the explicit estimation of the end-to-end congestion measure p_i(t).
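As an illustration, a discrete-time sketch of the common model (4), with Reno's κ_i and u_i from Table I plugged in (Python; the Euler step dt is our own device, not part of the model):

    def step_window(w, T, p, kappa, u, dt=0.01):
        # One Euler step of (4): dw/dt = kappa(w, T) * (1 - p / u(w, T)).
        return w + dt * kappa(w, T) * (1.0 - p / u(w, T))

    # Reno's gain and marginal utility from Table I:
    reno_kappa = lambda w, T: 1.0 / T
    reno_u     = lambda w, T: 1.5 / (w * w)

    w = step_window(w=100.0, T=0.1, p=1e-4, kappa=reno_kappa, u=reno_u)

Swapping in a different (κ_i, u_i) pair from Table I changes the protocol without changing this structure.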

Without explicit feedback, p_i(t) can only be loss probability, as used in TFRC [34], or queueing delay, as used in TCP Vegas [8] and FAST TCP.³ Queueing delay can be more accurately

³It is debatable whether TCP Vegas is equation-based, since the size of its window adjustment does not depend on queueing delay. This is not important at low speed but critical at high speed.

estimated than loss probability, both because packet losses in networks with large bandwidth-delay products are rare events (probability on the order of 10⁻⁸ or smaller), and because loss samples provide coarser information than queueing delay samples. Indeed, each measurement of packet loss (whether a packet is lost) provides one bit of information for the filtering of noise, whereas each measurement of queueing delay provides multi-bit information. This allows an equation-based implementation to stabilize a network into a steady state with a target fairness and high utilization.

At the flow level, the dynamics of the feedback system must be stable in the presence of delay, as the network capacity increases. Here, again, queueing delay has an advantage over loss probability as a congestion measure: the dynamics of queueing delay seems to have the right scaling with respect to network capacity. This helps maintain stability as network capacity grows [22], [23], [24].

B. Implementation strategy

The delay-based approach, with proper flow and packet level designs, can address the four difficulties of Reno at large windows. First, by explicitly estimating how far the current state p_i(t)/u_i(t) is from the equilibrium value of 1, our scheme can drive the system rapidly, yet in a fair and stable manner, toward the equilibrium. The window adjustment is small when the current state is close to equilibrium and large otherwise, independent of where the equilibrium is, as illustrated in Figure 1(b). This is in stark contrast to the approach taken by Reno, HSTCP, and STCP, where the window adjustment depends only on the current window size and is independent of where the current state is with respect to the target (compare Figures 1(a) and (b)). Like the equation-based scheme in [34], this approach avoids the problem of slow increase and drastic decrease in Reno as the network scales up.

Second, by choosing a multi-bit congestion measure, this approach eliminates the packet-level oscillation due to binary feedback, avoiding Reno's third problem.

Third, using queueing delay as the congestion measure p_i(t) allows the network to stabilize in the region below the overflowing point, around point F in Figure 2(b), when the buffer size is sufficiently large. Stabilization at this operating point eliminates large queueing delay and unnecessary packet loss. More importantly, it makes room for buffering "mice" traffic. To avoid the second problem in Reno, where the required equilibrium congestion measure (loss probability for Reno, and queueing delay here) is too small to practically


estimate, the algorithm must adapt its parameter α_i with capacity to maintain a small but sufficient queueing delay.

Finally, to avoid the fourth problem of Reno, the window control algorithm must be stable, in addition to being fair and efficient, at the flow level. The use of queueing delay as a congestion measure facilitates the design, as queueing delay naturally scales with capacity [22], [23], [24].

The design of a TCP congestion control algorithm can thus be conceptually divided into two levels:

• At the flow level, the goal is to design a class of function pairs, u_i(w_i, T_i) and κ_i(w_i, T_i), so that the feedback system described by (4), together with the link dynamics in p_i(t) and the interconnection, has an equilibrium that is fair and efficient, and that the equilibrium is stable in the presence of feedback delay.

• At the packet level, the design must deal with issues that are ignored by the flow-level model, or modeling assumptions that are violated in practice, in order to achieve these flow-level goals. These issues include burstiness control, loss recovery, and parameter estimation.

The implementation then proceeds in three steps:

1) determine the various system components;
2) translate the flow-level design into packet-level algorithms;
3) implement the packet-level algorithms in a specific operating system.

The actual process iterates intimately between flow and packet level designs, between theory, implementation, and experiments, and among the three implementation steps.

The emerging theory of large-scale networks under end-to-end control, e.g., [27], [35], [36], [25], [37], [38], [26], [39], [22], [40], [41], [29], [30], [42], [43], [24], [23], [15] (see also, e.g., [44], [45], [46] for recent surveys), forms the foundation of the flow-level design. The theory plays an important role by providing a framework to understand issues, clarify ideas, and suggest directions, leading to a robust and high-performance implementation.

We lay out the architecture of FAST TCP next.

IV. ARCHITECTURE AND ALGORITHMS

We separate the congestion control mechanism of TCP into four components, shown in Figure 3. These four components are functionally independent so that they can be designed separately and upgraded asynchronously. In this section, we focus on the two parts that we have implemented in the current prototype (see [33]).

[Figure 3 appears here: block diagram of the four components (data control, window control, burstiness control, and estimation) on top of TCP protocol processing.]

Fig. 3. FAST TCP architecture.

The data control component determines which packets to transmit, window control determines how many packets to transmit, and burstiness control determines when to transmit these packets. These decisions are made based on information provided by the estimation component. Window control regulates packet transmission at the RTT timescale, while burstiness control works at a smaller timescale.

In the following subsections, we provide an overview of estimation and window control and the algorithms implemented in our current prototype. An initial prototype that included the features discussed here was demonstrated in November 2002 at the SuperComputing Conference, and the experimental results were reported in [47].

A. Estimation

This component provides estimates of various input parameters to the other three decision-making components. It computes two pieces of feedback information for each data packet sent. When a positive acknowledgment is received, it calculates the RTT for the corresponding data packet and updates the average queueing delay and the minimum RTT. When a negative acknowledgment (signaled by three duplicate acknowledgments or timeout) is received, it generates a loss indication for this data packet to the other components. The estimation component thus generates both a multi-bit queueing delay sample and a one-bit loss-or-no-loss sample for each data packet.

The queueing delay is smoothed by taking a moving average with weight η(t) := min{3/w_i(t), 1/4}, which depends on the window w_i(t) at time t, as follows. The k-th RTT sample T_i(k) updates the average RTT T̄_i(k) according to:

    T̄_i(k+1) = (1 − η(t_k)) T̄_i(k) + η(t_k) T_i(k)

where t_k is the time at which the k-th RTT sample is received. Taking d_i(k) to be the minimum RTT observed so far, the average queueing delay is estimated as:

    q_i(k) = T̄_i(k) − d_i(k)

The weight η(t) is usually much smaller than the weight (1/8) used in TCP Reno. The average RTT T̄_i(k) attempts to track the average over one congestion window. During each RTT, an entire window's worth of RTT samples is received if every packet is acknowledged. Otherwise, if delayed acks are used, the number of queueing delay samples is reduced, so η(t) should be adjusted accordingly.
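A minimal sketch of this filter (Python; class and method names are ours):

    class DelayEstimator:
        # EWMA of RTT with weight eta(t) = min{3/w(t), 1/4}; the queueing
        # delay estimate is the average RTT minus the minimum RTT so far.
        def __init__(self):
            self.avg_rtt = None
            self.min_rtt = float("inf")

        def on_rtt_sample(self, rtt, w):
            eta = min(3.0 / w, 0.25)
            self.min_rtt = min(self.min_rtt, rtt)
            if self.avg_rtt is None:
                self.avg_rtt = rtt
            else:
                self.avg_rtt = (1.0 - eta) * self.avg_rtt + eta * rtt
            return self.avg_rtt - self.min_rtt  # queueing delay estimate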

B. Window control

The window control component determines the congestion window based on congestion information (queueing delay and packet loss) provided by the estimation component. A key decision in our design that departs from traditional TCP design is that the same algorithm is used for congestion window computation independent of the state of the sender. For example, in TCP Reno (without rate halving), the congestion window is increased by one packet every RTT when there is no loss, and by one for each duplicate ack during loss recovery. In FAST TCP, we would like to use the same algorithm for window computation regardless of the sender state.

Our congestion control mechanism reacts to both queueing delay and packet loss. Under normal network conditions, FAST periodically updates the congestion window based on the average RTT and the average queueing delay provided by the estimation component, according to:

    w ← min{ 2w, (1 − γ)w + γ ( (baseRTT/RTT) w + α(w, qdelay) ) }    (5)

where γ ∈ (0, 1], baseRTT is the minimum RTT observed so far, and qdelay is the end-to-end (average) queueing delay. In our current implementation, the congestion window changes over two RTTs: it is updated in one RTT and frozen in the next. The update is spread out over the first RTT in a way such that the congestion window is no more than doubled in each RTT.
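A sketch of update (5) itself (Python; the default γ = 0.5 is our assumption — the text only requires γ ∈ (0, 1] — and the pacing of the update over an RTT is omitted):

    def fast_window(w, base_rtt, avg_rtt, alpha, gamma=0.5):
        # Equation (5): w <- min{2w, (1-gamma)*w + gamma*(baseRTT/RTT*w + alpha)},
        # capped so the window at most doubles per update.
        target = (1.0 - gamma) * w + gamma * (base_rtt / avg_rtt * w + alpha)
        return min(2.0 * w, target)

Note that when the queueing delay is zero (avg_rtt == base_rtt), the update adds γα per step; when delay builds up, the baseRTT/RTT factor shrinks the window toward equilibrium.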

In our current prototype, we choose the function α(w, qdelay) to be a constant at all times. This produces linear convergence when qdelay is zero. Alternatively, we can use a constant α only when qdelay is nonzero, and an α proportional to window, α(w, qdelay) = aw, when qdelay is zero. In this case, when qdelay is zero, FAST performs multiplicative increase and grows exponentially at rate a to a neighborhood of qdelay > 0. Then α(w, qdelay) switches to a constant α and, as we will see in Theorem 2 below, the window converges exponentially to the equilibrium at a different rate that depends on qdelay. The constant α is the number of packets each flow attempts to maintain in the network buffer(s) at equilibrium, similar to TCP Vegas [8].⁴

Although we would like to use the same congestion control function during loss recovery, we have currently disabled this feature because of ambiguities associated with retransmitted packets. Currently, when a packet loss is detected, FAST halves its window and enters loss recovery. The goal is to back off packet transmission quickly when severe congestion occurs, in order to bring the system back to a regime where reliable RTT measurements are again available for window adjustment (5) to work effectively. A source does not react to delay until it exits loss recovery.⁵

C. Packet-level implementation

It is important to maintain an abstraction of the implementation as the code evolves. This abstraction should describe the high-level operations each component performs based on external inputs, and can serve as a road map for future TCP implementations as well as improvements to the existing implementation. Whenever a non-trivial change is required, one should first update this abstraction to ensure that the overall packet-level code is built on a sound underlying foundation.

Since TCP is an event-based protocol, our control actions should be triggered by the occurrence of various events. Hence, we need to translate our flow-level algorithms into event-based packet-level algorithms. There are four types of events that FAST TCP reacts to: the reception of an acknowledgment, the transmission of a packet, the end of an RTT, and each packet loss.

For each acknowledgment received, the estimation component computes the average queueing delay, and the burstiness control component determines whether packets can be injected into the network. For each packet transmitted, the estimation component records a timestamp, and the burstiness control component updates the corresponding data structures for bookkeeping. At a constant time interval, which we check on the arrival of each acknowledgment, window control calculates a new window size. At the end of each RTT, burstiness reduction calculates the target throughput using the window and RTT measurements of the last RTT. Window pacing then schedules to break up a large increment in congestion window into

⁴All experiments in Section VI used linear increase, i.e., α(w, qdelay) = α for all qdelay.

⁵In the Linux TCP implementation, the congestion window was frequently reduced to one when there were heavy losses. In order to ensure a reasonable recovery time, we impose a minimum window of 16 packets during loss recovery for connections that use large windows. This and other interim measures will be improved in future FAST TCP releases.

[Figure 4 appears here: flow-level algorithm (x = f(p), p = g(x, p)) → component design → packet-level algorithm (per-ack, per-RTT, per-loss tasks) → OS-specific implementation (function 1, function 2, ...).]

Fig. 4. From flow-level design to implementation.

smaller increments over time. During loss recovery, the congestion window should be continually updated based on congestion signals from the network. Upon the detection of a packet loss event, a sender determines whether to retransmit each unacknowledged packet right away or to hold off until a more appropriate time.

Figure 4 presents an approach to turning the high-level design of a congestion control algorithm into an implementation. First, an algorithm is designed at the flow level and analyzed to ensure that it meets the high-level objectives such as fairness and stability. Based on that, one can determine the components necessary to implement congestion control. The flow-level algorithm can then be translated into a packet-level algorithm that consists of a set of event-based tasks. The event-based tasks should be independent of any specific TCP or operating system implementation, yet detailed enough that an understanding of these tasks enables one to implement FAST in any operating system or protocol stack.

V. EQUILIBRIUM AND STABILITY OF WINDOW CONTROL ALGORITHM

In this section, we present a model of the window control algorithm. We show that, in equilibrium, the vectors of source windows and link queueing delays are the unique solutions of a pair of optimization problems (9)–(10). This completely characterizes the network equilibrium properties such as throughput, fairness, and delay. We also analyze the stability of the window control algorithm. We prove in [33] that, for a single link with heterogeneous sources, the window control algorithm (5) is globally stable, assuming zero feedback delay, and converges exponentially to a unique equilibrium. Extensive experiments in Section VI illustrate its stability in the presence of feedback delay.

Given a network that consists of a set of resources with finite capacities c_l, e.g., transmission links, processing units, memory, etc., we refer to them generically as "links" in our model. The network is shared by a set of unicast flows, identified by their sources. Let d_i denote the round-trip propagation delay of source i. Let R be the routing matrix, where R_li = 1 if source i uses link l, and 0 otherwise. Let p_l(t) denote the queueing delay at link l at time t. Let q_i(t) = Σ_l R_li p_l(t) be the round-trip queueing delay, or in vector notation, q(t) = Rᵀp(t). Then the round-trip time of source i is T_i(t) := d_i + q_i(t).

Each source i adapts w_i(t) periodically according to:⁶

    w_i(t+1) = γ ( d_i w_i(t) / (d_i + q_i(t)) + α_i(w_i(t), q_i(t)) ) + (1 − γ) w_i(t)    (6)

where γ ∈ (0, 1], at time t, and α_i(w_i, q_i) is defined by:

    α_i(w_i, q_i) = { a_i w_i   if q_i = 0
                    { α_i      otherwise        (7)

⁶Note that (6) can be rewritten as (when α_i(w_i, q_i) = α_i, constant)

    w_i(t+1) = w_i(t) + γ (α_i − x_i(t) q_i(t))

From [26], TCP Vegas updates its window according to

    w_i(t+1) = w_i(t) + (1/T_i(t)) sgn(α_i − x_i(t) q_i(t))

where sgn(z) = −1 if z < 0, 0 if z = 0, and 1 if z > 0. Hence FAST can be thought of as a high-speed version of Vegas.
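Footnote 6's comparison can be written out directly; a sketch (Python; constant-α branch of (7)):

    def sgn(z):
        return (z > 0) - (z < 0)

    def fast_step(w, T, q, alpha, gamma):
        # FAST (footnote 6): w(t+1) = w(t) + gamma*(alpha - x(t)*q(t)), x = w/T.
        # The step size is proportional to the distance from equilibrium.
        return w + gamma * (alpha - (w / T) * q)

    def vegas_step(w, T, q, alpha):
        # Vegas [26]: w(t+1) = w(t) + (1/T)*sgn(alpha - x(t)*q(t)).
        # The step size is a fixed +/- 1/T regardless of distance.
        return w + (1.0 / T) * sgn(alpha - (w / T) * q)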

A key departure of our model from those in the literature is that we assume that a source's send rate, defined as x_i(t) := w_i(t)/T_i(t), cannot exceed the throughput it receives. This is justified by self-clocking: one round-trip time after a congestion window is increased, packet transmission will be clocked at the same rate as the throughput the flow receives. See [48] for detailed justification and validation experiments. A consequence of this assumption is that the link queueing delay vector, p(t), is determined implicitly by the instantaneous window sizes in a static manner: given w_i(t) = w_i for all i, the link queueing delays p_l(t) = p_l ≥ 0 for all l are given by:

    Σ_i R_li w_i/(d_i + q_i)  { = c_l   if p_l > 0
                              { ≤ c_l   if p_l = 0       (8)

where again q_i = Σ_l R_li p_l.

The equilibrium values of the windows w* and delays p* of the network defined by (6)–(8) can be characterized as follows. Consider the utility maximization problem

network defined by (6)–(8) can be characterized as follows.Consider the utility maximization problem

maxx≥0

∑i

αi log xi s.t. Rx ≤ c (9)

and the following (dual) problem:

minp≥0

∑l

clpl −∑

i

αi log∑

l

Rlipl (10)

Theorem 1: Suppose R has full row rank. The unique equilibrium point (w*, p*) of the network defined by (6)–(8) exists and is such that x* = (x_i* := w_i*/(d_i + q_i*), ∀i) is the unique maximizer of (9) and p* is the unique minimizer of (10). This implies in particular that the equilibrium rate x* is α_i-weighted proportionally fair.

Theorem 1 implies that FAST TCP has the same equilibrium properties as TCP Vegas [25], [26]. Its throughput is given by

    x_i = α_i / q_i    (11)

In particular, it does not penalize sources with large propagation delays d_i. The relation (11) also implies that, in equilibrium, source i maintains α_i packets in the buffers along its path [25], [26]. Hence, the total amount of buffering in the network must be at least Σ_i α_i packets in order to reach the equilibrium.
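For a single bottleneck, (11) plus full utilization pins down the equilibrium explicitly: all flows see the same queueing delay q, so Σ_i α_i/q = c gives q = Σ_i α_i/c and x_i = α_i c / Σ_j α_j. A sketch (Python; the α values and capacity below are our own illustrative numbers):

    def single_link_equilibrium(alphas, c):
        # q = sum(alpha)/c; x_i = alpha_i/q, independent of propagation delay d_i.
        q = sum(alphas) / c
        return q, [a / q for a in alphas]

    # Three flows with alpha = 200 pkts each on a c = 66,667 pkt/s link:
    q, rates = single_link_equilibrium([200.0, 200.0, 200.0], 66667.0)
    print(q, rates)  # q ~ 9 ms; each flow gets c/3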

We now turn to the stability of the algorithm. Global stability in a general network in the presence of feedback delay is an open problem (see [49], [50] for stability analysis of the single-link single-source case). State-of-the-art results either prove global stability while ignoring feedback delay, or local stability in the presence of feedback delay. Our stability result is restricted to a single link in the absence of delay.

Theorem 2: Suppose there is a single link with capacity c. Then the network defined by (6)–(8) is globally stable, and converges geometrically to the unique equilibrium (w*, p*).

The basic idea of the proof is to show that the iteration from w(t) to w(t+1) defined by (6)–(8) is a contraction mapping. Hence w(t) converges geometrically to the unique equilibrium.
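The iteration is easy to run numerically. A sketch of the single-link map (6)–(8) with constant α_i, using bisection to solve (8) for the queueing delay (Python; all parameter values are ours):

    def link_delay(w, d, c):
        # (8): queueing delay q >= 0 with sum_i w_i/(d_i + q) = c when q > 0.
        if sum(wi / di for wi, di in zip(w, d)) <= c:
            return 0.0
        lo, hi = 0.0, 1.0
        while sum(wi / (di + hi) for wi, di in zip(w, d)) > c:
            hi *= 2.0
        for _ in range(60):  # bisection; aggregate rate decreases in q
            mid = (lo + hi) / 2.0
            if sum(wi / (di + mid) for wi, di in zip(w, d)) > c:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    def step(w, d, alphas, c, gamma=0.5):
        # (6) with constant alpha_i (the q > 0 branch of (7)).
        q = link_delay(w, d, c)
        return [gamma * (di * wi / (di + q) + ai) + (1.0 - gamma) * wi
                for wi, di, ai in zip(w, d, alphas)]

    # Iterating step() from any initial w converges geometrically (Theorem 2):
    w = [100.0, 400.0]
    for _ in range(50):
        w = step(w, d=[0.1, 0.2], alphas=[200.0, 200.0], c=66667.0)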

Some properties follow from the proof of Theorem 2.

Corollary 3:
1) Starting from any initial point (w(0), p(0)), the link is fully utilized, i.e., equality holds in (8), after a finite time.
2) The queue length is lower and upper bounded after a finite amount of time.

VI. PERFORMANCE

We have conducted some preliminary experiments on our dummynet [51] testbed comparing the performance of various new TCP algorithms as well as the Linux TCP implementation. It is important to evaluate them not only in static environments, but also in dynamic environments where flows come and go; and not only in terms of end-to-end throughput, but also in terms of queue behavior in the network. In this study, we compare performance among TCP connections of the same protocol sharing a single bottleneck link. In summary,

1) FAST TCP achieved the best overall performance in each of the four evaluation criteria: throughput, fairness, responsiveness, and stability.

2) Both HSTCP and STCP improved the throughput and responsiveness of Linux TCP, although both showed fairness problems and oscillations with higher frequencies and larger magnitudes.

In the following subsections, we describe in detail our experimental setup, evaluation criteria, and results.

A. Testbed and kernel instrumentation

We constructed a testbed of a sender and a receiver, both running Linux, and an emulated router running FreeBSD. Each testbed machine has dual 2.66 GHz Xeon processors, 2 GB of main memory, and dual on-board Intel PRO/1000 Gigabit Ethernet interfaces. We have tested these machines to ensure each is able to achieve a peak throughput of 940 Mbps with the standard Linux TCP protocol using iperf.

[Figure 5 appears here: a sender and a receiver running iperf, connected through a router, with a sender monitor at the sender and a queue monitor at the router; emulated paths of 50 ms, 100 ms, 150 ms, and 200 ms.]

Fig. 5. Testbed and the experimental setup.

Figure 5 shows the setup of the testbed. The testbed router supports paths of various delays and a single bottleneck capacity with a fixed buffer size. It has monitoring capability at the sender and the router. The receiver runs different TCP traffic sinks with different port numbers for connections with different RTTs. We set up and run different experiments from the sender using an automatic script generator to start multiple iperf sessions to emulate multiple TCP connections.

Our testbed router ran dummynet [51] under FreeBSD. We configured dummynet to create paths, or pipes, of different delays, 50, 100, 150, and 200 ms, using different destination port numbers on the receiving machine. We then created another pipe to emulate a bottleneck capacity of 800 Mbps and a buffer size of 2,000 packets, shared by all the delay pipes. Due to our need to emulate a high-speed bottleneck capacity, we increased the scheduling granularity of dummynet events. We recompiled the FreeBSD kernel so the task scheduler ran every 1 ms. We also increased the size of the IP layer interrupt


queue (ipintrq) to 3000 to accommodate large bursts of packets.

We instrumented both the sender and the dummynet router to capture relevant information for protocol evaluation. For each connection on the sending machine, the kernel monitor captured the congestion window, the observed baseRTT, and the observed queueing delay. On the dummynet router, the kernel monitor captured the throughput at the dummynet bottleneck, the number of lost packets, and the average queue size every two seconds. We retrieved the measurement data after the completion of each experiment in order to avoid disk I/O that might have interfered with the experiment itself.

We tested four TCP implementations: FAST, HSTCP, STCP, and Reno (Linux implementation). FAST TCP is based on the Linux 2.4.20 kernel, while the other TCP protocols are based on the Linux 2.4.19 kernel. We ran tests and did not observe any appreciable difference between the two plain Linux kernels; the TCP source code of the two kernels is nearly identical. The Linux TCP implementation includes all of the latest RFCs such as New Reno, SACK, D-SACK, and the TCP high performance extensions. There are two versions of HSTCP [52], [53]. We present the results of the implementation in [52], but our tests show that the implementation in [53] has comparable performance.

In all of our experiments, the bottleneck capacity is 800 Mbps (roughly 66 packets/ms) and the maximum buffer size is 2,000 packets.

We now present our experimental results. We first look at three cases in detail, comparing not only the throughput behavior seen at the source, but also the queue behavior inside the network, by examining trajectories of throughputs, windows, instantaneous queue, cumulative losses, and link utilization. We then summarize the overall performance in a diverse set of experiments in terms of quantitative metrics, defined below, on throughput, fairness, stability, and responsiveness.

B. Case study: static scenario

We present experimental results on aggregate throughput in a simple static environment where, in each experiment, all TCP flows had the same propagation delay and started and terminated at the same times. This set of tests comprised 20 experiments, pairing each of the propagation delays 50, 100, 150, and 200 ms with 1, 2, 4, 8, or 10 identical sources. We ran this test suite under each of the four TCP protocols. We then constructed a 3-d plot for each protocol, shown in Figure 6, with the x-axis and y-axis being the number of sources and the propagation delay, respectively. The z-axis is the aggregate throughput.

All four protocols performed well when the number of flows was large or the propagation delay was small, i.e., when the window size was small. The performance of FAST TCP remained consistent when these parameters changed. TCP Reno, HSTCP, and STCP had varying degrees of performance degradation as the window size increased, with TCP Reno showing the most significant degradation.

This set of experiments, involving a static environment and identical flows, does not test the fairness, stability, and responsiveness of the protocols. We take a close look at these properties next, in a dynamic scenario where the network equilibrium changes as flows come and go.

C. Case study: dynamic scenario I

In the first dynamic test, the number of flows was small, so that the throughput per flow, and hence the window size, was

[Figure 6 appears here: four 3-d plots (FAST, HSTCP, STCP, Reno) of aggregate throughput (Kbps) vs. flow number and propagation delay (ms).]

Fig. 6. Static scenario: aggregate throughput.

large. There were three TCP flows, with propagation delays of 100, 150, and 200 ms, that started and terminated at different times, as illustrated in Figure 7(a).

For each dynamic experiment, we generated two sets of figures, from the sender monitor and the queue monitor. From the sender monitor, we obtained the trajectories of individual connection throughput (in Kbps) and window size (in packets) over time. They are shown in Figure 8.

As new flows joined or old flows left, FAST TCP converged to the new equilibrium rate allocation rapidly and stably. Reno's throughput was also relatively smooth because of its slow (linear) increase before packet losses. The link utilization was low at the end of the experiment, when it took 30 minutes for a flow to reclaim the spare capacity left by the departure of another flow. HSTCP and STCP, in an attempt to respond more quickly, went into severe oscillation.

From the queue monitor, we obtained three trajectories: the average queue size (packets), the number of cumulative packet losses (packets), and the utilization of the bottleneck link (in packets/ms), shown in Figure 9 from top to bottom. The queue under FAST TCP was quite small throughout the experiment due to the small number of flows. HSTCP and STCP exhibited strong oscillations that filled the buffer. The link utilizations of FAST TCP and Reno were quite steady, whereas those of HSTCP and STCP showed fluctuations.

From the throughput trajectories of each protocol, we calculate Jain's fairness index (see Section VI-E for the definition) of the rate allocation in each time interval that contains more than one flow (see Figure 7(a)). The fairness indices are shown in Table III.

    Time (s)       #Sources   FAST   HSTCP   STCP   Reno
    1800 – 3600        2      .967    .927   .573   .684
    3600 – 5400        3      .970    .831   .793   .900
    5400 – 7200        2      .967    .873   .877   .718

TABLE III. Dynamic scenario I: intra-protocol fairness.

FAST TCP obtained the best fairness, very close to 1, followed by HSTCP, Reno, and then STCP. This confirms that FAST TCP does not penalize flows with large propagation delays. Even though HSTCP, STCP, and Reno all try to equalize congestion windows among competing connections instead of equalizing rates, this was not achieved, as shown in Figure 8. The unfairness is especially severe in the case of STCP, likely due to MIMD, as explained in [54].
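Jain's fairness index, used in these tables, is conventionally defined as J(x) = (Σ_i x_i)² / (n Σ_i x_i²); it equals 1 exactly when all rates are equal. A sketch:

    def jain_fairness(rates):
        # (sum x)^2 / (n * sum x^2); ranges over (0, 1], 1 = perfectly fair.
        n = len(rates)
        return sum(rates) ** 2 / (n * sum(r * r for r in rates))

    print(jain_fairness([100.0, 100.0, 100.0]))  # 1.0
    print(jain_fairness([300.0, 10.0, 10.0]))    # heavily skewed, well below 1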

For FAST TCP, each source tries to maintain the same number of packets in the queue in equilibrium, and thus, in


[Figure 7 appears here. (a) Dynamic scenario I (3 flows): 1 × 100 ms, 1 × 150 ms, 1 × 200 ms. (b) Dynamic scenario II (8 flows): 2 × 50 ms, 2 × 100 ms, 2 × 150 ms, 2 × 200 ms.]

Fig. 7. Dynamic scenarios: each block represents one or more connections of a certain propagation delay. The left and right edges of each block represent the starting and ending times, respectively, of the flow(s).

[Figure 8 appears here: per-protocol (FAST, HSTCP, STCP, Reno) trajectories of cwnd (pkts) and throughput (Kbps) over 0–9000 s.]

Fig. 8. Dynamic scenario I: throughput and cwnd trajectories.

[Figure 9 appears here: per-protocol (FAST, HSTCP, STCP, Reno) trajectories of average queue size (pkts), cumulative losses (pkts), and bottleneck throughput (pkts/ms) over 0–9000 s.]

Fig. 9. Dynamic scenario I: dummynet queue sizes, losses, and link utilization.

theory, each competing source should get an equal share of the bottleneck bandwidth. Even though FAST TCP achieved the best fairness index, we did not observe the expected equal sharing of bandwidth (see Figure 8). We found that connections with longer RTTs consistently observed higher queueing delays than those with shorter RTTs. For example, the connection on the 100 ms path saw an average queueing delay of 6 ms, while the connection on the 200 ms path saw an average queueing delay of 9 ms. This caused the connections with longer RTTs to maintain fewer packets in the queue in equilibrium, and thus to get a smaller share of the bandwidth. We have yet to uncover the source of this problem, but our early conjecture is that when the congestion window is large, it is much harder to break up bursts of packets. With bursty traffic arriving at a queue, each packet sees a delay that includes the transmission times of all preceding packets in the burst; if packets were spaced out smoothly, each packet would see a smaller queueing delay.

D. Case study: dynamic scenario II

This experiment was similar to dynamic scenario I, except that there was a larger number (8) of flows, with different propagation delays, which joined and departed according to the schedule in Figure 7(b). The qualitative behavior in throughput, fairness, stability, and responsiveness for each of the protocols is similar to that in scenario I, and in fact is amplified as the number of flows increases.

Specifically, as the number of competing sources increases in a network, stability becomes worse for the loss-based protocols. As shown in Figures 10 and 11, oscillations in both congestion windows and queue size are more severe for all loss-based protocols. Packet loss is also more severe. The performance of FAST TCP did not degrade in any significant way. Connections sharing the link achieved very similar rates. There was a reasonably stable queue at all times, with little packet loss and high link utilization. Intra-protocol fairness is shown in Table IV, with little change for FAST TCP.

Time (sec)       Sources   FAST    HSTCP   STCP    Reno
0 – 1800         2         1.0     .806    .999    .711
1800 – 3600      4         .987    .940    .721    .979
3600 – 5400      6         .976    .808    .631    .978
5400 – 7200      8         .977    .747    .566    .830
7200 – 9000      6         .970    .800    .613    .845
9000 – 10800     4         .989    .906    .636    .885
10800 – 12600    2         .998    .996    .643    .993
12600 – 14400    4         .989    .843    .780    .782
14400 – 16200    6         .944    .769    .613    .880
16200 – 18000    8         .973    .816    .547    .787
18000 – 19800    6         .982    .899    .563    .892
19800 – 21600    4         .995    .948    .668    .896
21600 – 23400    2         1.000   .920    .994    1.000

TABLE IV
FAIRNESS AMONG VARIOUS PROTOCOLS FOR EXPERIMENT II.

E. Overall evaluation

We have conducted several other experiments, with different delays, numbers of flows, and arrival and departure patterns. In all of these experiments, the bottleneck link capacity was 800 Mbps and the buffer size 2000 packets. We present in this subsection a summary of protocol performance in terms of quantitative measures of throughput, fairness, stability, and responsiveness.


[Figure 10 omitted. Caption: Dynamic scenario II: throughput and cwnd trajectories. One column per protocol (FAST, HSTCP, STCP, Reno); each column plots cwnd (pkt) and throughput (Kbps) against time (sec).]

[Figure 11 omitted. Caption: Dynamic scenario II: dummynet queue sizes, losses, and link utilization. One column per protocol (FAST, HSTCP, STCP, Reno); each column plots average queue size (pkt), cumulative loss (pkt), and throughput (pkt/ms) against time (sec).]

We use the output of iperf for our quantitative evaluation. Each iperf session in our experiments produced five-second averages of its throughput. This is the data rate (i.e., goodput) that applications such as iperf receive, and it is slightly less than the bottleneck bandwidth due to IP and Ethernet packet headers.
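For reference, the snippet below is a minimal sketch (ours, not from the paper) of turning iperf's human-readable interval report into the per-period samples $x_i(k)$ used below; it assumes classic iperf2-style lines such as "[  3]  0.0- 5.0 sec   512 MBytes   859 Mbits/sec", and the exact format depends on the iperf version and flags.

```python
import re

# Assumed iperf2-style interval lines, e.g.:
#   [  3]  0.0- 5.0 sec   512 MBytes   859 Mbits/sec
LINE = re.compile(r"\]\s*[\d.]+-\s*[\d.]+\s+sec\s+.*?([\d.]+)\s+([KMG]?)bits/sec")
SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}

def goodput_samples(report: str) -> list[float]:
    """Extract per-interval goodput samples, in bits per second."""
    return [float(v) * SCALE[u] for v, u in LINE.findall(report)]
```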

Let $x_i(k)$ be the average throughput of flow $i$ in the five-second period $k$. Most tests involved dynamic scenarios where flows joined and departed. For the definitions below, suppose the composition of flows changes in period $k = 1$, remains fixed over periods $k = 1, \dots, m$, and changes again in period $k = m + 1$, so that $[1, m]$ is the maximum-length interval over which the same equilibrium holds. Suppose there are $n$ active flows in this interval, indexed by $i = 1, \dots, n$. Let

\[ \bar{x}_i := \frac{1}{m} \sum_{k=1}^{m} x_i(k) \]

be the average throughput of flow $i$ over this interval. We now define our performance metrics for the interval $[1, m]$ using these throughput measurements.

1) Throughput: The average aggregate throughput for the interval $[1, m]$ is defined as:^7

\[ E := \sum_{i=1}^{n} \bar{x}_i \]

2) Intra-protocol fairness: Jain's fairness index for the interval $[1, m]$ is defined as [55]:

\[ F := \frac{\left( \sum_{i=1}^{n} \bar{x}_i \right)^2}{n \sum_{i=1}^{n} \bar{x}_i^2} \]

$F \in (0, 1]$, and $F = 1$ is ideal (equal sharing).

3) Stability: The stability index of flow $i$ is the sample standard deviation normalized by the average throughput:

\[ S_i := \frac{1}{\bar{x}_i} \sqrt{ \frac{1}{m-1} \sum_{k=1}^{m} \left( x_i(k) - \bar{x}_i \right)^2 } \]

^7 As mentioned above, this is the throughput (or goodput) seen at the application layer, not at the TCP layer.

The smaller the stability index, the less oscillation a source experiences. The stability index for the interval $[1, m]$ is the average over the $n$ active sources:

\[ S := \frac{1}{n} \sum_{i=1}^{n} S_i \]

4) Responsiveness: The responsiveness index measures the speed of convergence when the network equilibrium changes at $k = 1$, i.e., when flows join or depart. Let $\bar{x}_i(k)$ be the running average by period $k \le m$:

\[ \bar{x}_i(k) := \frac{1}{k} \sum_{t=1}^{k} x_i(t) \]

Then $\bar{x}_i(m) = \bar{x}_i$ is the average over the entire interval $[1, m]$. The responsiveness index $R_1$ measures how fast the running average $\bar{x}_i(k)$ of the slowest source converges to $\bar{x}_i$:^8

\[ R_1 := \max_i \, \max \left\{ k : \left| \frac{\bar{x}_i(k) - \bar{x}_i}{\bar{x}_i} \right| > 0.1 \right\} \]
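As a concrete summary of the four definitions above, the following sketch (our own; the array layout and function name are assumptions, not from the paper) computes $E$, $F$, $S$, and $R_1$ from an $n \times m$ matrix of five-second samples:

```python
import numpy as np

def interval_metrics(x, tol=0.1):
    """Metrics for one interval [1, m] with a fixed set of n flows.

    x: n-by-m array; x[i, k-1] is flow i's five-second average
    throughput x_i(k) in period k.
    """
    n, m = x.shape
    xbar = x.mean(axis=1)                          # \bar{x}_i, per-flow average

    E = xbar.sum()                                 # 1) aggregate throughput
    F = xbar.sum() ** 2 / (n * (xbar ** 2).sum())  # 2) Jain's fairness index

    S_i = x.std(axis=1, ddof=1) / xbar             # 3) per-flow stability index
    S = S_i.mean()                                 #    averaged over the n flows

    # 4) responsiveness: running averages \bar{x}_i(k), then the last
    #    period in which any flow's running average still deviates >10%.
    running = np.cumsum(x, axis=1) / np.arange(1, m + 1)
    off = np.abs(running - xbar[:, None]) > tol * xbar[:, None]
    R1 = int(np.where(off.any(axis=0))[0].max()) + 1 if off.any() else 0

    return E, F, S, R1
```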

For each TCP protocol, we obtain one set of computed values for each evaluation criterion over all of our experiments. We plot the CDF (cumulative distribution function) of each set of values. These are shown in Figures 12–15.
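Each curve in Figures 12–15 is an empirical CDF of one such set of values; a minimal sketch of that computation (ours, for illustration) is:

```python
import numpy as np

def empirical_cdf(values):
    """Sorted values and cumulative fractions, ready to plot as a CDF."""
    v = np.sort(np.asarray(values, dtype=float))
    return v, np.arange(1, v.size + 1) / v.size
```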

From Figures 12–15, FAST has the best performance among all protocols under each evaluation criterion. More importantly, the variation in each of the distributions is smaller under FAST than under the other protocols, suggesting that FAST had fairly consistent performance in our test scenarios. We also observe that both HSTCP and STCP achieved higher throughput and improved responsiveness compared with TCP Reno. STCP had worse intra-protocol fairness compared with TCP Reno, while HSTCP achieved intra-protocol fairness comparable to Reno's (see Figures 13, 8, and 10). Both HSTCP and STCP showed increased oscillations compared with Reno (Figures 14, 8, and 9), and the oscillations became worse as the number of sources increased (Figures 10 and 11).

^8 The natural definition of the responsiveness index as the earliest period after which the throughput $x_i(k)$ (as opposed to the running average $\bar{x}_i(k)$ of the throughput) stays within 10% of its equilibrium value is unsuitable for TCP protocols that do not stabilize into an equilibrium value. Hence we define it in terms of $\bar{x}_i(k)$, which, by definition, always converges to $\bar{x}_i$ by the end of the interval, $k = m$. This definition captures the intuitive notion of responsiveness if $x_i(k)$ settles into a periodic limit cycle.


[Figure 12 omitted. Caption: Overall evaluation: throughput. CDF of throughput (Mbps) for FAST TCP, HighSpeed TCP, Scalable TCP, and TCP Reno.]

[Figure 13 omitted. Caption: Overall evaluation: fairness. CDF of the fairness index for the four protocols.]

[Figure 14 omitted. Caption: Overall evaluation: stability. CDF of the stability index for the four protocols.]

[Figure 15 omitted. Caption: Overall evaluation: responsiveness index R1. CDF of the responsiveness index (sec) for the four protocols.]


From Figure 15, FAST TCP achieved a much better responsiveness index $R_1$ (which is based on worst-case individual throughput) than the other schemes. We caution, however, that it can be hard to quantify "responsiveness" for protocols that do not stabilize into an equilibrium point or a periodic limit cycle, and hence the unresponsiveness of Reno, HSTCP, and STCP, as measured by the index $R_1$, should be interpreted with care.

VII. CONCLUSION

We have described an alternative congestion control algorithm, FAST TCP, that addresses the four main problems of TCP Reno in networks with high capacities and large latencies. FAST TCP has a log utility function and achieves weighted proportional fairness. Its window adjustment is equation-based: the network moves rapidly toward equilibrium when the current state is far away, and slows down as it approaches the equilibrium. FAST TCP uses queueing delay, in addition to packet loss, as a congestion signal. Queueing delay provides a finer measure of congestion and scales naturally with network capacity.
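To illustrate what "equation-based" means here, the sketch below shows a FAST-style periodic window update of the form described in Section IV; the parameter values are ours and purely illustrative:

```python
def fast_window_update(w, base_rtt, avg_rtt, alpha=200.0, gamma=0.5):
    """One periodic FAST-style update of the congestion window w (sketch).

    The step toward the target (base_rtt/avg_rtt)*w + alpha is large when
    the flow is far from equilibrium and shrinks as avg_rtt approaches the
    point where ~alpha packets of this flow sit in the bottleneck queue;
    growth is capped at doubling per update.
    """
    target = (base_rtt / avg_rtt) * w + alpha
    return min(2 * w, (1 - gamma) * w + gamma * target)

# Example: with an empty queue (avg_rtt == base_rtt) the window ramps up
# by gamma*alpha per update; near equilibrium the steps shrink to zero.
w = 100.0
for _ in range(5):
    w = fast_window_update(w, base_rtt=0.1, avg_rtt=0.1)
print(round(w))  # 600
```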

We have presented experimental results for our first Linux prototype and compared its performance with TCP Reno, HSTCP, and STCP. We have evaluated these algorithms not only in static environments, but also in dynamic environments where flows come and go, and not only in terms of end-to-end throughput, but also in terms of queue behavior in the network. In these experiments, HSTCP and STCP achieved better throughput and link utilization than Reno, but their congestion windows and network queue lengths had significant oscillations. TCP Reno produced less oscillation, but at the cost of lower link utilization when sources departed. FAST TCP, on the other hand, consistently outperformed these protocols in terms of throughput, fairness, stability, and responsiveness.

Acknowledgments: We gratefully acknowledge the contributions of the FAST project team and our collaborators, at http://netlab.caltech.edu/FAST/, in particular, G. Almes, J. Bunn, D. H. Choe, R. L. A. Cottrell, V. Doraiswami, J. C. Doyle, W. Feng, O. Martin, H. Newman, F. Paganini, S. Ravot, S. Shalunov, S. Singh, J. Wang, Z. Wang, and S. Yip. This work is funded by NSF (grants ANI-0113425 and ANI-0230967), the Caltech Lee Center for Advanced Networking, ARO (grant DAAD19-02-1-0283), AFOSR (grant F49620-03-1-0119), and Cisco.

REFERENCES

[1] V. Jacobson, "Congestion avoidance and control," Proceedings of SIGCOMM'88, ACM, August 1988. An updated version is available via ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.

[2] M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, "TCP Selective Acknowledgment Options," RFC 2018, ftp://ftp.isi.edu/in-notes/rfc2018.txt, October 1996.

[3] V. Jacobson, R. Braden, and D. Borman, "TCP extensions for high performance," RFC 1323, ftp://ftp.isi.edu/in-notes/rfc1323.txt, May 1992.

[4] Janey Hoe, "Improving the startup behavior of a congestion control scheme for TCP," in ACM Sigcomm'96, August 1996, http://www.acm.org/sigcomm/sigcomm96/program.html.

[5] Cheng Jin, David X. Wei, and Steven H. Low, "The case for delay-based congestion control," in Proc. of IEEE Computer Communication Workshop (CCW), October 2003.

[6] Sally Floyd, "HighSpeed TCP for large congestion windows," Internet draft draft-floyd-tcp-highspeed-02.txt, work in progress, http://www.icir.org/floyd/hstcp.html, February 2003.

[7] Tom Kelly, "Scalable TCP: Improving performance in highspeed wide area networks," submitted for publication, http://www-lce.eng.cam.ac.uk/~ctk21/scalable/, December 2002.

[8] Lawrence S. Brakmo and Larry L. Peterson, "TCP Vegas: end-to-end congestion avoidance on a global Internet," IEEE Journal on Selected Areas in Communications, vol. 13, no. 8, pp. 1465–80, October 1995, http://cs.princeton.edu/nsg/papers/jsac-vegas.ps.

[9] E. Weigle and W. Feng, "A case for TCP Vegas in high-performance computational grids," in Proceedings of the 9th International Symposium on High Performance Distributed Computing (HPDC'01), August 2001.

[10] W. Feng and S. Vanichpun, "Enabling compatibility between TCP Reno and TCP Vegas," IEEE Symposium on Applications and the Internet (SAINT 2003), January 2003.

[11] C. Casetti, M. Gerla, S. Mascolo, M. Sanadidi, and R. Wang, "TCP Westwood: end-to-end congestion control for wired/wireless networks," Wireless Networks Journal, vol. 8, pp. 467–479, 2002.

[12] R. Wang, M. Valla, M. Sanadidi, B. Ng, and M. Gerla, "Using adaptive rate estimation to provide enhanced and robust transport over heterogeneous networks," in Proc. of IEEE ICNP, 2002.


[13] D. Katabi, M. Handley, and C. Rohrs, "Congestion control for high-bandwidth delay product networks," in Proc. ACM Sigcomm, August 2002.

[14] Shudong Jin, Liang Guo, Ibrahim Matta, and Azer Bestavros, "A spectrum of TCP-friendly window-based congestion control algorithms," IEEE/ACM Transactions on Networking, vol. 11, no. 3, June 2003.

[15] R. Shorten, D. Leith, J. Foy, and R. Kilduff, "Analysis and design of congestion control in synchronised communication networks," in Proc. of 12th Yale Workshop on Adaptive and Learning Systems, May 2003, www.hamilton.ie/doug_leith.htm.

[16] L. Xu, K. Harfoush, and I. Rhee, "Binary increase congestion control for fast long-distance networks," in Proc. of IEEE Infocom, 2004.

[17] A. Kuzmanovic and E. Knightly, "TCP-LP: a distributed algorithm for low priority data transfer," in Proc. of IEEE Infocom, 2003.

[18] H. Bullot and L. Cottrell, "TCP stacks testbed," http://www-iepm.slac.stanford.edu/bw/tcp-eval/.

[19] Raj Jain, "A delay-based approach for congestion avoidance in interconnected heterogeneous computer networks," ACM Computer Communication Review, vol. 19, no. 5, pp. 56–71, October 1989.

[20] Z. Wang and J. Crowcroft, "Eliminating periodic packet losses in the 4.3-Tahoe BSD TCP congestion control algorithm," ACM Computer Communications Review, April 1992.

[21] Jim Martin, Arne Nilsson, and Injong Rhee, "Delay-based congestion avoidance for TCP," IEEE/ACM Trans. on Networking, vol. 11, no. 3, pp. 356–369, June 2003.

[22] Fernando Paganini, John C. Doyle, and Steven H. Low, "Scalable laws for stable network congestion control," in Proceedings of Conference on Decision and Control, December 2001, http://www.ee.ucla.edu/~paganini.

[23] Hyojeong Choe and Steven H. Low, "Stabilized Vegas," in Proc. of IEEE Infocom, April 2003, http://netlab.caltech.edu.

[24] Fernando Paganini, Zhikui Wang, Steven H. Low, and John C. Doyle, "A new TCP/AQM for stability and performance in fast networks," in Proc. of IEEE Infocom, April 2003, http://www.ee.ucla.edu/~paganini.

[25] Jeonghoon Mo and Jean Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Transactions on Networking, vol. 8, no. 5, pp. 556–567, October 2000.

[26] Steven H. Low, Larry Peterson, and Limin Wang, "Understanding Vegas: a duality model," J. of ACM, vol. 49, no. 2, pp. 207–235, March 2002, http://netlab.caltech.edu.

[27] Frank P. Kelly, Aman Maulloo, and David Tan, "Rate control for communication networks: Shadow prices, proportional fairness and stability," Journal of Operations Research Society, vol. 49, no. 3, pp. 237–252, March 1998.

[28] Frank P. Kelly, "Mathematical modelling of the Internet," in Mathematics Unlimited – 2001 and Beyond, B. Engquist and W. Schmid, Eds., pp. 685–702. Springer-Verlag, Berlin, 2001.

[29] C. V. Hollot, V. Misra, D. Towsley, and W. B. Gong, "Analysis and design of controllers for AQM routers supporting TCP flows," IEEE Transactions on Automatic Control, vol. 47, no. 6, pp. 945–959, 2002.

[30] S. H. Low, F. Paganini, J. Wang, and J. C. Doyle, "Linear stability of TCP/RED and a scalable control," Computer Networks Journal, vol. 43, no. 5, pp. 633–647, 2003, http://netlab.caltech.edu.

[31] Matthew Mathis, Jeffrey Semke, Jamshid Mahdavi, and Teunis Ott, "The macroscopic behavior of the TCP congestion avoidance algorithm," ACM Computer Communication Review, vol. 27, no. 3, July 1997, http://www.psc.edu/networking/papers/model_ccr97.ps.

[32] T. V. Lakshman and Upamanyu Madhow, "The performance of TCP/IP for networks with high bandwidth–delay products and random loss," IEEE/ACM Transactions on Networking, vol. 5, no. 3, pp. 336–350, June 1997, http://www.ece.ucsb.edu/Faculty/Madhow/Publications/ton97.ps.

[33] C. Jin, D. Wei, and S. H. Low, "FAST TCP: motivation, architecture, algorithms, performance," Tech. Rep. CaltechCSTR:2003.010, Caltech, Pasadena, CA, 2003, http://netlab.caltech.edu/FAST.

[34] S. Floyd, M. Handley, J. Padhye, and J. Widmer, "Equation-based congestion control for unicast applications," in Proc. ACM SIGCOMM'00, September 2000.

[35] Steven H. Low and David E. Lapsley, "Optimization flow control, I: basic algorithm and convergence," IEEE/ACM Transactions on Networking, vol. 7, no. 6, pp. 861–874, December 1999, http://netlab.caltech.edu.

[36] S. Kunniyur and R. Srikant, "End-to-end congestion control: utility functions, random losses and ECN marks," IEEE/ACM Transactions on Networking, 2003.

[37] L. Massoulie and J. Roberts, "Bandwidth sharing: objectives and algorithms," IEEE/ACM Transactions on Networking, vol. 10, no. 3, pp. 320–328, June 2002.

[38] H. Yaiche, R. R. Mazumdar, and C. Rosenberg, "A game theoretic framework for bandwidth allocation and pricing in broadband networks," IEEE/ACM Transactions on Networking, vol. 8, no. 5, October 2000.

[39] Steven H. Low, "A duality model of TCP and queue management algorithms," IEEE/ACM Trans. on Networking, vol. 11, no. 4, pp. 525–536, August 2003, http://netlab.caltech.edu.

[40] Glenn Vinnicombe, "On the stability of networks operating TCP-like congestion control," in Proc. of IFAC World Congress, 2002.

[41] S. Kunniyur and R. Srikant, "A time-scale decomposition approach to adaptive ECN marking," IEEE Transactions on Automatic Control, June 2002.

[42] S. Kunniyur and R. Srikant, "Designing AVQ parameters for a general topology network," in Proceedings of the Asian Control Conference, September 2002.

[43] Glenn Vinnicombe, "Robust congestion control for the Internet," submitted for publication, 2002.

[44] Steven H. Low, Fernando Paganini, and John C. Doyle, "Internet congestion control," IEEE Control Systems Magazine, vol. 22, no. 1, pp. 28–43, February 2002.

[45] S. H. Low and R. Srikant, "A mathematical framework for designing a low-loss, low-delay internet," 2003.

[46] Frank P. Kelly, "Fairness and stability of end-to-end congestion control," European Journal of Control, vol. 9, pp. 159–176, 2003.

[47] C. Jin, D. Wei, S. H. Low, G. Buhrmaster, J. Bunn, D. H. Choe, R. L. A. Cottrell, J. C. Doyle, H. Newman, F. Paganini, S. Ravot, and S. Singh, "FAST Kernel: Background theory and experimental results," in First International Workshop on Protocols for Fast Long-Distance Networks, February 2003.

[48] David X. Wei and Steven H. Low, "A model for TCP with burstiness effect," submitted for publication, 2003.

[49] Zhikui Wang and Fernando Paganini, "Global stability with time delay in network congestion control," in Proc. of the IEEE Conference on Decision and Control, December 2002.

[50] S. Deb and R. Srikant, "Global stability of congestion controllers for the Internet," IEEE Transactions on Automatic Control, vol. 48, no. 6, pp. 1055–1060, June 2003.

[51] Luigi Rizzo, "Dummynet," http://info.iet.unipi.it/~luigi/ip_dummynet/.

[52] Y. Li, "Implementing HighSpeed TCP," http://www.hep.ucl.ac.uk/~ytl/tcpip/hstcp/index.html.

[53] T. Dunigan, "Floyd's TCP slow-start and AIMD mods," http://www.csm.ornl.gov/~dunigan/net100/floyd.html.

[54] D. Chiu and R. Jain, "Analysis of the increase and decrease algorithms for congestion avoidance in computer networks," Computer Networks, vol. 17, pp. 1–14, 1989.

[55] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modeling, John Wiley and Sons, Inc., 1991.
