TCP Variations
Naveen Manicka
CISC 856 – Fall 2005
Computer & Information Sciences
University of Delaware
Nov 10, 2005
Most slides are borrowed from J. Leighton, B. Forouzan, P. Amer, I. Aydin
What Are TCP Variations?
• Implementations of TCP that use different algorithms to achieve end-to-end congestion control.
– Tahoe
– Reno
– NewReno
– Vegas
– SACK
– Rome
– Paris
Evolution of TCP
• 1984: Nagle’s algorithm to reduce overhead of small packets; predicts congestion collapse
• 1986: Congestion collapse first observed
• 1988: Van Jacobson’s algorithms: slow start, congestion avoidance, fast retransmit (all implemented in 4.3BSD Tahoe); SIGCOMM 88
• 1990: 4.3BSD Reno: fast recovery, delayed ACKs
• 1993: TCP Vegas (not implemented): real congestion avoidance (Brakmo et al.)
• 1996: NewReno: modified fast recovery; SACK TCP: selective ACK (Floyd et al.)
How Did TCP Cause Congestion?
(Original Recipe TCP)
• Poor efficiency
– In telnet-like applications, TCP sends 1 byte of data with 4000% overhead.
• Sending too much, too soon
• Unnecessary retransmits
• Sending window too large
• Very little change in behavior due to congestion
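The 4000% figure can be checked with a little arithmetic. The sketch below assumes minimum-size headers (20 bytes of TCP plus 20 bytes of IPv4, no options); the function name is made up for illustration.

```python
# Assumption: minimum 20-byte TCP header + 20-byte IPv4 header, no options.
TCP_HEADER_BYTES = 20
IP_HEADER_BYTES = 20


def overhead_percent(payload_bytes: int) -> float:
    """Header bytes expressed as a percentage of the payload bytes."""
    header_bytes = TCP_HEADER_BYTES + IP_HEADER_BYTES
    return 100.0 * header_bytes / payload_bytes


print(overhead_percent(1))     # one keystroke carried in its own segment
print(overhead_percent(1460))  # a full-sized segment, for comparison
```

With a 1-byte payload the 40 header bytes are 40 times the data, which is where 4000% comes from.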
TCP Variation: TCP Tahoe
• 1st improvement was TCP Tahoe (1988)
– Adjusts sending window as congestion increases or decreases (AIMD congestion avoidance & slow-start)
– Improved retransmission policy (Fast Retransmit)
– Nagle’s algorithm
– Improved RTO calculation and back-off (Karn’s algorithm)
Self-clocking or ACK Clock
• Maintain equilibrium of system
• Self-clocking systems tend to be very stable under a wide range of bandwidths and delays.
• The principal issue with self-clocking systems is getting them started.
[Figure: ACK clock between Sender and Receiver, with labels Pb, Pr, Ar, Ab, As]
TCP Tahoe Window Control
• TCP sender maintains two new variables:
– cwnd (congestion window): cwnd is inferred from the level of congestion in the network.
– ssthresh (slow-start threshold): ssthresh can be thought of as an estimate of the level below which congestion is not expected.
• send_win = min (rwin, cwnd)
Slow Start Phase (cwnd < ssthresh)
• Initially:
– cwnd = 1*MSS (Maximum Segment Size)
– ssthresh is very large.
• If no loss:
– cwnd += 1*MSS (after each new ACK)
– (This gives exponential growth of cwnd)
• If loss (timeout):
– ssthresh = max( flight size/2, 2*MSS )
– cwnd = 1*MSS
Congestion Avoidance Phase (cwnd > ssthresh)
• If no loss:
– increase cwnd at most 1*MSS per RTT (additive increase)
– cwnd += ( MSS*MSS / cwnd ) on every ACK (approximation to increasing cwnd by 1*MSS per RTT)
• If loss:
– ssthresh = max( flight size/2, 2*MSS ) (multiplicative decrease)
– cwnd = 1*MSS
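The slow-start and congestion-avoidance rules can be sketched in Python as follows. This is a minimal illustration, not a real TCP stack: the class name TahoeWindow, its method names, and the 1460-byte default MSS are assumptions, and the tie case cwnd = ssthresh is arbitrarily treated as congestion avoidance.

```python
class TahoeWindow:
    """Tahoe-style window bookkeeping, in bytes (illustrative sketch only)."""

    def __init__(self, mss: int = 1460, ssthresh: int = 65535):
        self.mss = mss            # assumed segment size
        self.cwnd = mss           # start at 1*MSS
        self.ssthresh = ssthresh  # "very large" initial threshold

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: +1 MSS per ACK, i.e. cwnd doubles every RTT.
            self.cwnd += self.mss
        else:
            # Congestion avoidance: about +1 MSS per RTT in total.
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def on_timeout(self, flight_size: int) -> None:
        # Multiplicative decrease, then restart from one segment.
        self.ssthresh = max(flight_size // 2, 2 * self.mss)
        self.cwnd = self.mss

    def send_window(self, rwnd: int) -> int:
        # The sender may never exceed the receiver's advertised window.
        return min(rwnd, self.cwnd)


w = TahoeWindow(mss=1000, ssthresh=4000)
for _ in range(4):
    w.on_new_ack()
    print(w.cwnd)
w.on_timeout(flight_size=4000)
print(w.cwnd, w.ssthresh)
```

The integer division in the congestion-avoidance branch is one possible rounding choice; a floor of 1 byte keeps the window growing even when cwnd is very large.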
Slow Start & Congestion Avoidance
• Initially:
- cwnd = 1*MSS
- ssthresh = very high (65535)
• If a new ACK comes:
- if cwnd < ssthresh, update cwnd according to slow start
- if cwnd > ssthresh, update cwnd according to congestion avoidance
- if cwnd = ssthresh, either may be used
• If timeout (i.e. loss):
- ssthresh = flight size/2
- cwnd = 1*MSS
[Figure: cwnd vs. time, with slow start in green and congestion avoidance in blue; marks ssthresh, the (initial) ssthresh, and a loss (e.g. timeout)]
Example: Slow Start/Congestion Avoidance
• Assume ssthresh = 8*MSS.
• Between sender (S) and receiver (R), cwnd goes 1, 2, 4, 8 (slow start), then 9, 10, 11 (congestion avoidance).
• With cwnd = 8, eight TCP-PDUs are sent and eight ACKs return; then nine TCP-PDUs and nine ACKs; then ten TCP-PDUs and ten ACKs.
[Figure: congestion window size (in MSS, 0 to 12) vs. transmission number (1 to 7), with ssthresh marked]
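The cwnd sequence in this example can be reproduced with a short per-ACK simulation. This is a sketch under simplifying assumptions: cwnd is counted in whole segments, no segment is lost, every segment is ACKed individually, and the function name cwnd_trace is invented for illustration.

```python
def cwnd_trace(ssthresh: int, rounds: int) -> list:
    """cwnd (in MSS) at the start of each transmission round, assuming no loss."""
    cwnd = 1
    acked = 0  # ACKs counted toward the next +1 in congestion avoidance
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        # One ACK returns for every segment sent in this round.
        for _ in range(cwnd):
            if cwnd < ssthresh:
                cwnd += 1          # slow start: +1 MSS per ACK
            else:
                acked += 1         # congestion avoidance: +1 MSS per window of ACKs
                if acked >= cwnd:
                    cwnd += 1
                    acked = 0
    return trace


print(cwnd_trace(ssthresh=8, rounds=7))
```

Counting ACKs in whole segments stands in for the byte-based MSS*MSS/cwnd increment; both aim at roughly one extra segment per RTT.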
TCP Tahoe’s Retransmission Policy
• When a segment is lost, original TCP waits for an ACK that’s not coming and eventually times out.
• Often, many, if not all, of the segments sent after the lost segment arrive at the receiver.
• For each segment received, the receiver sends a duplicate ACK, notifying the sender that the receiver is waiting for the missing segment.
• TCP Tahoe interprets duplicate ACK’s as an indication that a segment was lost.
TCP Tahoe’s Fast Retransmit
1. Sender receives 3 dupACKs.
2. Sender infers that the segment is lost.
3. Sender re-sends the segment immediately!
4. Sender returns to slow-start.
[Figure: sender (S) / receiver (R) timeline. Segment 1 is sent with cwnd = 1 (ACK 1); segments 2 and 3 with cwnd = 2 (ACK 2, ACK 3); segments 4 to 7 with cwnd = 4. Three duplicate ACKs (ACK 3, ACK 3, ACK 3) trigger the fast retransmit of segment 4.]
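Fast retransmit's duplicate-ACK counting can be sketched as follows. The sketch assumes segment-numbered ACKs where "ACK n" means segments up to n have arrived, so the missing segment is n + 1; real TCP ACKs carry the next expected byte instead. The function name and the threshold parameter are illustrative.

```python
def find_fast_retransmits(acks, threshold: int = 3) -> list:
    """Return the segment numbers a sender would fast-retransmit.

    `acks` is the sequence of ACK numbers seen by the sender. A repeat of
    the previous ACK number counts as a duplicate ACK.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                # Third duplicate: infer that segment ack + 1 was lost.
                retransmits.append(ack + 1)
        else:
            last_ack = ack
            dup_count = 0
    return retransmits


# Segment 4 is lost; segments 5, 6 and 7 each trigger another ACK 3.
print(find_fast_retransmits([1, 2, 3, 3, 3, 3]))
```

Waiting for the third duplicate, rather than the first, gives mildly reordered segments a chance to arrive before the sender assumes a loss.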
TCP Tahoe Trace (with one dropped segment)
[Figure: sequence number (380000 to 500000) and window size in MSS (0 to 48) vs. time (4.9 to 5.7 s). Series: sent segment, ACK'ed segment, cwnd, ssthresh. Annotations: lost segment, fast retransmit, begin slow-start, begin congestion avoidance, RTT.]
Could Tahoe Do Better?
• Receipt of dupACKs tells the sender that the receiver is still getting new segments, i.e. there is still data flowing between sender and receiver
• Why does sender go back to slow start after fast retransmit?
• Why does sender let the ACK clock die?
TCP Variation: TCP Reno
• 2nd improvement was TCP Reno (1990)
– From Tahoe:
  • Nagle’s algorithm
  • Improved RTO calculation and back-off
  • AIMD congestion avoidance with slow-start
  • Fast retransmit
– New to Reno:
  • Fast recovery
Fast Recovery
[Figure: cwnd vs. time through slow start and congestion avoidance, marking the (initial) ssthresh, two fast retransmits (each followed by “inflating” cwnd with dupACKs and “deflating” cwnd with a new ACK), and a timeout]
Concept:
• After fast retransmit, reduce cwnd by half, and continue sending segments at this reduced level.
Observations:
• Receiver is still getting T-PDUs. There can’t be overwhelming congestion.
• How does sender transmit T-PDUs on a dupACK? Need to use a “trick”: inflate cwnd.
Fast Retransmit & Fast Recovery
• After receiving 3 dupACKs:
– Retransmit the lost segment.
– Set ssthresh = flight size/2.
– Set ndupacks = 3 and cwnd = ssthresh + ndupacks. --- (inflating)
In Reno: send_win = min ( rwnd, cwnd + ndupacks ).
• If dupACK arrives:
– cwnd += 1*MSS --- (inflating)
– Transmit new segment, if allowed.
• If new ACK arrives:
– ndupacks = 0
– cwnd = ssthresh, as set on receiving the 3 dupACKs --- (deflating)
– Exit fast recovery.
• If RTO timer expires:
– ndupacks = 0
– Perform slow-start -- (ssthresh = flight size/2, cwnd = 1 * MSS)
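One way to read the fast recovery rules is as a small state machine. The sketch below is an illustration under stated assumptions, not a reference implementation: windows are counted in whole segments, cwnd is held at ssthresh during recovery while the inflation is tracked separately in ndupacks (matching send_win = min(rwnd, cwnd + ndupacks)), a floor of 2 segments is applied to ssthresh, and window growth outside recovery is omitted. The class and method names are invented.

```python
class RenoFastRecovery:
    """Reno-style fast retransmit / fast recovery, in units of MSS (sketch)."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.ndupacks = 0       # inflation applied to the send window
        self.dup_seen = 0       # dupACKs counted before recovery starts
        self.in_recovery = False

    def on_dup_ack(self, flight_size: int) -> bool:
        """Return True when the caller should fast-retransmit."""
        if self.in_recovery:
            self.ndupacks += 1  # inflate: one more segment left the network
            return False
        self.dup_seen += 1
        if self.dup_seen == 3:
            self.ssthresh = max(flight_size // 2, 2)
            self.cwnd = self.ssthresh
            self.ndupacks = 3
            self.in_recovery = True
            return True
        return False

    def on_new_ack(self) -> None:
        if self.in_recovery:
            self.cwnd = self.ssthresh  # deflate and exit fast recovery
            self.in_recovery = False
        self.ndupacks = 0
        self.dup_seen = 0

    def on_timeout(self, flight_size: int) -> None:
        self.ssthresh = max(flight_size // 2, 2)
        self.cwnd = 1                  # back to slow start
        self.ndupacks = 0
        self.dup_seen = 0
        self.in_recovery = False

    def send_window(self, rwnd: int) -> int:
        return min(rwnd, self.cwnd + self.ndupacks)


s = RenoFastRecovery(cwnd=16, ssthresh=64)
for _ in range(3):
    retransmit = s.on_dup_ack(flight_size=16)
print(retransmit, s.cwnd, s.send_window(rwnd=100))
```

Keeping the inflation in a separate counter makes the deflation step trivial: a new ACK simply resets ndupacks to zero.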
TCP Reno Trace (with one dropped segment)
[Figure: sequence number (380000 to 500000) and window size in MSS (0 to 48) vs. time (4.9 to 5.7 s). Series: sent segment, ACK'ed segment, cwnd, ssthresh, cwnd+ndupacks. Annotations: lost segment, fast retransmit, begin fast recovery, exit fast recovery, begin congestion avoidance, RTT.]
TCP Tahoe & Reno Trace (with one dropped segment)
[Figure: sequence number (380000 to 500000) vs. time (4.9 to 5.7 s). Curves: Tahoe, Reno, and Tahoe & Reno. Annotations: slow start, congestion avoidance.]
What if There are Multiple Losses in a Window?
• With two losses in a window, Reno will occasionally time out.
• With three losses in a window, Reno will usually time out.
• With four losses in a window, Reno is guaranteed to time out!
• With three or more losses in a window, Tahoe typically outperforms Reno!
TCP Reno Trace (with two dropped segments)
[Figure: sequence number (380000 to 500000) and window size in MSS (0 to 48) vs. time (4.9 to 5.7 s). Series: sent segment, ACK'ed segment, cwnd, ssthresh, cwnd+ndupacks. Annotations: fast retransmit 1, begin fast recovery 1, fast retransmit 2, begin fast recovery 2.]
TCP Variation: TCP NewReno
• 3rd improvement was TCP NewReno (1995)
– From Tahoe:
  • Nagle’s algorithm
  • Improved RTO calculation and back-off
  • AIMD congestion avoidance with slow-start
– New to NewReno:
  • Fast retransmit & modified fast recovery
Modifications to Fast Recovery
– Partial ACKs: A partial ACK acknowledges some but not all of the segments that were outstanding at the start of fast recovery. NewReno interprets this as an indication of multiple losses.
– If a partial ACK is received, retransmit the next lost segment immediately and set ndupacks = 0 (deflate send_win).
– Sender remains in fast recovery until all data outstanding when fast recovery was initiated is ACK’ed. Additional dupACK’s increase ndupacks.
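The partial-ACK rule can be sketched as a handler that runs only while the sender is in modified fast recovery. The sketch makes several assumptions: ACKs are segment-numbered ("ACK n" covers segments up to n), windows are counted in segments, and the names NewRenoRecovery and recovery_point (the highest segment outstanding when recovery began) are chosen for illustration.

```python
class NewRenoRecovery:
    """ACK handling inside NewReno's modified fast recovery (sketch)."""

    def __init__(self, ssthresh: int, recovery_point: int, last_ack: int):
        self.ssthresh = ssthresh
        self.cwnd = ssthresh
        self.recovery_point = recovery_point
        self.last_ack = last_ack
        self.ndupacks = 3        # the three dupACKs that started recovery
        self.in_recovery = True

    def on_ack(self, ack: int):
        """Return a segment number to retransmit, or None."""
        if ack <= self.last_ack:
            self.ndupacks += 1   # another dupACK: keep inflating
            return None
        self.last_ack = ack
        self.ndupacks = 0        # deflate on any ACK that advances
        if ack < self.recovery_point:
            # Partial ACK: the next segment after `ack` is also missing.
            return ack + 1
        # Everything outstanding at the start of recovery is now ACKed.
        self.in_recovery = False
        self.cwnd = self.ssthresh
        return None


# Segments 11 and 15 were lost out of 11..20; segment 11 was just retransmitted.
r = NewRenoRecovery(ssthresh=8, recovery_point=20, last_ack=10)
print(r.on_ack(14))   # partial ACK
print(r.on_ack(20))   # ACK for all outstanding data
print(r.in_recovery)
```

Because the sender stays in recovery across partial ACKs, each additional loss costs about one RTT instead of forcing a retransmission timeout.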
TCP NewReno Trace (with two dropped segments)
[Figure: sequence number (380000 to 500000) and window size in MSS (0 to 48) vs. time (4.9 to 5.7 s). Series: sent segment, ACK'ed segment, cwnd, ssthresh, cwnd+ndupacks. Annotations: fast retransmit of lost segment, modified fast recovery, partial ACK, outstanding data ACK, exit fast recovery.]
Tahoe, Reno & NewReno Trace (with two dropped segments)
[Figure: sequence number (380000 to 500000) vs. time (4.9 to 5.7 s). Curves: NewReno, Reno, Tahoe, Reno & NewReno, and All.]
State Transitions for Tahoe, Reno & New Reno
Is There a Better Way?
• The only way Tahoe, Reno and NewReno can detect congestion is by creating congestion!
– They carefully probe for congestion by slowly increasing their sending rate.
– When they find (create) congestion, they cut sending rate at least in half!
• This slow advance and rapid retreat approach results in a saw-toothed sending rate and highly erratic throughput.
• What if TCP could detect congestion without causing congestion?
TCP Variation: TCP Vegas (True Congestion Avoidance)
• Introduced by Brakmo and Peterson (1994)
• Three changes to TCP Reno
– Modified congestion avoidance
  • Don’t wait for a timeout; if actual throughput < expected throughput, decrease the congestion window. (AIAD!)
  • Estimate of expected throughput: Texpected = window size / smallest measured RTT
– New retransmission mechanism
  • motivation: what if sender never receives 3 dupACKs (due to lost segments, or because the window size is too small)?
  • mechanism: sender retransmits after a dupACK is received, if RTT estimate > timeout.
– Modified slow start
  • motivation: sender tries finding correct window size without causing a loss.
  • mechanism: exponential cwnd growth only every other RTT.
TCP Variation: TCP Vegas
• Congestion Avoidance:
– 2 thresholds, α and β, to control the amount of extra data, i.e. Textra = Texpected – Tactual
  • Textra < α => Window size increased by 1.
  • α < Textra < β => No change in window size.
  • Textra > β => Window size decreased by 1.
– Avoids large oscillations like in other variations.
• More balanced throughput
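The α/β rule can be sketched as a single window-adjustment function. The sketch follows the slide's simplified formulation directly: Textra is compared against α and β in throughput units (segments per second), and the window is in segments. The function name, parameter names, and the sample numbers are assumptions for illustration.

```python
def vegas_adjust(cwnd: int, base_rtt: float, actual_throughput: float,
                 alpha: float, beta: float) -> int:
    """Return the next window size (in segments) under the alpha/beta rule."""
    # Expected throughput if nothing were queued in the network.
    expected = cwnd / base_rtt          # base_rtt = smallest measured RTT
    extra = expected - actual_throughput
    if extra < alpha:
        return cwnd + 1                 # path looks underused: additive increase
    if extra > beta:
        return cwnd - 1                 # queues are building: additive decrease
    return cwnd                         # between the thresholds: hold steady


# 10 segments per 0.5 s smallest RTT gives an expected 20 segments/s.
for actual in (19.0, 17.0, 10.0):
    print(actual, vegas_adjust(10, 0.5, actual, alpha=2.0, beta=4.0))
```

Since both the increase and the decrease are by one segment, the window drifts toward the region between α and β instead of halving on every congestion signal.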
Vegas vs. NewReno
TCP NewReno throughput with simulated background traffic
TCP Vegas throughput with simulated background traffic
Source: Brakmo and Peterson, TCP Vegas: End to End Congestion Avoidance on a Global Internet, IEEE JSAC, Vol 13, No. 8, Oct. 1995, pp. 1465 – 1480
What Variations Are Being Used?
• Experimental results obtained by testing 84394 web servers (27914 classified):
– NewReno 76%
– Tahoe 4% (w/o Fast Retransmit)
– Reno 15%
– Other 1%
– Tahoe 4%
Source: Medina, Allman, and Floyd, “Measuring the Evolution of Transport Protocols in the Internet”, May 2004
TCP Today
• TCP is currently defined by:
– IETF Std’s.: RFC793, RFC1122 (Tahoe w/o FR)
– IETF Proposed Std’s.:
  • RFC1323 (Scaled windows & timestamps)
  • RFC2018, RFC2883, RFC3517 (SACK)
  • RFC2581 (Reno)
  • RFC2988 (RTO)
  • RFC3168 (ECN)
  • RFC3390 (Larger IW)
– IETF Exp. RFC’s:
  • RFC2582 (NewReno)
  • Many many more!
Questions ? …
Summary of TCP Behavior
• When entering slow start:
    if connection is new,
        ssthresh = arbitrarily large value
        cwnd = 1.
    else,
        ssthresh = max(flight size/2, 2*MSS)
        cwnd = 1.
• In slow start:
    ++cwnd on new ACK
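Response by TCP variation:
• Tahoe
– Response to 3 dupACKs: do fast retransmit, enter slow start
– Response to partial ACK of fast retransmission: ++cwnd
– Response to “full” ACK of fast retransmission: ++cwnd
• Reno
– Response to 3 dupACKs: do fast retransmit, enter fast recovery
– Response to partial ACK of fast retransmission: exit fast recovery, deflate window, enter congestion avoidance
– Response to “full” ACK of fast retransmission: exit fast recovery, deflate window, enter congestion avoidance
• NewReno
– Response to 3 dupACKs: do fast retransmit, enter modified fast recovery
– Response to partial ACK of fast retransmission: fast retransmit and deflate window; remain in modified fast recovery
– Response to “full” ACK of fast retransmission: exit modified fast recovery, deflate window, enter congestion avoidance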
• When entering either fast recovery or modified fast recovery:
    ssthresh = max(flight size/2, 2*MSS)
    cwnd = ssthresh.
• In congestion avoidance:
    cwnd += 1*MSS per RTT