TCP Variations Naveen Manicka CISC 856 – Fall 2005 Computer & Information Sciences University of Delaware Nov 10, 2005 Most slides are borrowed from J. Leighton, B. Forouzan, P. Amer., I. Aydin


TCP Variations

Naveen Manicka
CISC 856 – Fall 2005

Computer & Information Sciences
University of Delaware
Nov 10, 2005

Most slides are borrowed from J. Leighton, B. Forouzan, P. Amer, I. Aydin


What Are TCP Variations?

• Implementations of TCP that use different algorithms to achieve end-to-end congestion control.
  – Tahoe
  – Reno
  – NewReno
  – Vegas
  – SACK
  – Rome
  – Paris


Evolution of TCP

[Timeline, 1984 to 1996]

• 1984: Nagle's algorithm to reduce overhead of small packets; predicts congestion collapse
• 1986: Congestion collapse 1st observed
• 1988: Van Jacobson's algorithms: slow start, congestion avoidance, fast retransmit (all implemented in 4.3BSD Tahoe); SIGCOMM 88
• 1990: 4.3BSD Reno: fast recovery, delayed ACKs
• 1993: TCP Vegas (not implemented): real congestion avoidance (Brakmo et al.)
• 1996: NewReno: modified fast recovery; SACK TCP: Selective Ack (Floyd et al.)


How Did TCP Cause Congestion?

(Original Recipe TCP)

• Poor Efficiency
  – In telnet-like applications, TCP sends 1 byte of data with 4000% overhead.

• Sending too much, too soon
• Unnecessary retransmits
• Sending window too large
• Very little change in behavior due to congestion
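The 4000% figure can be checked with a little arithmetic. The sketch below assumes a 20-byte IPv4 header plus a 20-byte TCP header with no options (an assumption; the slide does not say which headers it counts), and the helper name `overhead_percent` is made up for illustration.

```python
# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def overhead_percent(payload_bytes: int) -> float:
    """Header bytes expressed as a percentage of payload bytes."""
    header_bytes = IP_HEADER_BYTES + TCP_HEADER_BYTES
    return 100.0 * header_bytes / payload_bytes


print(overhead_percent(1))     # 4000.0 for a single keystroke
print(overhead_percent(1460))  # under 3% for a full-sized segment
```

Under these assumptions, one byte of telnet data carries 40 bytes of header, which is where the 4000% comes from.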


TCP Variation: TCP Tahoe

• 1st improvement was TCP Tahoe (1988)
  – Adjusts sending window as congestion increases or decreases (AIMD congestion avoidance & slow-start)
  – Improved retransmission policy (Fast Retransmit)
  – Nagle's algorithm
  – Improved RTO calculation and back-off (Karn's algorithm)


Self-clocking or ACK Clock

• Maintain equilibrium of system
• Self-clocking systems tend to be very stable under a wide range of bandwidths and delays.
• The principal issue with self-clocking systems is getting them started.

[Figure: ACK clock between Sender and Receiver, with labels Pb, Pr, Ar, Ab, As]



TCP Tahoe Window Control

• TCP sender maintains two new variables:
  – cwnd – congestion window: cwnd is inferred from the level of congestion in the network.
  – ssthresh – slow-start threshold: ssthresh can be thought of as an estimate of the level below which congestion is not expected.

• send_win = min (rwin, cwnd)


Slow Start Phase (cwnd < ssthresh)

• Initially:
  – cwnd = 1*MSS (Maximum Segment Size)
  – ssthresh is very large.

• If no loss:
  – cwnd += 1*MSS (after each new ACK)
  – (This gives exponential growth of cwnd)

• If loss (timeout):
  – ssthresh = max( flight size/2, 2*MSS)
  – cwnd = 1*MSS
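Why does a per-ACK increase of 1*MSS give exponential growth? A minimal sketch, assuming no loss, no delayed ACKs, and one ACK per segment within each RTT. The MSS value and the function name are illustrative choices, not part of the slides.

```python
MSS = 1460  # bytes; an assumed segment size for illustration


def slow_start_rounds(rounds: int, mss: int = MSS) -> list[int]:
    """Return cwnd (in bytes) at the start of each round trip.

    Assumes no loss and that every segment in the window is ACKed
    individually within one RTT.
    """
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_rtt = cwnd // mss   # one ACK per segment in flight
        for _ in range(acks_this_rtt):
            cwnd += mss               # cwnd += 1*MSS on each new ACK
    return history


print([c // MSS for c in slow_start_rounds(5)])  # [1, 2, 4, 8, 16]
```

Each RTT delivers as many ACKs as there are segments in flight, so the window doubles per round trip.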


Congestion Avoidance Phase (cwnd > ssthresh)

• If no loss:
  – increase cwnd at most 1*MSS per RTT (additive increase)
  – cwnd += ( MSS*MSS / cwnd ) on every ACK (approximation to increasing cwnd by 1*MSS per RTT)

• If loss:
  – ssthresh = max ( flight size/2, 2*MSS ) (multiplicative decrease)
  – cwnd = 1*MSS.
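The per-ACK formula can be checked numerically. This sketch applies cwnd += MSS*MSS/cwnd once for every segment that was in flight at the start of an RTT; the MSS value and the function name are assumptions for illustration.

```python
MSS = 1460  # bytes; assumed


def congestion_avoidance_rtt(cwnd: float, mss: int = MSS) -> float:
    """Apply the per-ACK additive increase for one RTT worth of ACKs."""
    acks = int(cwnd // mss)   # ACKs for the segments in flight at RTT start
    for _ in range(acks):
        cwnd += mss * mss / cwnd
    return cwnd


start = 10 * MSS
end = congestion_avoidance_rtt(start)
growth_in_mss = (end - start) / MSS
print(round(growth_in_mss, 3))  # a little under 1.0
```

Because cwnd grows slightly during the RTT, each successive increment is a bit smaller than the first, so the total stays just below 1*MSS. This is consistent with "at most 1*MSS per RTT".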


Slow Start & Congestion Avoidance

• Initially:
  - cwnd = 1*MSS
  - ssthresh = very high (65535)

• If a new ACK comes:
  - if cwnd < ssthresh, update cwnd according to slow start
  - if cwnd > ssthresh, update cwnd according to congestion avoidance
  - if cwnd = ssthresh, either may be used

• If timeout (i.e., loss):
  - ssthresh = flight size/2
  - cwnd = 1*MSS

[Figure: cwnd vs. time, with slow start in green and congestion avoidance in blue; marks ssthresh, the (initial) ssthresh, and a loss (e.g., timeout)]
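The rules above fit in a small state holder. This is a sketch, not any particular stack's implementation: the class and method names are invented, MSS is an assumed value, the cwnd = ssthresh case is resolved in favor of congestion avoidance, and the 2*MSS floor on ssthresh is taken from the earlier slow-start slide.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed


@dataclass
class TahoeWindow:
    cwnd: float = MSS
    ssthresh: float = 65535

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                    # slow start
        else:
            self.cwnd += MSS * MSS / self.cwnd  # congestion avoidance

    def on_timeout(self, flight_size: float) -> None:
        # Multiplicative decrease of the threshold, then restart from 1*MSS.
        self.ssthresh = max(flight_size / 2, 2 * MSS)
        self.cwnd = MSS

    def send_window(self, rwin: float) -> float:
        return min(rwin, self.cwnd)


w = TahoeWindow()
for _ in range(7):
    w.on_new_ack()                # slow start: cwnd reaches 8*MSS
w.on_timeout(flight_size=w.cwnd)  # ssthresh = 4*MSS, cwnd = 1*MSS
```

After the timeout, the next ACKs grow cwnd by 1*MSS each until it reaches the new ssthresh, and by MSS*MSS/cwnd afterwards.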


Example: Slow Start/Congestion Avoidance

assume ssthresh = 8*MSS

[Figure: sender/receiver (S/R) exchange and a plot of congestion window size (in MSS) vs. transmission number (1 to 7). cwnd takes the values 1, 2, 4, 8, 9, 10, 11: it doubles up to ssthresh, then eight TCP-PDUs and eight ACKs give cwnd = 9, nine TCP-PDUs and nine ACKs give cwnd = 10, and ten TCP-PDUs and ten ACKs give cwnd = 11.]
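The window sizes in this example can be reproduced with a per-round model in units of MSS. The function name is hypothetical, and capping the doubling at ssthresh is an assumption that only matters when ssthresh is not a power of two.

```python
def cwnd_per_round(rounds: int, ssthresh: int) -> list[int]:
    """cwnd (in MSS) for each transmission round, assuming no loss."""
    cwnd = 1
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow start (cap is an assumption)
        else:
            cwnd += 1                       # congestion avoidance: +1 MSS per RTT
    return trace


print(cwnd_per_round(7, ssthresh=8))  # [1, 2, 4, 8, 9, 10, 11]
```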


TCP Tahoe’s Retransmission Policy

• When a segment is lost, original TCP waits for an ACK that's not coming and eventually times out.

• Often, many, if not all, of the segments sent after the lost segment arrive at the receiver.

• For each such segment received, the receiver sends a duplicate ACK, notifying the sender that the receiver is waiting for the missing segment.

• TCP Tahoe interprets duplicate ACKs as an indication that a segment was lost.


TCP Tahoe's Fast Retransmit

1. Sender receives 3 dupACKs.
2. Sender infers that the segment is lost.
3. Sender re-sends the segment immediately!
4. Sender returns to slow-start.

[Figure: sender/receiver (S/R) time-sequence diagram. Segment 1 is sent with cwnd = 1 (ACK 1), segments 2 and 3 with cwnd = 2 (ACK 2, ACK 3), and segments 4 to 7 with cwnd = 4. Three duplicate ACK 3s come back, triggering the fast-retransmit of segment 4.]
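The dupACK counting in steps 1 to 3 might look like the sketch below. It follows the diagram's convention that ACK n means "everything through segment n has arrived"; the function name and the list-of-integers ACK stream are illustrative assumptions.

```python
DUPACK_THRESHOLD = 3


def fast_retransmit_points(acks: list[int]) -> list[int]:
    """Return the segment numbers a sender would fast-retransmit.

    `acks` is the stream of ACK numbers seen by the sender, where ACK n
    acknowledges everything through segment n.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUPACK_THRESHOLD:
                retransmits.append(ack + 1)  # first unacknowledged segment
        else:
            last_ack = ack
            dup_count = 0
    return retransmits


# ACK 1, ACK 2, ACK 3, then three duplicate ACK 3s as in the diagram
print(fast_retransmit_points([1, 2, 3, 3, 3, 3]))  # [4]
```

In Tahoe the retransmission is followed by step 4: cwnd is reset to 1*MSS and the sender slow-starts again.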


TCP Tahoe Trace (with one dropped segment)

[Figure: sequence # (380000 to 500000) and window size in MSS (0 to 48) plotted against time (4.9 to 5.7 s). Legend: Sent Segment, ACK'ed Segment, cwnd, ssthresh. Annotations: Lost segment, Fast Retransmit, Begin slow-start, Begin congestion avoidance, RTT.]


Could Tahoe Do Better?

• Receipt of dupACKs tells the sender that the receiver is still getting new segments, i.e. there is still data flowing between sender and receiver

• Why does sender go back to slow start after fast retransmit?

• Why does sender let the ACK clock die?


TCP Variation: TCP Reno

• 2nd improvement was TCP Reno (1990)
  – From Tahoe:
    • Nagle's algorithm
    • Improved RTO calculation and back-off
    • AIMD congestion avoidance with slow-start
    • Fast retransmit
  – New to Reno:
    • Fast recovery


Fast Recovery

Concept:
• After fast retransmit, reduce cwnd by half, and continue sending segments at this reduced level.

Observations:
• Receiver is still getting T-PDUs. There can't be overwhelming congestion.
• How does sender transmit T-PDUs on a dupACK? Need to use a "trick": inflate cwnd.

[Figure: cwnd vs. time through slow start and congestion avoidance, marking the (initial) ssthresh, two fast-retransmit events each followed by a new ACK, and a timeout; cwnd is "inflated" with dupACKs and "deflated" with a new ACK.]


Fast Retransmit & Fast Recovery

• After receiving 3 dupACKs:
  – Retransmit the lost segment.
  – Set ssthresh = flight size/2.
  – Set ndupacks = 3 and cwnd = ssthresh + ndupacks. --- (inflating)

In Reno: send_win = min ( rwnd, cwnd + ndupacks ).

• If dupACK arrives:
  – cwnd += 1*MSS --- (inflating)
  – Transmit new segment, if allowed.

• If new ACK arrives:
  – ndupacks = 0
  – cwnd = ssthresh, as set on entering fast recovery --- (deflating)
  – Exit fast recovery.

• If RTO timer expires:
  – ndupacks = 0
  – Perform slow-start -- (ssthresh = flight size/2, cwnd = 1 * MSS)
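A compact sketch of these Reno rules follows. It keeps the inflation inside cwnd itself (cwnd = ssthresh + 3*MSS on entry, +1*MSS per further dupACK, back to ssthresh on a new ACK), which is one reading of the slide. The class and method names, the starting cwnd, the MSS value, and the 2*MSS floor on ssthresh (borrowed from the earlier slides) are all assumptions.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed


@dataclass
class RenoSender:
    cwnd: float = 10 * MSS
    ssthresh: float = 65535
    dupacks: int = 0
    in_fast_recovery: bool = False

    def on_dup_ack(self, flight_size: float) -> bool:
        """Return True when the lost segment should be retransmitted."""
        self.dupacks += 1
        if self.in_fast_recovery:
            self.cwnd += MSS                     # inflate per extra dupACK
            return False
        if self.dupacks == 3:
            self.ssthresh = max(flight_size / 2, 2 * MSS)
            self.cwnd = self.ssthresh + 3 * MSS  # inflate by the 3 dupACKs
            self.in_fast_recovery = True
            return True
        return False

    def on_new_ack(self) -> None:
        self.dupacks = 0
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh            # deflate, exit fast recovery
            self.in_fast_recovery = False
        elif self.cwnd < self.ssthresh:
            self.cwnd += MSS                     # slow start
        else:
            self.cwnd += MSS * MSS / self.cwnd   # congestion avoidance

    def on_timeout(self, flight_size: float) -> None:
        self.dupacks = 0
        self.in_fast_recovery = False
        self.ssthresh = max(flight_size / 2, 2 * MSS)
        self.cwnd = MSS                          # back to slow start
```

With a flight size of 10*MSS, the third dupACK triggers the retransmission and sets cwnd to 8*MSS. A fourth dupACK raises it to 9*MSS, and the next new ACK deflates it to 5*MSS.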


TCP Reno Trace (with one dropped segment)

[Figure: sequence # (380000 to 500000) and window size in MSS (0 to 48) plotted against time (4.9 to 5.7 s). Legend: Sent Segment, ACK'ed Segment, cwnd, ssthresh, cwnd+ndupacks. Annotations: Lost segment, Fast Retransmit, Begin fast recovery, Exit fast recovery, Begin congestion avoidance, RTT.]


TCP Tahoe & Reno Trace (with one dropped segment)

[Figure: sequence # (380000 to 500000) vs. time (4.9 to 5.7 s). Curve labels: Tahoe, Reno, Tahoe & Reno. Annotations: Slow Start, Congestion Avoidance.]


What if There are Multiple Losses in a Window?

• With two losses in a window, Reno will occasionally time out.

• With three losses in a window, Reno will usually time out.

• With four losses in a window, Reno is guaranteed to time out!

• With three or more losses in a window, Tahoe typically outperforms Reno!


TCP Reno Trace (with two dropped segments)

[Figure: sequence # (380000 to 500000) and window size in MSS (0 to 48) plotted against time (4.9 to 5.7 s). Legend: Sent Segment, ACK'ed Segment, cwnd, ssthresh, cwnd+ndupacks. Annotations: Fast Retransmit 1, Begin fast recovery 1, Fast Retransmit 2, Begin fast recovery 2.]


TCP Variation: TCP NewReno

• 3rd improvement was TCP NewReno (1995)
  – From Tahoe:
    • Nagle's algorithm
    • Improved RTO calculation and back-off
    • AIMD congestion avoidance with slow-start
  – New to NewReno:
    • Fast retransmit & modified fast recovery


Modifications to Fast Recovery

– Partial ACKs: A partial ACK acknowledges some but not all of the segments that were outstanding at the start of fast recovery. NewReno interprets this as an indication of multiple losses.

– If a partial ACK is received, retransmit the next lost segment immediately and set ndupacks = 0 (deflate send_win).

– Sender remains in fast recovery until all data outstanding when fast recovery was initiated is ACK'ed. Additional dupACKs increase ndupacks.
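The partial-ACK rule can be sketched as a small recovery tracker. Segment-numbered ACKs (ACK n covers everything through segment n), the class name, and the field names are illustrative assumptions; `recover` here stands for the highest segment outstanding when fast recovery began.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NewRenoRecovery:
    recover: int         # highest segment outstanding when recovery began
    ndupacks: int = 3    # recovery starts after 3 dupACKs
    active: bool = True

    def on_dup_ack(self) -> None:
        self.ndupacks += 1           # additional dupACKs inflate send_win

    def on_new_ack(self, ack: int) -> Optional[int]:
        """Handle a new cumulative ACK covering segments up to `ack`.

        Returns the next segment to retransmit on a partial ACK,
        or None when the ACK ends fast recovery.
        """
        self.ndupacks = 0            # deflate send_win
        if ack < self.recover:
            return ack + 1           # partial ACK: next hole, stay in recovery
        self.active = False          # all outstanding data ACKed
        return None


r = NewRenoRecovery(recover=10)
print(r.on_new_ack(6))    # 7: partial ACK, segment 7 is retransmitted
print(r.on_new_ack(10))   # None: recovery is over
```

Plain Reno would have left fast recovery at the first of these two ACKs, which is why a second loss in the same window often forces it to wait for a timeout.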


TCP NewReno Trace (with two dropped segments)

[Figure: sequence # (380000 to 500000) and window size in MSS (0 to 48) plotted against time (4.9 to 5.7 s). Legend: Sent Segment, ACK'ed Segment, cwnd, ssthresh, cwnd+ndupacks. Annotations: Fast retransmit of lost segment, Modified fast recovery, Partial Ack, Outstanding Data Ack, Exit fast recovery.]


Tahoe, Reno & NewReno Trace (with two dropped segments)

[Figure: sequence # (380000 to 500000) vs. time (4.9 to 5.7 s). Curve labels: NewReno, Reno, Tahoe, Reno & NewReno, All.]


State Transitions for Tahoe, Reno & NewReno


Is There a Better Way?

• The only way Tahoe, Reno and NewReno can detect congestion is by creating congestion!
  – They carefully probe for congestion by slowly increasing their sending rate.
  – When they find (create) congestion, they cut sending rate at least in half!

• This slow advance and rapid retreat approach results in a saw-toothed sending rate and highly erratic throughput.

• What if TCP could detect congestion without causing congestion?


TCP Variation: TCP Vegas (True Congestion Avoidance)

• Introduced by Brakmo and Peterson (1994)
• Three changes to TCP Reno
  – Modified congestion avoidance
    • Don't wait for a timeout; if actual throughput < expected throughput, decrease the congestion window. (AIAD!)
    • Estimate of expected throughput:
      – Texpected = window size / smallest measured RTT
  – New retransmission mechanism
    • motivation: what if sender never receives 3 dupACKs (due to lost segments, or because the window size is too small)?
    • mechanism: sender does retransmission after a dupACK is received, if RTT estimate > timeout.
  – Modified slow start
    • motivation: sender tries finding correct window size without causing a loss.
    • mechanism: exponential cwnd growth only every other RTT.


TCP Variation: TCP Vegas

• Congestion Avoidance:
  – Two thresholds, α and β, control the amount of extra data, i.e.,
      Textra = Texpected – Tactual
    • Textra < α => Window size increased by 1.
    • α < Textra < β => No change in window size.
    • Textra > β => Window size decreased by 1.
  – Avoids the large oscillations seen in other variations.
    • More balanced throughput
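The threshold rule can be written as a single decision function. This is a sketch under stated assumptions: the window is counted in segments, Tactual is estimated as window / latest RTT (the slides do not define it), the α and β values in the usage lines are arbitrary, and the function name is made up.

```python
def vegas_adjust(cwnd: int, base_rtt: float, rtt: float,
                 alpha: float, beta: float) -> int:
    """One Vegas congestion-avoidance decision, window in segments.

    base_rtt is the smallest RTT measured so far; rtt is the latest sample.
    alpha and beta use the same units as the throughput difference
    (segments per second here).
    """
    expected = cwnd / base_rtt   # Texpected = window size / smallest RTT
    actual = cwnd / rtt          # assumed estimate of Tactual
    extra = expected - actual    # Textra
    if extra < alpha:
        return cwnd + 1          # path looks underused: additive increase
    if extra > beta:
        return max(cwnd - 1, 1)  # queues building: additive decrease
    return cwnd                  # between the thresholds: hold


print(vegas_adjust(20, base_rtt=0.1, rtt=0.10, alpha=10, beta=30))  # 21
print(vegas_adjust(20, base_rtt=0.1, rtt=0.11, alpha=10, beta=30))  # 20
print(vegas_adjust(20, base_rtt=0.1, rtt=0.20, alpha=10, beta=30))  # 19
```

Both the increase and the decrease are by one segment, which is the "AIAD" behavior noted earlier and the reason the window does not show the Reno-style sawtooth.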


Vegas vs. NewReno

TCP NewReno throughput with simulated background traffic

TCP Vegas throughput with simulated background traffic

Source: Brakmo and Peterson, TCP Vegas: End to End Congestion Avoidance on a Global Internet, IEEE JSAC, Vol 13, No. 8, Oct. 1995, pp. 1465 – 1480


What Variations Are Being Used?

• Experimental results obtained by testing 84394 web servers (27914 classified):
  – NewReno 76%
  – Tahoe 4% (w/o Fast Retransmit)
  – Reno 15%
  – Other 1%
  – Tahoe 4%

Source: Medina, Allman, and Floyd, “Measuring the Evolution of Transport Protocols in the Internet”, May 2004


TCP Today

• TCP is currently defined by:
  – IETF Std's.: RFC793, RFC1122 (Tahoe w/o FR)
  – IETF Proposed Std's.:
    • RFC1323 (Scaled windows & timestamps)
    • RFC2018, RFC2883, RFC3517 (SACK)
    • RFC2581 (Reno)
    • RFC2988 (RTO)
    • RFC3168 (ECN)
    • RFC3390 (Larger IW)
  – IETF Exp. RFC's:
    • RFC2582 (NewReno)
    • Many many more!


Questions ? …


Summary of TCP Behavior

• When entering slow start:
  – if connection is new: ssthresh = arbitrarily large value, cwnd = 1
  – else: ssthresh = max(flight size/2, 2*MSS), cwnd = 1

• In slow start: ++cwnd on new ACK

| TCP Variation | Response to 3 dupACKs | Response to Partial ACK of Fast Retransmission | Response to "full" ACK of Fast Retransmission |
|---|---|---|---|
| Tahoe | Do fast retransmit, enter slow start | ++cwnd | ++cwnd |
| Reno | Do fast retransmit, enter fast recovery | Exit fast recovery, deflate window, enter congestion avoidance | Exit fast recovery, deflate window, enter congestion avoidance |
| NewReno | Do fast retransmit, enter modified fast recovery | Fast retransmit and deflate window; remain in modified fast recovery | Exit modified fast recovery, deflate window, enter congestion avoidance |

• When entering either fast recovery or modified fast recovery:
  – ssthresh = max(flight size/2, 2*MSS)
  – cwnd = ssthresh

• In congestion avoidance: cwnd += 1*MSS per RTT