compsci514: computer networks lecture 4: end-to-end ... · the tcp congestion collapse incident...

38
1 CompSci 514: Computer Networks Lecture 4: End-to-end Congestion Control Xiaowei Yang

Upload: others

Post on 25-Mar-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

1

CompSci 514: Computer NetworksLecture 4: End-to-end Congestion

Control

Xiaowei Yang

Page 2: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

2

Outline• TCP congestion collapse

• TCP congestion control algorithm

• Analysis

• Throughput and loss rate modeling

Page 3: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

3

A fundamental networking problem

• Packet switching– Statistical multiplexing

• Q: N users, and one network– How fast should each user send?– Or when should a packet be sent?– Why is it a difficult problem?

...

Page 4: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

• A à C, B à D• How fast should A and B send?• Don’t know the number of users• Don’t know the bottleneck capacity

4

A

B

C

D1Gbps

1Gbps

10Mbps

1Gbps

1Mbps

Page 5: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

The TCP Congestion Collapse Incident

• In October of 1986, the Internet had the first of what became a series of �congestion collapse�: increased load leads to decreased throughput

• The NSFnet phase-I backbone dropped three orders of magnitude from its capacity of 32 kbit/s to 40 bit/s, and continued until end nodes started implementing Van Jacobson's congestion control between 1987 and 1988.

5

Page 6: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

What caused the TCP congestion collapse?

• Original TCP (Cerf74)– Static window W– W = # of unacknowledged bytes

6

Send 1, 2, 3, …, W bytes

ACK 1, 2, 3, …, WSend W+1, W+2, … 2W bytes

ACK W+1, W+2, … 2W bytes

Page 7: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Review of TCP’s sliding window algorithm

• A well-known algorithm in networking• Used for

– Reliable transmission– Flow control– Congestion control

7

Page 8: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Definitions

• Round Trip Time (RTT)– The time it takes from sending a packet to

receiving an ACK for that packet– RTTmin = minimum RTT observed during a

connection

• Bandwidth Delay Product (BDP)– Bottleneck Capacity * RTTmin

8

Page 9: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Drawbacks of static window sizes

• One TCP flow– MSS = 512 bytes, Window size = 10 MSS,

RTTmin = 100ms• 10 TCP flows, 100 TCP flows, …

9

Page 10: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

What caused the TCP congestion collapse?

• Original TCP (Cerf74)– Static window W– W = # of unacknowledged bytes

• When user increases, queueing delay increases• TCP times out, retransmitting the whole window• TCP retransmits too early, wasting the

network’s bandwidth to retransmit the same packets already in transit and reducing useful throughput (goodput)

10

Page 11: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Retransmitting packets that are not lost à congestion collapse

11

Page 12: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

How can we prevent this problem?

12

Page 13: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Design Goals• Congestion avoidance: making the

system operate around the knee to obtain low latency and high throughput

• Congestion control: making the system operate left to the cliff to avoid congestion collapse

• Congestion control: making the system operate left to the cliff to avoid congestion collapse

• Congestion avoidance: making the system operate around the knee to obtain low latency and high throughput

Page 14: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Design goals

• More concretely– Efficiency

• High throughput/utilization– Low latency– Fairness– Fast convergence

14

Page 15: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Key Improvements from the TCP88 paper

• Revised RTO computation– Prevent spurious retransmissions

• ACK self-clocking– Determines when to send a packet

• Dynamic window sizing– Determines how fast a sender can send a

packet

Page 16: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Revised RTO estimates

• Old design: RTTn+1 = a RTT + (1- a ) RTTnRTO = β RTTn+1

• Improved designsrttn+1 = a RTT + (1- a ) srttnrttvarn+1 = b ( | RTT – srttn | ) + (1- b )rttvarn

RTOn+1 = srttn+1 + 4 rttvarn+1

16

Page 17: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Exponential backoff• When a timeout occurs , the RTO value is

doubledRTOn+1 = max (2*RTOn , 64) seconds

• Can save the day in the worst case

17

Page 18: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

ACK self-clocking

• When pipe is full, the speed of ACK returns equals to the speed new packets should be injected into the network

Page 19: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Why is self-clocking important

• Determines when a packet should be sent– A flow should send no faster than what its

bottleneck allows– ACK spacing correlates with bottleneck

capacity

27

Page 20: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Does ACK clocking solve it all?• No

• A sender will not send faster, but may under-utilize bandwidth or cause packets being buffered at the bottleneck

28

Page 21: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Dynamic window sizing• Sending speed: W / RTT

• Ideally, W / RTT = available bandwidth at the bottleneck

• à Adjusting W based on available bandwidth

1. Increases W when there is no congestion2. Decreases W when there is congestion

Page 22: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

How to detect congestion?

• Packet loss– Time out– Duplicate ACKs

• Latency– Current RTT – RTTmin

• Sending rate / ACKing rate

• ML? 31

Page 23: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

TCP Reno: Two Modes of Congestion Control

1. Probing for the available bandwidth– slow start (W < ssthresh)

2. Avoid overloading the network– congestion avoidance (W >= ssthresh)

Page 24: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Slow Start• Initial value: Set W = 1 MSS

• Modern TCP implementation may set initial window size to a much larger value

• When receiving an ACK, W+= 1 MSS

• W doubles every RTT– Exponential increase

Page 25: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Congestion Avoidance

• If W >= ssthresh then each time an ACK is received, increases W as follows:

• W += 1 / W

• W increases by one MSS per RTT

Page 26: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Reaction to Congestion

• Reduce W

• Timeout: severe congestion– W is reset to one MSS:– ssthresh is set to half of the current size of the

congestion window:ssthressh = W / 2

– entering slow-start

Page 27: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Reaction to Congestion• Duplicate ACKs: not so congested (why?)• Fast retransmit

– Three duplicate ACKs indicate a packet loss

– Retransmit without timeout

Page 28: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Reaction to congestion: Fast

Recovery

• Avoiding slow start (changed from TCP88)

– ssthresh = W / 2

– W = W +3MSS

– Increase W by one MSS for each additional duplicate ACK

• When ACK arrives that acknowledges “new data,” set:

W = ssthresh = W/2

enter congestion avoidance, i.e., increase W by 1 MSS per RTT

Page 29: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

42

The Sawtooth behavior of TCP

• For every ACK received– W += 1/W

• For every packet lost– W /= 2

• Only true in unit of RTT

RTT

W

Page 30: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

43

Why does it work? [Chiu-Jain]

• A feedback control system• The network uses feedback y to adjust users�

load åx_i

Page 31: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

44

Goals of Congestion Avoidance

– Efficiency: the closeness of the total load on the resource of its knee: åx_i ~= capacity

– Fairness:

• When all x_is are equal, F(x) = 1• When all x_i�s are zero but x_j = 1, F(x) = 1/n

– Distributedness• A centralized scheme requires complete knowledge of the state of

the system

– Convergence• The system approach the goal state from any starting state

Page 32: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

45

Metrics to measure convergence

• Responsiveness• Smoothness

Page 33: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

46

Model the system as a linear control system

• Four sample types of controls• AIAD, AIMD, MIAD, MIMD

Page 34: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

47

Phase space

X1=W1/RTT

X2=W2/RTT

Page 35: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

48

TCP congestion control is AIMD

• Problems:– Each source has to probe for its bandwidth– Congestion occurs first before TCP backs off– Unfair: long RTT flows obtain smaller bandwidth

shares

RTT

W

Page 36: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

49

Macroscopic behavior of TCP

pRTTMSS••5.1

• Throughput is inversely proportional to RTT:

• In a steady state, total packets sent in one sawtooth cycle:– S = w + (w+1) + … (w+w) = 3/2 w2

• the maximum window size is determined by the loss rate– 1/S = p– w =

• The length of one cycle: w * RTT• Average throughput: 3/2 w * MSS / RTT

11.5p

Page 37: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

Why is congestion control still relevant

• The Mice & Elephant phenomenon of Internet traffic– Many small flows– But a few large flows sent most of the byte

50

Page 38: CompSci514: Computer Networks Lecture 4: End-to-end ... · The TCP Congestion Collapse Incident •In October of 1986, the Internet had the first of what became a series of congestion

52

Conclusion

• Congestion control is one of the fundamental issues in networking

• TCP congestion control algorithm• The AIMD algorithm• TCP macroscopic behavior model