congestion control - university of...
TRANSCRIPT
1Mao F04
Congestion Control
EECS 489 Computer Networkshttp://www.eecs.umich.edu/~zmao/eecs489
Z. Morley MaoTuesday Oct 5, 2004
Acknowledgement: Some slides taken from Kurose&Ross and Katz&Stoica
2Mao F04
What We Know
We know:§ How to process packets in a switch§ How to route packets in the network§ How to send packets reliably
We don’t know:§ How fast to send
3Mao F04
What’s at Stake?
§ Send too slow: link is not fully utilized- wastes time
§ Send too fast: link is fully utilized but....- queue builds up in router buffer (delay)- overflow buffers in routers- overflow buffers in receiving host (ignore)
§ Why are buffer overflows a problem?- packet drops (mine and others)- Interesting history....(Van Jacobson rides to the rescue)
4Mao F04
Abstract View
§ We ignore internal structure of router and model it as having a single queue for a particular input-output pair
Sending Host Buffer in Router Receiving Host
A B
5Mao F04
TCP Flow Control
§ receive side of TCP connection has a receive buffer:
§ speed-matching service: matching the send rate to the receiving app’s drain rate
§ app process may be slow at reading from buffer
sender won’t overflowreceiver’s buffer by
transmitting too much,too fast
flow control
6Mao F04
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
§ spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -
LastByteRead]
§ Rcvr advertises spare room by including value of RcvWindow in segments
§ Sender limits unACKed data to RcvWindow
- guarantees receive buffer doesn’t overflow
7Mao F04
Three Congestion Control Problems
§ Adjusting to bottleneck bandwidth
§ Adjusting to variations in bandwidth
§ Sharing bandwidth between flows
8Mao F04
Approaches towards congestion control
End-end congestion control:§ no explicit feedback from
network§ congestion inferred from end-
system observed loss, delay§ approach taken by TCP
Network-assisted congestion control:
§ routers provide feedback to end systems
- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- explicit rate sender should send at
Two broad approaches towards congestion control:
9Mao F04
Single Flow, Fixed Bandwidth
§ Adjust rate to match bottleneck bandwidth- without any a priori knowledge- could be gigabit link, could be a modem
A B100 Mbps
10Mao F04
Single Flow, Varying Bandwidth
§ Adjust rate to match instantaneous bandwidth- assuming you have rough idea of bandwidth
A BBW(t)
11Mao F04
Multiple Flows
Two Issues:§ Adjust total sending rate to match bandwidth§ Allocation of bandwidth between flows
A2 B2100 Mbps
A1
A3 B3
B1
12Mao F04
Reality
Congestion control is a resource allocation problem involving many flows, many links, and complicated global dynamics
13Mao F04
General Approaches
§ Send without care- many packet drops- not as stupid as it seems
§ Reservations- pre-arrange bandwidth allocations- requires negotiation before sending packets- low utilization
§ Pricing- don’t drop packets for the high-bidders- requires payment model
14Mao F04
General Approaches (cont’d)
§ Dynamic Adjustment- probe network to test level of congestion- speed up when no congestion- slow down when congestion- suboptimal, messy dynamics, simple to implement
§ All three techniques have their place- but for generic Internet usage, dynamic adjustment is
the most appropriate- due to pricing structure, traffic characteristics, and good
citizenship
15Mao F04
TCP Congestion Control
§ TCP connection has window- controls number of unacknowledged packets
§ Sending rate: ~Window/RTT
§ Vary window size to control sending rate
16Mao F04
Congestion Window (cwnd)
§ Limits how much data can be in transit§ Implemented as # of bytes§ Described as # packets in this lecture
EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked)
MaxWindow = min(cwnd, AdvertisedWindow)
LastByteAckedLastByteSent
sequence number increases
MaxWindow
EffectiveWindow
17Mao F04
Two Basic Components
§ Detecting congestion
§ Rate adjustment algorithm- depends on congestion or not- three subproblems within adjustment problem
• finding fixed bandwidth• adjusting to bandwidth variations• sharing bandwidth
18Mao F04
Detecting Congestion
§ Packet dropping is best sign of congestion- delay-based methods are hard and risky
§ How do you detect packet drops? ACKs- TCP uses ACKs to signal receipt of data- ACK denotes last contiguous byte received
• actually, ACKs indicate next segment expected
§ Two signs of packet drops- No ACK after certain time interval: time-out- Several duplicate ACKs (ignore for now)
19Mao F04
Rate Adjustment
§ Basic structure:- Upon receipt of ACK (of new data): increase rate- Upon detection of loss: decrease rate
§ But what increase/decrease functions should we use?
- Depends on what problem we are solving
20Mao F04
Problem #1: Single Flow, Fixed BW
§ Want to get a first-order estimate of the available bandwidth
- Assume bandwidth is fixed- Ignore presence of other flows
§ Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”)
§ Adjustment: - cwnd initially set to 1- cwnd++ upon receipt of ACK
21Mao F04
Slow-Start
§ cwnd increases exponentially: cwnd doubles every time a full cwnd of packets has been sent
- Each ACK releases two packets- Slow-start is called “slow” because of starting point
segment 1cwnd = 1
cwnd = 2 segment 2segment 3
cwnd = 4segment 4segment 5segment 6segment 7
cwnd = 8
cwnd = 3
22Mao F04
Problems with Slow-Start
§ Slow-start can result in many losses- roughly the size of cwnd ~ BW*RTT
§ Example:- at some point, cwnd is enough to fill “pipe”- after another RTT, cwnd is double its previous value- all the excess packets are dropped!
§ Therefore, need a more gentle adjustment algorithm once have rough estimate of bandwidth
23Mao F04
Problem #2: Single Flow, Varying BW
§ Want to be able to track available bandwidth, oscillating around its current value
§ Possible variations: (in terms of RTTs)- multiplicative increase or decrease: cwnd→ a*cwnd- additive increase or decrease: cwnd→ cwnd + b
§ Four alternatives:- AIAD: gentle increase, gentle decrease- AIMD: gentle increase, drastic decrease- MIAD: drastic increase, gentle decrease (too many losses)- MIMD: drastic increase and decrease
24Mao F04
Problem #3: Multiple Flows
§ Want steady state to be “fair”
§ Many notions of fairness, but here all we require is that two identical flows end up with the same bandwidth
§ This eliminates MIMD and AIAD
§ AIMD is the only remaining solution!
25Mao F04
Buffer and Window Dynamics
§ No congestion à x increases by one packet/RTT every RTT§ Congestion à decrease x by factor 2
A BC = 50 pkts/RTT
0
10
20
30
40
50
60
1 28 55 82 109
136
163
190
217
244
271
298
325
352
379
406
433
460
487
Backlog in router (pkts)Congested if > 20
Rate (pkts/RTT)
x
26Mao F04
AIMD Sharing DynamicsA B
x
D E
0
10
20
30
40
50
60
1 28 55 82 109
136
163
190
217
244
271
298
325
352
379
406
433
460
487
No congestion à rate increases by one packet/RTT every RTTCongestion à decrease rate by factor 2
Rates equalize à fair share
y
27Mao F04
AIAD Sharing DynamicsA B
x
D ENo congestion à x increases by one packet/RTT every RTTCongestion à decrease x by 1
0
10
20
30
40
50
60
1 28 55 82 109
136
163
190
217
244
271
298
325
352
379
406
433
460
487
y
30Mao F04
Implementing AIMD
§ After each ACK- increment cwnd by 1/cwnd (cwnd += 1/cwnd)- as a result, cwnd is increased by one only if all
segments in a cwnd have been acknowledged
§ But need to decide when to leave slow-start and enter AIMD§ use ssthresh variable
31Mao F04
Slow Start/AIMD Pseudocode
Initially:cwnd = 1;ssthresh = infinite;
New ack received:if (cwnd < ssthresh)
/* Slow Start*/cwnd = cwnd + 1;
else/* Congestion Avoidance */cwnd = cwnd + 1/cwnd;
Timeout:/* Multiplicative decrease */ssthresh = cwnd/2;cwnd = 1;
32Mao F04
The big picture (with timeouts)
Time
cwnd
Timeout
SlowStart
AIMD
ssthresh
Timeout
SlowStart
SlowStart
AIMD
33Mao F04
Congestion Detection Revisited
§ Wait for Retransmission Time Out (RTO)- RTO kills throughput
§ In BSD TCP implementations, RTO is usually more than 500ms
- the granularity of RTT estimate is 500 ms- retransmission timeout is RTT + 4 * mean_deviation
§ Solution: Don’t wait for RTO to expire
34Mao F04
Fast Retransmits
§ Resend a segment after 3 duplicate ACKs
- a duplicate ACK means that an out-of sequence segment was received
§ Notes:- ACKs are for next
expected packet - packet reordering can
cause duplicate ACKs- window may be too small
to get enough duplicate ACKs
ACK 2
segment 1cwnd = 1
cwnd = 2 segment 2segment 3
ACK 4cwnd = 4 segment 4
segment 5segment 6segment 7
ACK 3
3 duplicateACKs
ACK 4
ACK 4
ACK 4
35Mao F04
Fast Retransmit
§ Time-out period often relatively long:
- long delay before resending lost packet
§ Detect lost segments via duplicate ACKs.
- Sender often sends many segments back-to-back
- If segment is lost, there will likely be many duplicate ACKs.
§ If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
- fast retransmit: resend segment before timer expires
36Mao F04
event: ACK received, with ACK field value of y if (y > SendBase) {
SendBase = yif (there are currently not-yet-acknowledged segments)
start timer }
else { increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) {
resend segment with sequence number y}
Fast retransmit algorithm:
a duplicate ACK for already ACKed segment
fast retransmit
37Mao F04
Fast Recovery: After a Fast Retransmit
§ ssthresh = cwnd / 2§ cwnd = ssthresh
- instead of setting cwnd to 1, cut cwnd in half (multiplicative decrease)
§ for each dup ack arrival- dupack++- MaxWindow = min(cwnd + dupack, AdvWin) - indicates packet left network, so we may be able to send
more§ receive ack for new data (beyond initial dup ack)
- dupack = 0- exit fast recovery
§ But when RTO expires still do cwnd = 1
38Mao F04
Fast Retransmit and Fast Recovery
§ Retransmit after 3 duplicated acks- Prevent expensive timeouts
§ Reduce slow starts§ At steady state, cwnd oscillates around the
optimal window size
Time
cwnd
Slow Start
AI/MD
Fast retransmit
39Mao F04
TCP Congestion Control Summary
§ Measure available bandwidth- slow start: fast, hard on network- AIMD: slow, gentle on network
§ Detecting congestion- timeout based on RTT
• robust, causes low throughput- Fast Retransmit: avoids timeouts when few packets lost
• can be fooled, maintains high throughput
§ Recovering from loss- Fast recovery: don’t set cwnd=1 with fast retransmits
40Mao F04
Summary: TCP Congestion Control
§ When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
§ When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
§ When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
§ When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
41Mao F04
Issues to Think About
§ What about short flows? (setting initial cwnd)- most flows are short- most bytes are in long flows
§ How does this work over wireless links?- packet reordering fools fast retransmit- loss not always congestion related
§ High speeds?- to reach 10gbps, packet losses occur every 90 minutes!
§ Why are losses bad?- Tornado codes: can reconstruct data proportional to packets
that get through. Why not send at maximal rate?
§ Fairness: how do flows with different RTTs share link?