congestion control - university of...

42
1 Mao F04 Congestion Control EECS 489 Computer Networks http://www.eecs.umich.edu/~zmao/eecs489 Z. Morley Mao Tuesday Oct 5, 2004 Acknowledgement: Some slides taken from Kurose&Ross and Katz&Stoica

Upload: doananh

Post on 27-Apr-2018

223 views

Category:

Documents


2 download

TRANSCRIPT

1Mao F04

Congestion Control

EECS 489 Computer Networkshttp://www.eecs.umich.edu/~zmao/eecs489

Z. Morley MaoTuesday Oct 5, 2004

Acknowledgement: Some slides taken from Kurose&Ross and Katz&Stoica

2Mao F04

What We Know

We know:§ How to process packets in a switch§ How to route packets in the network§ How to send packets reliably

We don’t know:§ How fast to send

3Mao F04

What’s at Stake?

§ Send too slow: link is not fully utilized- wastes time

§ Send too fast: link is fully utilized but....- queue builds up in router buffer (delay)- overflow buffers in routers- overflow buffers in receiving host (ignore)

§ Why are buffer overflows a problem?- packet drops (mine and others)- Interesting history....(Van Jacobson rides to the rescue)

4Mao F04

Abstract View

§ We ignore internal structure of router and model it as having a single queue for a particular input-output pair

Sending Host Buffer in Router Receiving Host

A B

5Mao F04

TCP Flow Control

§ receive side of TCP connection has a receive buffer:

§ speed-matching service: matching the send rate to the receiving app’s drain rate

§ app process may be slow at reading from buffer

sender won’t overflowreceiver’s buffer by

transmitting too much,too fast

flow control

6Mao F04

TCP Flow control: how it works

(Suppose TCP receiver discards out-of-order segments)

§ spare room in buffer= RcvWindow= RcvBuffer-[LastByteRcvd -

LastByteRead]

§ Rcvr advertises spare room by including value of RcvWindow in segments

§ Sender limits unACKed data to RcvWindow

- guarantees receive buffer doesn’t overflow

7Mao F04

Three Congestion Control Problems

§ Adjusting to bottleneck bandwidth

§ Adjusting to variations in bandwidth

§ Sharing bandwidth between flows

8Mao F04

Approaches towards congestion control

End-end congestion control:§ no explicit feedback from

network§ congestion inferred from end-

system observed loss, delay§ approach taken by TCP

Network-assisted congestion control:

§ routers provide feedback to end systems

- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)

- explicit rate sender should send at

Two broad approaches towards congestion control:

9Mao F04

Single Flow, Fixed Bandwidth

§ Adjust rate to match bottleneck bandwidth- without any a priori knowledge- could be gigabit link, could be a modem

A B100 Mbps

10Mao F04

Single Flow, Varying Bandwidth

§ Adjust rate to match instantaneous bandwidth- assuming you have rough idea of bandwidth

A BBW(t)

11Mao F04

Multiple Flows

Two Issues:§ Adjust total sending rate to match bandwidth§ Allocation of bandwidth between flows

A2 B2100 Mbps

A1

A3 B3

B1

12Mao F04

Reality

Congestion control is a resource allocation problem involving many flows, many links, and complicated global dynamics

13Mao F04

General Approaches

§ Send without care- many packet drops- not as stupid as it seems

§ Reservations- pre-arrange bandwidth allocations- requires negotiation before sending packets- low utilization

§ Pricing- don’t drop packets for the high-bidders- requires payment model

14Mao F04

General Approaches (cont’d)

§ Dynamic Adjustment- probe network to test level of congestion- speed up when no congestion- slow down when congestion- suboptimal, messy dynamics, simple to implement

§ All three techniques have their place- but for generic Internet usage, dynamic adjustment is

the most appropriate- due to pricing structure, traffic characteristics, and good

citizenship

15Mao F04

TCP Congestion Control

§ TCP connection has window- controls number of unacknowledged packets

§ Sending rate: ~Window/RTT

§ Vary window size to control sending rate

16Mao F04

Congestion Window (cwnd)

§ Limits how much data can be in transit§ Implemented as # of bytes§ Described as # packets in this lecture

EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked)

MaxWindow = min(cwnd, AdvertisedWindow)

LastByteAckedLastByteSent

sequence number increases

MaxWindow

EffectiveWindow

17Mao F04

Two Basic Components

§ Detecting congestion

§ Rate adjustment algorithm- depends on congestion or not- three subproblems within adjustment problem

• finding fixed bandwidth• adjusting to bandwidth variations• sharing bandwidth

18Mao F04

Detecting Congestion

§ Packet dropping is best sign of congestion- delay-based methods are hard and risky

§ How do you detect packet drops? ACKs- TCP uses ACKs to signal receipt of data- ACK denotes last contiguous byte received

• actually, ACKs indicate next segment expected

§ Two signs of packet drops- No ACK after certain time interval: time-out- Several duplicate ACKs (ignore for now)

19Mao F04

Rate Adjustment

§ Basic structure:- Upon receipt of ACK (of new data): increase rate- Upon detection of loss: decrease rate

§ But what increase/decrease functions should we use?

- Depends on what problem we are solving

20Mao F04

Problem #1: Single Flow, Fixed BW

§ Want to get a first-order estimate of the available bandwidth

- Assume bandwidth is fixed- Ignore presence of other flows

§ Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”)

§ Adjustment: - cwnd initially set to 1- cwnd++ upon receipt of ACK

21Mao F04

Slow-Start

§ cwnd increases exponentially: cwnd doubles every time a full cwnd of packets has been sent

- Each ACK releases two packets- Slow-start is called “slow” because of starting point

segment 1cwnd = 1

cwnd = 2 segment 2segment 3

cwnd = 4segment 4segment 5segment 6segment 7

cwnd = 8

cwnd = 3

22Mao F04

Problems with Slow-Start

§ Slow-start can result in many losses- roughly the size of cwnd ~ BW*RTT

§ Example:- at some point, cwnd is enough to fill “pipe”- after another RTT, cwnd is double its previous value- all the excess packets are dropped!

§ Therefore, need a more gentle adjustment algorithm once have rough estimate of bandwidth

23Mao F04

Problem #2: Single Flow, Varying BW

§ Want to be able to track available bandwidth, oscillating around its current value

§ Possible variations: (in terms of RTTs)- multiplicative increase or decrease: cwnd→ a*cwnd- additive increase or decrease: cwnd→ cwnd + b

§ Four alternatives:- AIAD: gentle increase, gentle decrease- AIMD: gentle increase, drastic decrease- MIAD: drastic increase, gentle decrease (too many losses)- MIMD: drastic increase and decrease

24Mao F04

Problem #3: Multiple Flows

§ Want steady state to be “fair”

§ Many notions of fairness, but here all we require is that two identical flows end up with the same bandwidth

§ This eliminates MIMD and AIAD

§ AIMD is the only remaining solution!

25Mao F04

Buffer and Window Dynamics

§ No congestion à x increases by one packet/RTT every RTT§ Congestion à decrease x by factor 2

A BC = 50 pkts/RTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if > 20

Rate (pkts/RTT)

x

26Mao F04

AIMD Sharing DynamicsA B

x

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion à rate increases by one packet/RTT every RTTCongestion à decrease rate by factor 2

Rates equalize à fair share

y

27Mao F04

AIAD Sharing DynamicsA B

x

D ENo congestion à x increases by one packet/RTT every RTTCongestion à decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

y

28Mao F04

AIMD

C

x

y

A Bx

C

D Ey

Limit rates:x = y

29Mao F04

AIAD

C

x

y

A Bx

C

D Ey

Limit rates:x and y depend

on initial values

30Mao F04

Implementing AIMD

§ After each ACK- increment cwnd by 1/cwnd (cwnd += 1/cwnd)- as a result, cwnd is increased by one only if all

segments in a cwnd have been acknowledged

§ But need to decide when to leave slow-start and enter AIMD§ use ssthresh variable

31Mao F04

Slow Start/AIMD Pseudocode

Initially:cwnd = 1;ssthresh = infinite;

New ack received:if (cwnd < ssthresh)

/* Slow Start*/cwnd = cwnd + 1;

else/* Congestion Avoidance */cwnd = cwnd + 1/cwnd;

Timeout:/* Multiplicative decrease */ssthresh = cwnd/2;cwnd = 1;

32Mao F04

The big picture (with timeouts)

Time

cwnd

Timeout

SlowStart

AIMD

ssthresh

Timeout

SlowStart

SlowStart

AIMD

33Mao F04

Congestion Detection Revisited

§ Wait for Retransmission Time Out (RTO)- RTO kills throughput

§ In BSD TCP implementations, RTO is usually more than 500ms

- the granularity of RTT estimate is 500 ms- retransmission timeout is RTT + 4 * mean_deviation

§ Solution: Don’t wait for RTO to expire

34Mao F04

Fast Retransmits

§ Resend a segment after 3 duplicate ACKs

- a duplicate ACK means that an out-of sequence segment was received

§ Notes:- ACKs are for next

expected packet - packet reordering can

cause duplicate ACKs- window may be too small

to get enough duplicate ACKs

ACK 2

segment 1cwnd = 1

cwnd = 2 segment 2segment 3

ACK 4cwnd = 4 segment 4

segment 5segment 6segment 7

ACK 3

3 duplicateACKs

ACK 4

ACK 4

ACK 4

35Mao F04

Fast Retransmit

§ Time-out period often relatively long:

- long delay before resending lost packet

§ Detect lost segments via duplicate ACKs.

- Sender often sends many segments back-to-back

- If segment is lost, there will likely be many duplicate ACKs.

§ If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:

- fast retransmit: resend segment before timer expires

36Mao F04

event: ACK received, with ACK field value of y if (y > SendBase) {

SendBase = yif (there are currently not-yet-acknowledged segments)

start timer }

else { increment count of dup ACKs received for yif (count of dup ACKs received for y = 3) {

resend segment with sequence number y}

Fast retransmit algorithm:

a duplicate ACK for already ACKed segment

fast retransmit

37Mao F04

Fast Recovery: After a Fast Retransmit

§ ssthresh = cwnd / 2§ cwnd = ssthresh

- instead of setting cwnd to 1, cut cwnd in half (multiplicative decrease)

§ for each dup ack arrival- dupack++- MaxWindow = min(cwnd + dupack, AdvWin) - indicates packet left network, so we may be able to send

more§ receive ack for new data (beyond initial dup ack)

- dupack = 0- exit fast recovery

§ But when RTO expires still do cwnd = 1

38Mao F04

Fast Retransmit and Fast Recovery

§ Retransmit after 3 duplicated acks- Prevent expensive timeouts

§ Reduce slow starts§ At steady state, cwnd oscillates around the

optimal window size

Time

cwnd

Slow Start

AI/MD

Fast retransmit

39Mao F04

TCP Congestion Control Summary

§ Measure available bandwidth- slow start: fast, hard on network- AIMD: slow, gentle on network

§ Detecting congestion- timeout based on RTT

• robust, causes low throughput- Fast Retransmit: avoids timeouts when few packets lost

• can be fooled, maintains high throughput

§ Recovering from loss- Fast recovery: don’t set cwnd=1 with fast retransmits

40Mao F04

Summary: TCP Congestion Control

§ When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.

§ When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.

§ When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.

§ When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

41Mao F04

Issues to Think About

§ What about short flows? (setting initial cwnd)- most flows are short- most bytes are in long flows

§ How does this work over wireless links?- packet reordering fools fast retransmit- loss not always congestion related

§ High speeds?- to reach 10gbps, packet losses occur every 90 minutes!

§ Why are losses bad?- Tornado codes: can reconstruct data proportional to packets

that get through. Why not send at maximal rate?

§ Fairness: how do flows with different RTTs share link?

42Mao F04

Security issues with TCP

§ Example attacks:- Sequence number spoofing- Routing attacks- Source address spoofing- Authentication attacks