CS 4284
Systems Capstone
Godmar Back
Networking
TCP
CS 4284 Spring 2013
Study of TCP: Outline
• segment structure
• reliable data transfer
• flow control
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Overview (RFCs 793, 1122, 1323, 2018, 2581)
• full duplex data
  - bi-directional data flow in same connection
  - MSS: maximum segment size
• connection-oriented
  - handshaking (exchange of control msgs) inits sender, receiver state before data exchange
• flow controlled
  - sender will not overwhelm receiver
• point-to-point
  - one sender, one receiver
• reliable, in-order byte stream
  - no "message boundaries"
• pipelined
  - TCP congestion and flow control set window size
• send & receive buffers
[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the other application reads data through its socket door.]
TCP Segment Structure
[Figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments)
• head len, not used, flag bits U A P R S F, receive window
  - URG: urgent data (generally not used)
  - ACK: ACK # valid
  - PSH: push data now (generally not used)
  - RST, SYN, FIN: connection establishment (setup, teardown commands)
  - receive window: # bytes rcvr willing to accept
• checksum (Internet checksum, as in UDP), urg data pointer
• options (variable length)
• application data (variable length)
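The field layout above can be made concrete with a small decoder. This is a minimal sketch, assuming a header without options handling beyond skipping them; the function name `parse_tcp_header` and the hand-built example segment are illustrative and not part of the lecture.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # "head len" counts 32-bit words
    flags = offset_flags & 0x3F             # low six bits: U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flags & (1 << i)],
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "payload": segment[header_len:],
    }

# Hand-built example: SYN+ACK flags, no options, two bytes of data.
example = struct.pack("!HHIIHHHH", 80, 50000, 1000, 2001,
                      (5 << 12) | 0x12, 4096, 0, 0) + b"hi"
info = parse_tcp_header(example)
```

The checksum is only extracted here, not verified; verifying it would also need the IP pseudo-header.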
TCP Reliable Data Transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
TCP Seq. #'s and ACKs
Seq. #'s:
  - byte stream "number" of first byte in segment's data
ACKs:
  - seq # of next byte expected from other side
  - cumulative ACK
Q: how does the receiver handle out-of-order segments?
  - A: TCP spec doesn't say; up to implementor
[Figure: simple telnet scenario, Host A and Host B over time. User types 'C'; host B ACKs receipt of 'C' and echoes back 'C'; host A ACKs receipt of echoed 'C'.]
TCP Sender Events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if it acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments
TCP sender (simplified)
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum
    loop (forever) {
        switch (event)
        event: data received from application above
            create TCP segment with sequence number NextSeqNum
            if (timer currently not running)
                start timer
            pass segment to IP
            NextSeqNum = NextSeqNum + length(data)
        event: timer timeout
            retransmit not-yet-acknowledged segment with smallest sequence number
            start timer
        event: ACK received, with ACK field value of y
            if (y > SendBase) {
                SendBase = y
                if (there are currently not-yet-acknowledged segments)
                    start timer
            }
    } /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73, so the rcvr acknowledges up to and including 72 and now wants 73+. y > SendBase, so we know that new data is acked; set SendBase to 73.
TCP retransmission scenarios
[Figure: lost ACK scenario, Host A and Host B over time; labels: loss (X), timeout, SendBase = 100.]
[Figure: premature timeout, Host A and Host B over time; labels: Seq=92 timeout (twice), SendBase = 100, SendBase = 120, SendBase = 120.]
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario, Host A and Host B over time; labels: loss (X), timeout, SendBase = 120.]
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver, followed by TCP receiver action:
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  - Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending
  - Immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected
  - Immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap
  - Immediately send ACK, provided that segment starts at lower end of gap
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20 bytes of TCP header overhead, plus the overhead of lower-layer headers, you waste a large share of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until ack has been received, then send all at once
• You can turn this off:
  - Use setsockopt(IPPROTO_TCP, TCP_NODELAY)
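As a minimal sketch of turning Nagle off from an application, the snippet below uses Python's standard `socket` module. `TCP_NODELAY` is a TCP-level option, so it is set at level `IPPROTO_TCP`; the helper name `make_low_latency_socket` is made up for illustration.

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY is a TCP-level option, hence IPPROTO_TCP as the level.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

sock = make_low_latency_socket()
# Read the option back; a nonzero value means Nagle is disabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

This only changes sender-side batching of small writes; it does not affect whether the peer delays its ACKs.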
Delayed ACK vs Nagle
• These two mechanisms have nothing to do with each other, but are often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because acks are cumulative, this can reduce the # of acks sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)
TCP: Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary; want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.]
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106); series: SampleRTT, EstimatedRTT.]
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
    DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
    EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
    (typically α = 0.125, β = 0.25)
• then set the timeout interval:
    TimeoutInterval = EstimatedRTT + 4 · DevRTT
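The timeout formulas can be sketched as a small estimator. The class name `RttEstimator`, the initial deviation of half the first sample, and the choice to update the deviation before the mean are assumptions borrowed from RFC 2988; the slides do not fix them.

```python
class RttEstimator:
    """EWMA-based RTT estimator; times are in milliseconds."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed starting deviation

    def update(self, sample_rtt: float) -> float:
        # Deviation is measured against the estimate *before* this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(first_sample=100.0)
t1 = est.update(100.0)   # steady sample: deviation shrinks
t2 = est.update(200.0)   # outlier: estimate and deviation both grow
```

A single outlier moves EstimatedRTT by only one eighth of the difference, but widens the safety margin noticeably through DevRTT.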
RTT Measurement & Retransmissions
• How should you handle acks for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if current timeout value is too small for network delay?
  - In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ack matches which packet
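The backoff rule can be sketched as follows. The class `KarnTimer`, its method names, and the 60-second cap are illustrative assumptions; the slides only say "by factor of 2, with limit".

```python
class KarnTimer:
    """Retransmission timeout that follows Karn's rule (times in seconds)."""

    def __init__(self, initial_timeout: float, max_timeout: float = 60.0):
        self.timeout = initial_timeout
        self.max_timeout = max_timeout    # assumed upper limit for backoff

    def on_timeout(self) -> float:
        # No usable RTT sample is available, so back off exponentially.
        self.timeout = min(self.timeout * 2, self.max_timeout)
        return self.timeout

    def on_ack(self, was_retransmitted: bool, timeout_from_estimator: float) -> float:
        # An ACK for a retransmitted segment is ambiguous: ignore its sample
        # and keep the backed-off value. Otherwise resume normal estimation.
        if not was_retransmitted:
            self.timeout = timeout_from_estimator
        return self.timeout

timer = KarnTimer(initial_timeout=1.0, max_timeout=5.0)
backoffs = [timer.on_timeout() for _ in range(3)]
after_ambiguous_ack = timer.on_ack(True, 0.3)
after_clean_ack = timer.on_ack(False, 0.3)
```

Here `timeout_from_estimator` stands for whatever TimeoutInterval the RTT estimator currently yields.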
Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
  - fast retransmit: resend segment before timer expires
Fast retransmit algorithm:
    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {   /* a duplicate ACK for already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3)
                resend segment with sequence number y   /* fast retransmit */
        }
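A runnable sketch of the duplicate-ACK counter is shown below. The class `FastRetransmitSender` is hypothetical, it records retransmissions in a list instead of sending anything, and it ignores timers and sequence-number wraparound.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records what would be resent."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq
        self.dup_acks = 0
        self.retransmitted = []   # sequence numbers resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance the window, reset the counter.
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment starting at y

sender = FastRetransmitSender(initial_seq=100)
for ack in (120, 120, 120, 120):   # one new ACK, then three duplicates
    sender.on_ack(ack)
```

The first ACK of 120 is new data; only the three ACKs after it count as duplicates and trigger the resend.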
TCP: Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
    = RcvWindow
    = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
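The window arithmetic can be checked with two small helpers. The function names and the example byte counts are made up for illustration.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window

# 4096-byte buffer; 3000 bytes received so far, app has read 1000 of them.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok_small = sender_may_send(5000, 4000, window, nbytes=1000)
ok_large = sender_may_send(5000, 4000, window, nbytes=1200)
```

With 2000 bytes still buffered, the advertised window is 2096 bytes, so a sender with 1000 bytes in flight may add 1000 more but not 1200.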
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP: Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …);
• server: contacted by client
    cl = accept(sv, &caddr, …);
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.]
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
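The ISN formula can be sketched as below. This is an illustration, not a hardened implementation: the function name `initial_seq_num`, the use of SHA-256 for F, and the per-process `SECRET` standing in for the random term are all assumptions.

```python
import hashlib
import os
import struct
import time

SECRET = os.urandom(16)   # assumed per-boot secret, plays the role of random()

def initial_seq_num(src: str, dst: str, sport: int, dport: int,
                    now: float = None, secret: bytes = SECRET) -> int:
    """ISN = (clock ticking every 4 microseconds) + F(connection 4-tuple, secret)."""
    now = time.time() if now is None else now
    clock = int(now * 1_000_000 / 4)          # one tick per 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    digest = hashlib.sha256(material).digest()
    f = struct.unpack("!I", digest[:4])[0]    # first 32 bits of the hash
    return (clock + f) & 0xFFFFFFFF           # sequence numbers are 32 bits

# Same connection one second apart: the ISN advances by 250,000 ticks.
isn_a = initial_seq_num("10.0.0.1", "10.0.0.2", 1234, 80, now=1000.0, secret=b"k")
isn_b = initial_seq_num("10.0.0.1", "10.0.0.2", 1234, 80, now=1001.0, secret=b"k")
```

Because F depends on a secret, an attacker who sees ISNs for its own connections cannot directly predict the offset used for a different address pair.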
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  - SYN cookies
• Server computes its initial seq # (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged seq # could have been sent by this server, then allocate buffers if correct
Sequence Number Summary
• Goals, Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals, Set 2:
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute, but don't store, initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of connection close; labels: close, close, timed wait, closed.]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline; labels: closing, closing, timed wait, closed, closed.]
TCP Connection FSM
[Figure: TCP connection finite state machine.] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle.]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window, to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …
TCP: Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λin into a router with unlimited shared output link buffers; λout is the output rate.]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B share finite output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: output rate.]
Causes/costs of congestion: scenario 2
Always: λin = λout (goodput)
• a) if no loss, λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a, b, c) of λout versus λin, with axes running to R/2; the levels R/3 and R/4 are also marked.]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: output rate.]
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λout.]
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection.]
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B timeline, with one RTT marked.]
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance, and fast recovery. Transitions are listed as event: actions.]
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
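The four summary rules can be sketched as a small window tracker. The class `RenoWindow`, the MSS value of 1460 bytes, treating equality with Threshold as congestion avoidance, and flooring Threshold at 1 MSS are assumptions made for the sketch; fast-recovery window inflation is left out.

```python
MSS = 1460   # bytes; assumed value for illustration

class RenoWindow:
    """Simplified CongWin/Threshold bookkeeping (sizes in bytes)."""

    def __init__(self, threshold: float = 64 * 1024):
        self.cong_win = float(MSS)
        self.threshold = float(threshold)

    def on_new_ack(self) -> None:
        if self.cong_win < self.threshold:
            # Slow start: +1 MSS per ACK doubles the window every RTT.
            self.cong_win += MSS
        else:
            # Congestion avoidance: about +1 MSS per RTT in total.
            self.cong_win += MSS * MSS / self.cong_win

    def on_triple_dup_ack(self) -> None:
        self.threshold = max(self.cong_win / 2, MSS)
        self.cong_win = self.threshold

    def on_timeout(self) -> None:
        self.threshold = max(self.cong_win / 2, MSS)
        self.cong_win = float(MSS)

w = RenoWindow(threshold=4 * MSS)
for _ in range(3):
    w.on_new_ack()                 # slow start: 1 -> 4 MSS
after_slow_start = w.cong_win
w.on_new_ack()                     # now in congestion avoidance
after_ca_ack = w.cong_win
w.on_triple_dup_ack()
after_dup = (w.cong_win, w.threshold)
w.on_timeout()
after_timeout = (w.cong_win, w.threshold)
```

The trace shows the three regimes in turn: per-ACK growth of a full MSS, per-ACK growth of a fraction of an MSS, and the two different reactions to loss.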
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size over time, with levels W/2 and W marked.]
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    throughput = 1.22 · MSS / (RTT · √L)
• L = 2·10^-10: very low!
• Require almost perfect link
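The numbers in this example can be reproduced with a few lines of arithmetic. The helper names are made up; the loss formula is the throughput relation 1.22 · MSS / (RTT · √L) solved for L.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

w_segments = window_segments(10e9, 0.100, 1500)
loss_rate = required_loss_rate(10e9, 0.100, 1500)
```

For 10 Gbps over a 100 ms path this gives roughly 83,333 segments in flight and a tolerable loss rate of about 2 in 10 billion segments.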
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput plotted against the other connection's throughput, both axes up to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
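The convergence argument can be tried numerically. This is a toy model under strong assumptions: both sessions see every loss at the same moment, rates change once per round, and the function name `simulate_aimd` and the starting rates are invented.

```python
def simulate_aimd(x1: float, x2: float, capacity: float, rounds: int):
    """Two AIMD flows sharing one link; returns their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves their difference.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both add one unit, leaving their difference unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2

# Start far from the fair share: 80 versus 10 on a link of capacity 100.
rate1, rate2 = simulate_aimd(80.0, 10.0, capacity=100.0, rounds=500)
gap = abs(rate1 - rate2)
```

Additive increase moves the pair parallel to the equal-share line, while each halving cuts the gap in two, so the gap shrinks toward zero.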
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
TCP
CS 4284 Spring 2013
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer
bull flow control
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
CS 4284 Spring 2013
TCP Overview RFCs 793 1122 1323
2018 2581
bull full duplex data
ndash bi-directional data flow in
same connection
ndash MSS maximum segment
size
bull connection-oriented
ndash handshaking (exchange
of control msgs) initrsquos
sender receiver state
before data exchange
bull flow controlled
ndash sender will not overwhelm
receiver
bull point-to-point
ndash one sender one receiver
bull reliable in-order byte stream
ndash no ldquomessage boundariesrdquo
bull pipelined
ndash TCP congestion and flow
control set window size
bull send amp receive buffers
socket
door
TCP
send buffer
TCP
receive buffer
socket
door
segment
application
writes dataapplication
reads data
CS 4284 Spring 2013
TCP Segment Structure
source port dest port
32 bits
application data
(variable length)
sequence number
acknowledgement number
receive window
urg data pnter checksum
F S R P A U head len
not used
options (variable length)
URG urgent data (generally not used)
ACK ACK valid
PSH push data now (generally not used)
RST SYN FIN connection
establishment (setup teardown
commands)
bytes rcvr willing to accept
counting by bytes of data (not segments)
Internet checksum
(as in UDP)
CS 4284 Spring 2013
TCP Reliable Data Transfer
bull TCP creates rdt
service on top of IPrsquos
unreliable service
bull Pipelined segments
bull Cumulative acks
bull TCP uses single
retransmission timer
bull Retransmissions are
triggered by
ndash timeout events
ndash duplicate acks
bull Initially consider
simplified TCP
sender
ndash ignore duplicate acks
ndash ignore flow control
congestion control
CS 4284 Spring 2013
TCP Seq rsquos and ACKs Seq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKs
ndash seq of next byte expected from other side
ndash cumulative ACK
Q how receiver handles out-of-order segments
ndash A TCP spec doesnrsquot say - up to implementor
Host A Host B
User types
lsquoCrsquo
host ACKs receipt
of echoed lsquoCrsquo
host ACKs receipt of lsquoCrsquo echoes
back lsquoCrsquo
time
simple telnet scenario
CS 4284 Spring 2013
TCP Sender Events data rcvd from app
bull create segment with seq
bull seq is byte-stream number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeout
bull retransmit segment that
caused timeout
bull restart timer
ack rcvd
bull If acknowledges
previously unacked
segments
ndash update what is known to
be acked
ndash start timer if there are
outstanding segments
CS 4284 Spring 2013
TCP
sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever)
switch(event)
event data received from application above
create TCP segment with sequence number NextSeqNum
if (timer currently not running)
start timer
pass segment to IP
NextSeqNum = NextSeqNum + length(data)
event timer timeout
retransmit not-yet-acknowledged segment with
smallest sequence number
start timer
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
end of loop forever
Comment
bull SendBase-1 last
cumulatively
ackrsquoed byte
Example
bull SendBase-1 = 71 so
SendBase = 72 say Ack
received with y= 73 so the
rcvr acknowledges up to
including 72 now wants
73+
y is gt SendBase so
know that new data is
acked set SendBase to
73
CS 4284 Spring 2013
TCP retransmission scenarios Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
time
SendBase = 100
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t S
eq=9
2 t
imeou
t
SendBase = 120
SendBase = 120
Sendbase = 100
CS 4284 Spring 2013
TCP retransmission scenarios
(more) Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase = 120
CS 4284 Spring 2013
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
CS 4284 Spring 2013
Naglersquos algorithm
bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Nagle ndash Transmit first byte
ndash Buffer outgoing bytes until ack has been received ndash then send all at once
bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Nagle
bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources
ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)
bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be
fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP
timeout value
bull longer than RTT
ndash but RTT varies
bull too short premature
timeout
ndash unnecessary
retransmissions
bull too long slow
reaction to segment
loss
Q how to estimate RTT
bull SampleRTT measured time
from segment transmission
until ACK receipt
bull SampleRTT will vary want
estimated RTT ldquosmootherrdquo
ndash average several recent
measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving average
bull influence of past sample decreases
exponentially fast
bull typical value = 0125
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) against time (seconds); series: SampleRTT and EstimatedRTT]
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT: larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4 * DevRTT
DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
(typically α = 0.125, β = 0.25)
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
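The three formulas can be combined into one small estimator. This is a sketch under two assumptions the slide leaves open: the first sample seeds EstimatedRTT and sets DevRTT to half of it, and DevRTT is updated before EstimatedRTT (both follow the convention of RFC 6298). The class name is hypothetical.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt):
        """Feed one SampleRTT and return the new TimeoutInterval."""
        if self.estimated_rtt is None:
            # Assumption: seed from the first measurement.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # Assumption: DevRTT uses the previous EstimatedRTT.
            deviation = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
            self.estimated_rtt = (
                (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
            )
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator()
first_timeout = estimator.update(100.0)   # milliseconds
second_timeout = estimator.update(120.0)
```

The timeout shrinks as samples agree with the estimate and grows when they scatter.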
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
– Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices
– Associate with original packet: may overestimate true RTT
– Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
– OK, but what if current timeout value is too small for network delay?
– In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
– By factor of 2, with limit
• Resume normal sampling algorithm afterwards
– Usual sampling period is once per RTT
• See also RFC 2988
– TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
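A sketch of how the pieces of Karn's algorithm fit together. The class name, the initial timeout and the 60-second cap are assumptions chosen for illustration; the slide only says "by factor of 2, with limit".

```python
class KarnTimer:
    """Retransmission timeout with Karn-style sampling and backoff."""

    def __init__(self, initial_rto=1.0, max_rto=60.0):
        self.rto = initial_rto
        self.max_rto = max_rto  # the "limit" on backoff (assumed value)
        self.estimated_rtt = None
        self.dev_rtt = None

    def on_timeout(self):
        # No usable measurement exists, so back off by a factor of 2.
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            # Ambiguous ACK: ignore the sample, keep the backed-off timeout.
            return
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            deviation = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = 0.75 * self.dev_rtt + 0.25 * deviation
            self.estimated_rtt = 0.875 * self.estimated_rtt + 0.125 * sample_rtt
        # A clean sample resumes the normal formula.
        self.rto = self.estimated_rtt + 4 * self.dev_rtt


timer = KarnTimer(initial_rto=1.0)
timer.on_timeout()
timer.on_timeout()
rto_after_backoff = timer.rto
timer.on_ack(0.5, was_retransmitted=True)
rto_after_ambiguous_ack = timer.rto
timer.on_ack(0.5, was_retransmitted=False)
rto_after_clean_sample = timer.rto
```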
Fast Retransmit
• Time-out period often relatively long
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
– fast retransmit: resend segment before timer expires
Fast retransmit algorithm
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    /* fast retransmit */
    }
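The ACK-handling event above can be sketched as runnable Python. The retransmission timer is left out, and "resending" is modelled by appending to a list; the class and attribute names are made up for this illustration.

```python
from collections import Counter


class FastRetransmitSender:
    """Counts duplicate ACKs and records fast retransmissions."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq
        self.dup_ack_counts = Counter()
        self.retransmitted = []  # sequence numbers resent early

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance the window.
            self.send_base = y
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_ack_counts[y] += 1
            if self.dup_ack_counts[y] == 3:
                self.retransmitted.append(y)  # fast retransmit


sender = FastRetransmitSender(initial_seq=100)
for ack in [120, 120, 120, 120, 120]:
    sender.on_ack(ack)
```

The first ACK of 120 is new; the next three are duplicates and trigger exactly one retransmission of the segment starting at 120. The fifth changes nothing.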
TCP
Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
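A small worked example of the two sides of this bookkeeping. The function names and the byte counts are illustrative assumptions.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: how much more may be sent without overflowing."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)


# Receiver: 4096-byte buffer, 2000 bytes received but not yet read.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)

# Sender: 1000 bytes already in flight against that advertised window.
room = sendable_bytes(last_byte_sent=5000, last_byte_acked=4000,
                      advertised_window=window)
```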
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
– Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
connect(s, &dstaddr, …)
• server: contacted by client
cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
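A sketch of the RFC 1948 idea: a clock that ticks every 4 microseconds plus a per-connection offset that an outsider cannot predict. Assumptions in this sketch: SHA-256 from `hashlib` stands in for F (the RFC suggests MD5), the `secret` bytes play the role of random(), and the clock is passed in as integer microseconds so the example is deterministic.

```python
import hashlib


def initial_sequence_number(src, dst, sport, dport, secret, now_us):
    """Return a 32-bit ISN = 4-microsecond clock + F(connection, secret)."""
    clock = now_us // 4  # one tick every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    digest = hashlib.sha256(material).digest()
    # Use the first 4 bytes of the hash as the per-connection offset.
    offset = int.from_bytes(digest[:4], "big")
    return (clock + offset) % 2**32


isn_now = initial_sequence_number("10.0.0.1", "10.0.0.2", 40000, 80,
                                  b"host-secret", now_us=0)
isn_later = initial_sequence_number("10.0.0.1", "10.0.0.2", 40000, 80,
                                    b"host-secret", now_us=4)
```

For one connection 4-tuple the ISN still advances with the clock, which guards against old incarnations. Across 4-tuples the offsets look unrelated to anyone who does not know the secret.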
When Sequence Numbers Attack
• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C's IP address
– Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
– Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
– SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
– If client continues with ACK, check whether its ACK number matches a cookie the server could have sent, then allocate buffers if correct
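A deliberately simplified sketch of the SYN cookie idea: the server derives its initial sequence number from the connection details and a secret, stores nothing, and later recomputes it to check the client's ACK. Real implementations also encode the MSS and a timestamp in the cookie; that is omitted here. The function names and the `time_slot` counter are assumptions of this sketch.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, time_slot):
    """Compute the server's ISN without storing any per-connection state."""
    msg = f"{src}:{sport}>{dst}:{dport}|{client_isn}|{time_slot}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, time_slot,
                 ack_number):
    """Check whether the final ACK acknowledges a cookie we could have sent."""
    # Accept the current time slot and the previous one.
    for slot in (time_slot, time_slot - 1):
        cookie = make_cookie(secret, src, sport, dst, dport, client_isn, slot)
        if ack_number == (cookie + 1) % 2**32:
            return True
    return False
```

Only when `ack_is_valid` returns true would the server allocate buffers for the connection.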
Sequence Number Summary
• Goals, Set 1
– Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
– Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals, Set 2
– Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
– Don't allow SYN attacks: compute, but don't store, initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline showing close on each side, the client's timed wait, and closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
– Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline showing closing on each side, the client's timed wait, and closed]
TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
– No. Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle's algorithm
– Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
– SYN attack
– Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
– Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around for sequence numbers
• …
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send into a router with unlimited shared output link buffers; λin: original data; λout: delivered data]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send into a router with finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
Causes/costs of congestion: scenario 2
• Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three panels (a), (b), (c) plotting λout against λ'in, with axis marks at R/2, R/3 and R/4]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B, finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λout]
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
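A short sketch of the send limit and the rate estimate from this slide. The function names and the sample numbers are illustrative assumptions.

```python
def effective_window(cong_win, rcv_window):
    """The sender obeys both congestion control and flow control."""
    return min(cong_win, rcv_window)


def may_send(last_byte_sent, last_byte_acked, cong_win, rcv_window,
             segment_len):
    """True if one more segment keeps unACKed data within the window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + segment_len <= effective_window(cong_win, rcv_window)


def approx_rate(cong_win, rtt_seconds):
    """Rough throughput in bytes/sec: one window per round trip."""
    return cong_win / rtt_seconds
```

With CongWin = 8000 bytes and RcvWindow = 16000 bytes, the congestion window is the binding limit, and at RTT = 0.5 s the rate is about 16000 bytes/sec.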
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) against time for a long-lived TCP connection]
TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
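The slide's numbers can be checked directly: one 500-byte segment per 200 ms round trip is 4000 bits every 0.2 s. The sketch also lists the window over the first few round trips, assuming no loss and a doubling per RTT. Function names are made up for this example.

```python
def initial_rate_bps(mss_bytes, rtt_ms):
    """Rate in bits/sec when one MSS is sent per round trip."""
    return mss_bytes * 8 * 1000 / rtt_ms


def slow_start_windows(rounds, mss_bytes):
    """CongWin in bytes at the start of each RTT, assuming no loss."""
    return [mss_bytes * 2**r for r in range(rounds)]


rate = initial_rate_bps(mss_bytes=500, rtt_ms=200)
windows = slow_start_windows(rounds=5, mss_bytes=500)
```

After only four round trips the window is 16 times the initial one, which is why the "slow" start ramps up quickly.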
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, with RTT marked along the time axis]
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance and fast recovery; transitions below are written as event: actions; Λ means no event or no action]
• initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
• slow start
– new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK: dupACKcount++
– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
– cwnd > ssthresh: Λ; go to congestion avoidance
– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• congestion avoidance
– new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK: dupACKcount++
– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
– dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• fast recovery
– duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
– new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
– timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
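The state machine can be sketched as a small Python class. To keep the arithmetic readable, this sketch measures cwnd and ssthresh in units of MSS (so "+ MSS" becomes "+ 1"), and the initial ssthresh is a constructor argument rather than 64 KB. It models only the window bookkeeping, not segment transmission; the class name is hypothetical.

```python
class RenoSender:
    """Window bookkeeping for slow start, congestion avoidance, fast recovery."""

    def __init__(self, ssthresh=64.0):
        self.state = "slow_start"
        self.cwnd = 1.0           # in MSS (assumption of this sketch)
        self.ssthresh = ssthresh  # in MSS
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0              # doubles cwnd once per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += 1.0
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_acks = 0
        self.state = "slow_start"
```

Driving the object with a sequence of `on_new_ack`, `on_dup_ack` and `on_timeout` calls reproduces the sawtooth behaviour described on the surrounding slides.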
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]
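The 0.75 factor is the mean of a window that climbs linearly from W/2 to W and then drops back: (W/2 + W) / 2 = 0.75 W. A quick numerical check, with function names chosen for this example:

```python
def average_window(w, steps=1000):
    """Mean of a window that ramps linearly from w/2 up to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w, rtt):
    """Idealized steady-state throughput from the slide."""
    return 0.75 * w / rtt


mean_window = average_window(100.0)
```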
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate:
throughput = 1.22 * MSS / (RTT * sqrt(L))
• L = 2 * 10^-10: very low!
• Require almost perfect link
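Both numbers in the example follow from simple arithmetic. The window is the bandwidth-delay product divided by the segment size, and the loss rate comes from solving throughput = 1.22 * MSS / (RTT * sqrt(L)) for L. A sketch with illustrative function names:

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to fill the pipe."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


window = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
```

Here `window` is about 83,333 segments and `loss` is about 2.1e-10, matching the figures on the slide.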
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R (horizontal axis: Connection 1 throughput) and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
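A toy simulation of the argument, under strong simplifying assumptions: both flows see the same RTT, both detect loss at the same instant whenever their combined rate exceeds the link capacity, and rates are in arbitrary units. Additive increase keeps the gap between the flows constant, and every multiplicative decrease halves it.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Run synchronized AIMD for two flows and return their final rates."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate in half.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows increase additively.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


start_gap = abs(80.0 - 10.0)
rate1, rate2 = aimd_two_flows(80.0, 10.0, capacity=100.0, steps=1000)
end_gap = abs(rate1 - rate2)
```

Starting from a very unequal split (80 vs 10), the two rates end up essentially equal, which is the convergence toward the equal bandwidth share line.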
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
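The example's arithmetic, assuming every TCP connection on the link ends up with an equal share of R. The function name is made up for this illustration.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by the new app."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


one_connection = app_share(existing_connections=9, new_connections=1)
nine_connections = app_share(existing_connections=9, new_connections=9)
```

With one connection the new app holds 1 of 10 equal shares (R/10). With nine it holds 9 of 18 (R/2).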
CS 4284 Spring 2013
Naglersquos algorithm
bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Nagle ndash Transmit first byte
ndash Buffer outgoing bytes until ack has been received ndash then send all at once
bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Nagle
bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources
ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)
bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be
fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP
timeout value
bull longer than RTT
ndash but RTT varies
bull too short premature
timeout
ndash unnecessary
retransmissions
bull too long slow
reaction to segment
loss
Q how to estimate RTT
bull SampleRTT measured time
from segment transmission
until ACK receipt
bull SampleRTT will vary want
estimated RTT ldquosmootherrdquo
ndash average several recent
measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving average
bull influence of past sample decreases
exponentially fast
bull typical value = 0125
CS 4284 Spring 2013
Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin
bull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp
Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTT
ndash Note ACK could be delayed for original segment or early for retransmitted segment
bull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithm
bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network
delay
ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula
bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit
bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct

Sequence Number Summary
• Goals, set 1
- Guard against old duplicates in one connection -> don't reuse sequence numbers until it is reasonably certain that old duplicates have disappeared
- Even after a crash and restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, set 2
- Don't allow hijacking of connections unless the attacker can eavesdrop: use a PRNG for the initial sequence number choice
- Don't allow SYN attacks: compute but don't store the initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of the close exchange; the client passes through timed wait before it is closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
- Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs
[Figure: client/server timeline; both sides closing, client in timed wait, then both closed]

TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
- No: this is the famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
- Delayed ACKs, Nagle's algorithm
- Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
- SYN attack
- Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
- Client and server agree on a larger-than-default MSS (the default is 536 outside the same subnet) via the MSS option on the SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• ...
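To see why WSCALE matters on a long fat network, the sketch below estimates the shift needed so that a 16-bit receive window can cover the bandwidth-delay product. The function name and the example link are illustrative assumptions. The cap of 14 on the shift is the limit set by RFC 1323.

```python
def window_scale_shift(bandwidth_bps, rtt_s, max_shift=14):
    """Smallest shift such that a 16-bit window scaled by 2**shift covers bandwidth * RTT."""
    bdp_bytes = bandwidth_bps * rtt_s / 8      # bytes in flight needed to fill the pipe
    shift = 0
    while (65535 << shift) < bdp_bytes and shift < max_shift:
        shift += 1
    return shift

# 10 Gbps link with a 100 ms RTT: about 125 MB must be in flight.
print(window_scale_shift(10e9, 0.1))
```

Without the option the window is limited to 65,535 bytes, which on this example link would cap throughput at roughly 5 Mbps (65,535 bytes per 100 ms).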

Study of TCP: Outline
• segment structure
• reliable data transfer
- delayed ACKs
- Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
- long delays (queueing in router buffers)
- lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λ_in through a router with unlimited shared output link buffers; delivery rate λ_out]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits lost packets after timeout
[Figure: Host A and Host B with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; delivery rate λ_out]

Causes/costs of congestion: scenario 2
Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a), (b), (c) of λ_out versus λ'_in, with axis marks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; delivery rate λ_out]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B; delivery rate λ_out]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control
• routers provide feedback to end systems
- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
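The sender-side limit and the rate approximation on this slide can be written down directly. The function names below are hypothetical, and the byte counters are simply passed in as numbers; this is a sketch of the arithmetic, not of a TCP stack.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit without exceeding min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)

def approx_rate(cong_win, rtt_s):
    """rate = CongWin / RTT, in bytes per second."""
    return cong_win / rtt_s

print(usable_window(5000, 3000, cong_win=4000, rcv_window=8000))  # congestion-limited
print(usable_window(5000, 3000, cong_win=9000, rcv_window=2000))  # receiver-limited
print(approx_rate(4000, 0.2))
```

Whichever of the two windows is smaller determines how much unacknowledged data may be outstanding, which is why flow control and congestion control can be enforced by the same check.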

TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection over time, with axis marks at 8, 16 and 24 Kbytes]
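The sawtooth produced by AIMD can be reproduced with a very small simulation. The sketch below counts the window in units of MSS and takes the RTT indices at which a loss event occurs as input. Both the function name and the loss schedule are illustrative assumptions.

```python
def aimd_trace(rtts, loss_at, mss=1, start=8):
    """Congestion window (in MSS) at the start of each RTT.

    loss_at is the set of RTT indices during which a loss event is detected.
    """
    cong_win = start
    trace = []
    for t in range(rtts):
        trace.append(cong_win)
        if t in loss_at:
            cong_win = max(mss, cong_win / 2)   # multiplicative decrease
        else:
            cong_win += mss                     # additive increase: +1 MSS per RTT
    return trace

print(aimd_trace(8, loss_at={3}))
```

The window climbs by one MSS per RTT while the sender probes for bandwidth, and a single loss event undoes half of the accumulated window.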

TCP Slow Start
• When connection begins, CongWin = 1 MSS
- Example: MSS = 500 bytes & RTT = 200 msec
- initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
- desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
- double CongWin every RTT
- done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with one RTT marked]
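A short sketch makes the two slow-start claims concrete: the initial rate of one MSS per RTT, and the doubling that results from adding one MSS per ACK. It assumes one ACK per segment (no delayed ACKs) and no loss; the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes, rtt_s):
    """One MSS per RTT, expressed in bits per second."""
    return mss_bytes * 8 / rtt_s

def slow_start_trace(rtts, mss=1):
    """Congestion window (in MSS) at the start of each RTT during slow start."""
    cong_win = mss
    trace = []
    for _ in range(rtts):
        trace.append(cong_win)
        acks = cong_win // mss     # one ACK comes back per segment sent this RTT
        cong_win += acks * mss     # +1 MSS per ACK, so the window doubles per RTT
    return trace

print(initial_rate_bps(500, 0.2))   # the slide's example: 500 bytes, 200 ms
print(slow_start_trace(5))
```

With MSS = 500 bytes and RTT = 200 ms the first function gives about 20,000 bits per second, matching the 20 kbps on the slide.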

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
- CongWin is cut in half
- window then grows linearly
• But after timeout event:
- CongWin instead set to 1 MSS
- window then grows exponentially
- to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance and fast recovery; transitions are listed below as event / action]
Start (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start
- new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK / dupACKcount++
- cwnd > ssthresh / (Λ) enter congestion avoidance
- dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; enter fast recovery
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
Congestion avoidance
- new ACK / cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK / dupACKcount++
- dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; enter fast recovery
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start
Fast recovery
- duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK / cwnd = ssthresh, dupACKcount = 0; enter congestion avoidance
- timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
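The event table translates almost line by line into code. The class below is a minimal sketch of that table only: it tracks CongWin in units of MSS, and the class and method names are hypothetical. It does not model retransmission, timers or the separate fast-recovery state of the full FSM.

```python
class TcpSender:
    """Simplified sender congestion state, following the event table."""

    def __init__(self, mss=1, threshold=64):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one triggers the multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = max(self.cong_win / 2, self.mss)  # never below 1 MSS
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0

s = TcpSender(mss=1, threshold=4)
for _ in range(4):
    s.on_new_ack()               # 1 -> 5 MSS, crossing the threshold
print(s.state, s.cong_win)
for _ in range(3):
    s.on_duplicate_ack()         # triple duplicate ACK: halve the window
print(s.state, s.cong_win, s.threshold)
s.on_timeout()
print(s.state, s.cong_win, s.threshold)
```

Keeping the state in a plain string makes the correspondence with the table's "State" column easy to check by eye.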

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
- Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]
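The factor 0.75 follows from the shape of the sawtooth. Assuming the window grows linearly from W/2 to W between two loss events, the time average of the throughput is the midpoint of its two extremes:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
```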

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput = 1.22 MSS / (RTT sqrt(L))
• L = 2·10^-10: very low
• Requires an almost perfect link
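The numbers in this example can be checked by solving the throughput formula for L. The helper names are hypothetical; the only inputs are the slide's 1500-byte segments, 100 ms RTT and 10 Gbps target.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for L, with MSS in bits."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

print(int(required_window_segments(10e9, 0.1, 1500)))
print(required_loss_rate(10e9, 0.1, 1500))
```

The second value comes out at roughly 2·10^-10, i.e. about one loss event per five billion segments.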

Equation-Based Control
• Note: TCP congestion control forms a control loop
- Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with both axes up to R, Connection 1 throughput on one axis, and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
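The convergence argument can be checked numerically. The sketch below makes strong simplifying assumptions that are not in the slides: both flows have the same RTT, both see every loss event at the same time, and a loss occurs exactly when the combined rate exceeds the link capacity.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one bottleneck; returns their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both decrease by a factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: both add 1
    return x1, x2

x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))
```

Additive increase leaves the difference between the two rates unchanged, while every multiplicative decrease halves it, so the flows drift toward the equal-share line regardless of where they start.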

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
- do not want rate throttled by congestion control
• Instead use UDP:
- pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
- new app asks for 1 TCP, gets rate R/10
- new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
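The arithmetic behind the parallel-connection example assumes that the bottleneck is shared equally per connection, not per application. Under that assumption (and with a hypothetical helper name), an application's share is simply its fraction of all connections:

```python
from fractions import Fraction

def app_share(existing_connections, new_connections):
    """Fraction of the link rate R obtained by an app opening new_connections."""
    return Fraction(new_connections, existing_connections + new_connections)

print(app_share(9, 1))   # 1/10
print(app_share(9, 9))   # 1/2
```

Per-connection fairness therefore says nothing about per-application or per-user fairness.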
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs. 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion
CS 4284 Spring 2013
Summary: TCP Congestion Control
[Figure: state machine with states slow start, congestion avoidance, fast recovery; transitions below are written as event: actions]
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh: Λ; go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
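The three-state machine on this slide (slow start, congestion avoidance, fast recovery) can be written down as a small class. This is a sketch, not a real TCP stack: the class name `RenoSender`, the method names, and the MSS value of 1460 bytes are assumptions made for illustration, and actual transmission and retransmission of segments are left out.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; illustrative value, not taken from the slides


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh             # deflate window, leave fast recovery
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                      # +1 MSS per ACK: doubles per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                      # inflate window per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                    # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding it one new ACK, three duplicate ACKs, and another new ACK walks it from slow start through fast recovery into congestion avoidance.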
CS 4284 Spring 2013
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
CS 4284 Spring 2013
TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
CS 4284 Spring 2013
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window sawtooth between W/2 and W]
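The 0.75 factor can be checked with one line of arithmetic, assuming (as the idealized model does) that the window grows linearly from W/2 back to W between losses, so that its time average is the midpoint of the two values:

```latex
\[
\overline{\text{throughput}}
  = \frac{1}{\text{RTT}} \cdot \frac{1}{2}\left(\frac{W}{2} + W\right)
  = \frac{3}{4}\,\frac{W}{\text{RTT}}
\]
```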
CS 4284 Spring 2013
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate:
  $\text{throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT} \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$. Very low!
• Require almost perfect link
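The numbers on this slide can be reproduced directly. The sketch below assumes the usual form of the loss-rate relation, throughput = 1.22 · MSS / (RTT · sqrt(L)), with MSS converted to bits; the function names are chosen here for illustration.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(int(w), f"{loss:.1e}")
```

For the slide's example this gives a window of about 83,333 segments and a tolerable loss rate of roughly 2 · 10^-10.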
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
CS 4284 Spring 2013
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R (one axis labeled Connection 1 throughput) and an equal-bandwidth-share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
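The convergence argument can be tried numerically. The sketch below makes two simplifying assumptions that are not stated on the slide: both sessions see every loss at the same time, and rates are updated once per round in abstract units. Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the rates drift toward the equal-share line.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Two synchronized AIMD flows sharing one bottleneck of the given capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1  # additive increase: the gap is unchanged
    return x1, x2


a, b = two_flow_aimd(1.0, 80.0, capacity=100.0, rounds=500)
print(abs(a - b))
```

Starting from a very unequal split (1 vs. 80), the difference between the two rates shrinks to a negligible value after a few hundred rounds.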
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
CS 4284 Spring 2013
TCP Segment Structure
[Figure: TCP segment layout, 32 bits wide]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F
  – URG: urgent data (generally not used)
  – ACK: ACK # valid
  – PSH: push data now (generally not used)
  – RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• urg data pointer
• options (variable length)
• application data (variable length)
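As a concrete view of this layout, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. This is a sketch for illustration: `parse_tcp_header` is a name chosen here, options and payload are ignored, and the input is assumed to be at least 20 bytes of a raw TCP header in network byte order.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 upward


def parse_tcp_header(data: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (no options)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", data[:20])
    flags = off_flags & 0x3F
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # head len is counted in 32-bit words
        "flags": [n for i, n in enumerate(FLAG_NAMES) if flags & (1 << i)],
        "window": window,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# A hand-built SYN header: ports 12345 -> 80, seq 1000, head len 5 words.
syn = struct.pack("!HHIIHHHH", 12345, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(syn))
```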
CS 4284 Spring 2013
TCP Reliable Data Transfer
• TCP creates rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses single retransmission timer
• Retransmissions are triggered by:
  – timeout events
  – duplicate ACKs
• Initially consider simplified TCP sender:
  – ignore duplicate ACKs
  – ignore flow control, congestion control
CS 4284 Spring 2013
TCP Seq. #'s and ACKs
Seq. #'s:
  – byte stream "number" of first byte in segment's data
ACKs:
  – seq # of next byte expected from other side
  – cumulative ACK
Q: how does receiver handle out-of-order segments?
  – A: TCP spec doesn't say; up to implementor
[Figure: simple telnet scenario between Host A and Host B: user types 'C'; host ACKs receipt of 'C', echoes back 'C'; host ACKs receipt of echoed 'C']
CS 4284 Spring 2013
TCP Sender Events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ACK rcvd:
• If acknowledges previously unacked segments
  – update what is known to be acked
  – start timer if there are outstanding segments
CS 4284 Spring 2013
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer
    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73, so the rcvr acknowledges up to and including 72 and now wants 73+. y > SendBase, so we know that new data is acked; set SendBase to 73.
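The simplified sender can be turned into a small event-driven class to experiment with. This is a sketch under stated assumptions: the timer is only a flag, "pass segment to IP" is replaced by appending to a `sent` log, and sequence-number wraparound is ignored. The class and attribute names are chosen here for illustration.

```python
class SimplifiedSender:
    def __init__(self, initial_seq):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}           # seq number -> data not yet acknowledged
        self.timer_running = False
        self.sent = []              # log of (seq, data), stands in for IP

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self):
        if not self.unacked:
            return
        seq = min(self.unacked)     # oldest not-yet-acknowledged segment
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True   # restart timer

    def on_ack(self, y):
        if y > self.send_base:      # cumulative ACK covers new data
            self.send_base = y
            done = [s for s, d in self.unacked.items() if s + len(d) <= y]
            for s in done:
                del self.unacked[s]
            self.timer_running = bool(self.unacked)
```

With an initial sequence number of 92, an 8-byte segment followed by a 20-byte segment gives the SendBase values 100 and 120 that appear in the retransmission scenarios.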
CS 4284 Spring 2013
TCP retransmission scenarios
[Figure: two Host A / Host B timelines. Lost ACK scenario: loss (X), timeout, SendBase = 100. Premature timeout: Seq=92 timeout (twice), SendBase = 100, then SendBase = 120]
CS 4284 Spring 2013
TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline. Cumulative ACK scenario: loss (X), timeout, SendBase = 120]
CS 4284 Spring 2013
TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Event at receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: Arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: Arrival of segment that partially or completely fills gap.
TCP receiver action: Immediately send ACK, provided that segment starts at lower end of gap.
CS 4284 Spring 2013
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  – Transmit first byte
  – Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off via
  – setsockopt(IPPROTO_TCP, TCP_NODELAY) (TCP_NODELAY is a TCP-level option, not a SOL_SOCKET option)
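For reference, this is how the option is set through Python's standard `socket` module; the C call has the same shape. The socket here is never connected, so the snippet only demonstrates setting and reading back the option.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm: small writes are sent without waiting for an ACK.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; a nonzero value means Nagle is disabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
print(nodelay != 0)
```

Note that the option level is `IPPROTO_TCP`, since `TCP_NODELAY` belongs to the TCP layer rather than to the generic socket layer.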
CS 4284 Spring 2013
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  – Delays sending an acknowledgement (because ACKs are cumulative, can reduce # of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout & Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
CS 4284 Spring 2013
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT curves, RTT (milliseconds, 100 to 350) vs. time (seconds)]
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
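• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
  $\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically, $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$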
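The two moving averages and the timeout formula fit in a small class. This sketch fills in two details the slide leaves open, both borrowed from RFC 6298 and labeled as assumptions: the first sample initializes EstimatedRTT with DevRTT set to half of it, and DevRTT is updated using the EstimatedRTT value from before the current sample is folded in. The class name `RttEstimator` is chosen here.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumption: initial value as in RFC 6298

    def update(self, sample_rtt):
        # Assumption: deviation uses the old EstimatedRTT (RFC 6298 ordering).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(100.0)   # milliseconds
print(est.update(200.0))    # -> 362.5
```

After one 200 ms sample on top of a 100 ms estimate, EstimatedRTT moves only to 112.5 ms while DevRTT rises to 62.5 ms, so most of the larger timeout comes from the safety margin.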
CS 4284 Spring 2013
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT
CS 4284 Spring 2013
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if current timeout value is too small for network delay?
  – In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
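The two rules of Karn's algorithm (skip ambiguous samples, back off on timeout) can be sketched as follows. The names `KarnTimer`, `recompute_rto`, and the 60-second cap are assumptions for illustration; `recompute_rto` is a hypothetical callback standing in for the EstimatedRTT/DevRTT formula.

```python
class KarnTimer:
    def __init__(self, rto, max_rto=60.0):
        self.rto = rto
        self.max_rto = max_rto   # assumed upper limit for the backoff
        self.samples = []        # RTT samples that were actually used

    def on_timeout(self):
        # No usable measurement: back off exponentially, by a factor of 2.
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, sample_rtt, was_retransmitted, recompute_rto):
        if was_retransmitted:
            return               # ambiguous sample: ignore it, keep backed-off RTO
        self.samples.append(sample_rtt)
        self.rto = recompute_rto(sample_rtt)  # resume normal sampling
```

A clean sample (one for a segment that was sent only once) is the only thing that replaces the backed-off timeout with a freshly computed one.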
CS 4284 Spring 2013
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
CS 4284 Spring 2013
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
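The ACK handler above translates almost line for line into runnable form. In this sketch the retransmission itself is replaced by recording the sequence number in a list, and the timer handling is omitted; the class name and attributes are chosen here for illustration.

```python
from collections import Counter


class FastRetransmitSender:
    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_counts = Counter()  # ACK value y -> number of duplicate ACKs seen
        self.retransmitted = []      # sequence numbers resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
        else:
            self.dup_counts[y] += 1  # duplicate ACK for already ACKed data
            if self.dup_counts[y] == 3:
                self.retransmitted.append(y)  # resend segment starting at y
```

Because the comparison is an equality test, the fourth and later duplicate ACKs for the same value do not trigger further retransmissions.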
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
CS 4284 Spring 2013
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
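Both sides of this rule are simple arithmetic. The sketch below uses the slide's variable names as function parameters; the function names themselves are chosen here for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, window, nbytes):
    """True if sending nbytes more keeps unACKed data within the window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                    # -> 2096
print(sender_may_send(5000, 4000, window, 1000)) # -> True
print(sender_may_send(5000, 4000, window, 1100)) # -> False
```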
CS 4284 Spring 2013
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
CS 4284 Spring 2013
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake is required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
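The ISN formula can be sketched as a clock term plus a keyed hash of the connection 4-tuple. Several choices below are assumptions for illustration only: the function name `initial_seq`, the use of SHA-256 for F (RFC 1948 itself suggests MD5), a 16-byte random secret generated at startup in place of `random()`, and taking the clock as an integer count of microseconds.

```python
import hashlib
import os
import struct
import time

SECRET = os.urandom(16)  # per-host secret, generated once at startup (assumed)


def initial_seq(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret)."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4                       # one tick every 4 microseconds
    key = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    offset = struct.unpack("!I", hashlib.sha256(key).digest()[:4])[0]
    return (clock + offset) % 2**32           # sequence numbers are 32 bits


print(initial_seq("10.0.0.1", "10.0.0.2", 1234, 80))
```

For one connection 4-tuple the ISN still advances with the clock, while an attacker who does not know the secret cannot derive the offset used for a different 4-tuple.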
CS 4284 Spring 2013
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
    • Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  – If client continues with ACK, check whether its ACK number matches a SYNACK the server could have sent; allocate buffers only if correct
Sequence Number Summary
• Goals, Set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals, Set 2:
  – Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  – Don't allow SYN attacks: compute but don't store initial sequence number
CS 4284 Spring 2013
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client / server timeline: close, close, timed wait, closed]
CS 4284 Spring 2013
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client / server timeline: closing, closing, timed wait, closed, closed]
CS 4284 Spring 2013
TCP Connection FSM
[Figure: TCP connection finite state machine]
• The heavy solid line is the normal path for a client.
• The heavy dashed line is the normal path for a server.
• The light lines are unusual events.
• Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
CS 4284 Spring 2013
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
CS 4284 Spring 2013
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No. Famous two-army problem
CS 4284 Spring 2013
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) MSS: option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
CS 4284 Spring 2013
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!
CS 4284 Spring 2013
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends λ_in (original data) through a router with unlimited shared output link buffers; Host B; output rate λ_out]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A, Host B, router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; output rate λ_out]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 2
• Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against λ_in, both axes up to R/2, with levels R/3 and R/4 marked]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A, Host B, routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; output rate λ_out]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]
CS 4284 Spring 2013
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
TCP Reliable Data Transfer
bull TCP creates rdt
service on top of IPrsquos
unreliable service
bull Pipelined segments
bull Cumulative acks
bull TCP uses single
retransmission timer
bull Retransmissions are
triggered by
ndash timeout events
ndash duplicate acks
bull Initially consider
simplified TCP
sender
ndash ignore duplicate acks
ndash ignore flow control
congestion control
CS 4284 Spring 2013
TCP Seq rsquos and ACKs Seq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKs
ndash seq of next byte expected from other side
ndash cumulative ACK
Q how receiver handles out-of-order segments
ndash A TCP spec doesnrsquot say - up to implementor
Host A Host B
User types
lsquoCrsquo
host ACKs receipt
of echoed lsquoCrsquo
host ACKs receipt of lsquoCrsquo echoes
back lsquoCrsquo
time
simple telnet scenario
CS 4284 Spring 2013
TCP Sender Events data rcvd from app
bull create segment with seq
bull seq is byte-stream number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeout
bull retransmit segment that
caused timeout
bull restart timer
ack rcvd
bull If acknowledges
previously unacked
segments
ndash update what is known to
be acked
ndash start timer if there are
outstanding segments
CS 4284 Spring 2013
TCP
sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever)
switch(event)
event data received from application above
create TCP segment with sequence number NextSeqNum
if (timer currently not running)
start timer
pass segment to IP
NextSeqNum = NextSeqNum + length(data)
event timer timeout
retransmit not-yet-acknowledged segment with
smallest sequence number
start timer
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
end of loop forever
Comment
bull SendBase-1 last
cumulatively
ackrsquoed byte
Example
bull SendBase-1 = 71 so
SendBase = 72 say Ack
received with y= 73 so the
rcvr acknowledges up to
including 72 now wants
73+
y is gt SendBase so
know that new data is
acked set SendBase to
73
CS 4284 Spring 2013
TCP retransmission scenarios Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
time
SendBase = 100
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t S
eq=9
2 t
imeou
t
SendBase = 120
SendBase = 120
Sendbase = 100
CS 4284 Spring 2013
TCP retransmission scenarios
(more) Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase = 120

TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver | TCP receiver action
Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed | Delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
Arrival of in-order segment with expected seq #; one other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #; gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20 bytes of TCP header overhead, plus the overhead of lower-layer headers, you waste a huge amount of network capacity
• Nagle
  – Transmit first byte
  – Buffer outgoing bytes until the ACK has been received, then send all at once
• You can turn this off
  – Use setsockopt() with the TCP_NODELAY option (set at level IPPROTO_TCP, not SOL_SOCKET)
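
As a small illustration of turning Nagle off, the sketch below uses Python's standard `socket` module. The helper name `disable_nagle` is made up for this example; the option itself is set at the `IPPROTO_TCP` level.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY on a TCP socket and report whether it took effect."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back; a non-zero value means Nagle is disabled.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    nagle_disabled = disable_nagle(s)
```

This only changes the sender's batching of small writes; it has no effect on the peer's delayed ACKs.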

Delayed ACK vs Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by the TCP receiver
  – Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm: implemented by the TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)

TCP: Timeout & Fast Retransmit

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

RTT Distributions
[Figure] (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP.

TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1 - \alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• exponential weighted moving average
• influence of a past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
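
To see why the influence of a past sample decays exponentially, the recurrence can be unrolled. The shorthand $E_n$ for EstimatedRTT after the $n$-th sample and $S_n$ for the $n$-th SampleRTT is introduced here only for this derivation.

```latex
\begin{align*}
E_n &= (1-\alpha)\,E_{n-1} + \alpha\,S_n \\
    &= \alpha\,S_n + \alpha(1-\alpha)\,S_{n-1} + \alpha(1-\alpha)^2\,S_{n-2} + \dots + (1-\alpha)^n E_0 \\
    &= \alpha \sum_{k=0}^{n-1} (1-\alpha)^k\, S_{n-k} + (1-\alpha)^n E_0
\end{align*}
```

With $\alpha = 0.125$ the newest sample has weight $0.125$, while a sample that is 8 measurements old has weight $0.125 \cdot 0.875^8 \approx 0.043$.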

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds); y-axis: RTT (milliseconds, 100 to 350); two curves: SampleRTT and EstimatedRTT.]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first, estimate how much SampleRTT deviates from EstimatedRTT:
  $\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT} - \mathrm{EstimatedRTT}|$
  (typically $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
  $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
  where $\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
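
A minimal Python sketch of these three formulas follows. Two details are assumptions not stated on the slide: DevRTT is initialised to half of the first sample, and DevRTT is updated using the EstimatedRTT value from before the new sample is folded in. The class name `RttEstimator` is invented for the example.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial deviation

    def update(self, sample_rtt):
        # DevRTT uses the estimate from before this sample (assumed ordering).
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

When the samples are steady, DevRTT shrinks by a factor of 0.75 per sample, so the safety margin tightens; a single outlier widens it again.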

RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if the current timeout value is too small for the network delay?
  – In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  – By a factor of 2, with a limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
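
The interaction between skipping ambiguous samples and backing off can be sketched like this. It is a simplified model under stated assumptions: the class name `KarnRto`, the 1-second initial timeout, the 64-second cap and the first-sample initialisation are illustrative choices, not values taken from the slides.

```python
class KarnRto:
    """Retransmission timeout with Karn's sampling rule and exponential backoff."""

    def __init__(self, initial_rto=1.0, max_rto=64.0, alpha=0.125, beta=0.25):
        self.rto = initial_rto
        self.max_rto = max_rto
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = None
        self.dev_rtt = None

    def on_timeout(self):
        # No usable measurement: double the timeout, up to the limit.
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return  # ambiguous sample: ignore it and keep the backed-off timeout
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            deviation = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        # A clean measurement resumes the normal formula.
        self.rto = self.estimated_rtt + 4 * self.dev_rtt
```

The backed-off value stays in force until an ACK for a segment that was sent only once arrives, which is what lets the sender escape a timeout that was set too small.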

Fast Retransmit
• Time-out period often relatively long
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost
  – fast retransmit: resend segment before timer expires

Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
      else {   /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
          resend segment with sequence number y   /* fast retransmit */
      }
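
The duplicate-ACK branch can be modelled in a few lines of Python. This is a sketch only: `FastRetransmitSender` and its `retransmitted` list are hypothetical names, a single counter is kept for the current SendBase, and ACKs below SendBase are simply ignored.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers a fast retransmit on the third."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # sequence numbers resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase, reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)
```

Only the third duplicate triggers a resend; later duplicates for the same y do not resend again in this model.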

TCP: Flow Control

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
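
The window arithmetic is small enough to write out directly. In this sketch the function names `rcv_window` and `can_send` are made up for illustration; the variables mirror the names on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, window):
    """Sender side: keep unACKed data (including the new bytes) within RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= window
```

For example, with a 4096-byte buffer holding 2000 unread bytes, the advertised window is 2096 bytes, so a sender with 1000 bytes in flight may send 1000 more but not 1200.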

Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; sender sends a probe after a period of inactivity to provoke a window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
[Figure] (a) Normal operation. (b) Old SYN appearing out of nowhere. (c) Duplicate SYN and duplicate ACK following SYN.
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = (4 µs clock value) + F(src, dst, sport, dport, random())
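
A rough Python sketch of this ISN formula follows. Several choices are assumptions for illustration: SHA-256 is used as F (the RFC discusses other hash choices), `SECRET` plays the role of the slide's random(), and the function name `initial_seq` and its `now_us` parameter are invented here.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # per-boot secret, standing in for random()


def initial_seq(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = (4-microsecond clock value) + F(src, dst, sport, dport, secret), mod 2**32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000   # current time in microseconds
    clock = now_us // 4                         # one tick every 4 microseconds
    material = secret + f"{src}|{dst}|{sport}|{dport}".encode()
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2**32
```

For a fixed connection 4-tuple the ISN still advances with the clock, which protects against old duplicates, while the secret-dependent offset makes the value hard to predict for a different 4-tuple.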

When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  – If client continues with ACK, check whether the acknowledged sequence number could have been sent; then allocate buffers if correct
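
The "compute but don't store" idea behind SYN cookies can be sketched as below. This is a deliberately simplified model, not the encoding used by real stacks (which also pack a time counter and MSS bits into the cookie); the function names, the HMAC-SHA-256 choice and the `counter` time-slot parameter are all assumptions for illustration.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)   # server-side secret, never sent on the wire


def syn_cookie(src, sport, dst, dport, client_isn, counter, secret=SECRET):
    """Derive the server's initial sequence number from the SYN alone."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def ack_is_valid(ack_no, src, sport, dst, dport, client_isn, counter, secret=SECRET):
    """Check the client's ACK against cookies from the current and previous time slot."""
    for c in (counter, counter - 1):
        expected = (syn_cookie(src, sport, dst, dport, client_isn, c, secret) + 1) % 2**32
        if ack_no == expected:
            return True
    return False
```

Because the server can recompute the cookie from the returning ACK, it needs no per-connection state until the handshake completes, so a flood of spoofed SYNs does not consume buffers.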

Sequence Number Summary
• Goals, set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  – Even after a crash and restart -> hence tie to time -> but don't use a global clock -> hence need a 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is successfully established
• Goals, set 2:
  – Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  – Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client / server timeline of the close sequence: closing on both sides, timed wait on the client, then closed.]

TCP Connection FSM
[Figure: TCP connection finite state machine.] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No. Famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on a larger-than-default MSS (the default is 536 outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …

Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send through one router with unlimited shared output link buffers; λ_in: original data; λ_out: output rate.]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send through one router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: output rate.]

Causes/costs of congestion: scenario 2 (cont.)
• Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: panels (a), (b), (c) plotting λ_out against λ'_in, with axis marks at R/2, R/3 and R/4.]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: hosts sending over multihop paths through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: output rate.]

Causes/costs of congestion: scenario 3 (cont.)
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\mathrm{LastByteSent} - \mathrm{LastByteAcked} \le \min(\mathrm{CongWin}, \mathrm{RcvWindow})$
• CongWin and RTT influence throughput:
  $\mathrm{rate} = \mathrm{CongWin} / \mathrm{RTT}$ bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection over time; y-axis marks at 8, 16 and 24 Kbytes.]

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with RTT marked.]

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance and fast recovery. Transitions recovered from the diagram:]
• Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
• Slow start:
  – new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK: dupACKcount++
  – cwnd > ssthresh: go to congestion avoidance
  – dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• Congestion avoidance:
  – new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK: dupACKcount++
  – dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
  – timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• Fast recovery:
  – duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  – new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  – timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
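
The rows of this table translate fairly directly into an event-driven sketch. The class below is an illustrative model, not an actual TCP stack: the name `CongestionControl`, the default MSS of 1460 bytes and the initial Threshold of 64 KB are assumptions, and windows are kept in bytes as floats.

```python
class CongestionControl:
    """Event handlers mirroring the sender congestion-control table."""

    def __init__(self, mss=1460, threshold=64 * 1024):
        self.mss = mss
        self.cong_win = mss          # start with CongWin = 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                 # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win   # about +1 MSS per RTT

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                        # triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)   # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Driving it with a sequence of ACK, duplicate-ACK and timeout events reproduces the sawtooth: exponential growth up to Threshold, linear growth after it, a halving on three duplicate ACKs and a restart from 1 MSS on timeout.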

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W.]
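
The 0.75 factor follows from the sawtooth shape, assuming (as the idealized model does) that the window grows linearly from $W/2$ back to $W$ between loss events. The symbol $\overline{W}$ for the time-averaged window is introduced only for this derivation.

```latex
\begin{align*}
\overline{W} &= \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W \\
\text{average throughput} &= \frac{\overline{W}}{RTT} = 0.75\,\frac{W}{RTT}
\end{align*}
```

The average of a linear ramp is the midpoint of its endpoints, which is why no integration is needed here.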

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = 2·10^-10: very low!
• Requires an almost perfect link
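
The two numbers in this example can be checked with a few lines of Python. The function names are made up for this sketch; the only formula used is the throughput relation 1.22·MSS / (RTT·√L), solved for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    mss_bits = 8 * mss_bytes
    return (1.22 * mss_bits / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)        # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)  # about 2e-10
```

Because the loss rate enters under a square root, each tenfold increase in target throughput needs a hundredfold lower loss rate.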

Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other (axes up to R, x-axis: Connection 1 throughput) with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
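
The convergence argument can be tried out numerically. The sketch below is a toy model under simplifying assumptions not stated on the slide: both flows see every loss event at the same time, rates change in discrete rounds, and the function name `aimd_two_flows` is invented for the example.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two flows sharing one link: +1 per round, halve both when over capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease halves each rate (and their difference).
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase leaves the difference unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share and let the flows compete.
r1, r2 = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=2000)
```

Additive increase never changes the gap between the two rates, while every loss event halves it, so the rates drift toward the equal-share line.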

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
CS 4284 Spring 2013
TCP Seq rsquos and ACKs Seq rsquos
ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data
ACKs
ndash seq of next byte expected from other side
ndash cumulative ACK
Q how receiver handles out-of-order segments
ndash A TCP spec doesnrsquot say - up to implementor
Host A Host B
User types
lsquoCrsquo
host ACKs receipt
of echoed lsquoCrsquo
host ACKs receipt of lsquoCrsquo echoes
back lsquoCrsquo
time
simple telnet scenario
CS 4284 Spring 2013
TCP Sender Events data rcvd from app
bull create segment with seq
bull seq is byte-stream number of first data byte in segment
bull start timer if not already running (think of timer as for oldest unacked segment)
bull expiration interval TimeOutInterval
timeout
bull retransmit segment that
caused timeout
bull restart timer
ack rcvd
bull If acknowledges
previously unacked
segments
ndash update what is known to
be acked
ndash start timer if there are
outstanding segments
CS 4284 Spring 2013
TCP
sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever)
switch(event)
event data received from application above
create TCP segment with sequence number NextSeqNum
if (timer currently not running)
start timer
pass segment to IP
NextSeqNum = NextSeqNum + length(data)
event timer timeout
retransmit not-yet-acknowledged segment with
smallest sequence number
start timer
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
end of loop forever
Comment
bull SendBase-1 last
cumulatively
ackrsquoed byte
Example
bull SendBase-1 = 71 so
SendBase = 72 say Ack
received with y= 73 so the
rcvr acknowledges up to
including 72 now wants
73+
y is gt SendBase so
know that new data is
acked set SendBase to
73
CS 4284 Spring 2013
TCP retransmission scenarios Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
time
SendBase = 100
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t S
eq=9
2 t
imeou
t
SendBase = 120
SendBase = 120
Sendbase = 100
CS 4284 Spring 2013
TCP retransmission scenarios
(more) Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase = 120
CS 4284 Spring 2013
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
CS 4284 Spring 2013
Naglersquos algorithm
bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Nagle ndash Transmit first byte
ndash Buffer outgoing bytes until ack has been received ndash then send all at once
bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Nagle
bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources
ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)
bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be
fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP
timeout value
bull longer than RTT
ndash but RTT varies
bull too short premature
timeout
ndash unnecessary
retransmissions
bull too long slow
reaction to segment
loss
Q how to estimate RTT
bull SampleRTT measured time
from segment transmission
until ACK receipt
bull SampleRTT will vary want
estimated RTT ldquosmootherrdquo
ndash average several recent
measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving average
bull influence of past sample decreases
exponentially fast
bull typical value = 0125
CS 4284 Spring 2013
Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin
bull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp
Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTT
ndash Note ACK could be delayed for original segment or early for retransmitted segment
bull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithm
bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network
delay
ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula
bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit
bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct

Sequence Number Summary
• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute but don't store initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline showing close, close, timed wait, closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline showing closing, closing, timed wait, closed, closed]

TCP Connection FSM
[Figure: TCP connection state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet); MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• ...

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends original data at rate $\lambda_{in}$, Host B receives at rate $\lambda_{out}$; unlimited shared output link buffers]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 2
Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)
"costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against $\lambda'_{in}$, axes marked R/2; marks R/3 and R/4 appear on the $\lambda_{out}$ axes]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: $\lambda_{out}$ for Host A and Host B]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the necessary retransmissions require the offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \frac{\text{CongWin}}{\text{RTT}}$ bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) against time for a long-lived TCP connection]

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
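The slide's numbers can be checked with a short calculation. This is only a sketch: the helper names `initial_rate_bps` and `slow_start_windows` are made up for the example, and the doubling model ignores losses and the receive window.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate when exactly one MSS is sent per round-trip time, in bits per second."""
    return mss_bytes * 8 / rtt_seconds

def slow_start_windows(mss_bytes, rounds):
    """Congestion window at the start of each RTT while it doubles every round."""
    cong_win = mss_bytes
    windows = []
    for _ in range(rounds):
        windows.append(cong_win)
        cong_win *= 2
    return windows
```

With MSS = 500 bytes and RTT = 0.2 s, `initial_rate_bps(500, 0.2)` gives about 20,000 bits per second, the 20 kbps quoted above, and `slow_start_windows(500, 4)` yields windows of 500, 1000, 2000 and 4000 bytes.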

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline, one RTT per round of segments]

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance, fast recovery; transitions listed below as event / action]
Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
  - duplicate ACK / dupACKcount++
  - new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  - cwnd > ssthresh / Λ; go to congestion avoidance
  - dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  - timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
congestion avoidance:
  - new ACK / cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  - duplicate ACK / dupACKcount++
  - dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  - timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
fast recovery:
  - duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
  - new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  - timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
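The event/action rules above translate almost directly into code. The class below is a minimal sketch under stated assumptions: the name `TcpSender`, the value of `MSS`, and the treatment of the third duplicate ACK as the "triple duplicate ACK" event are choices made for this illustration, and nothing is actually transmitted.

```python
MSS = 1460  # bytes; an assumed, typical value

class TcpSender:
    """Tracks CongWin, Threshold and state (SS or CA) as events arrive."""

    def __init__(self, threshold=64 * 1024):
        self.cong_win = MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                       # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * MSS / self.cong_win  # about 1 MSS per RTT

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                          # triple duplicate ACK
            self.threshold = max(self.cong_win / 2, MSS)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Feeding the object a sequence of events reproduces the sawtooth: exponential growth up to Threshold, linear growth beyond it, halving on a triple duplicate ACK, and a restart from 1 MSS after a timeout.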

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of window size between W/2 and W]

TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \sqrt{L}}$
• L = $2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
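The two figures in this example can be reproduced numerically. The sketch below assumes the throughput relation 1.22·MSS/(RTT·√L) from the slide and solves it for L; the function names are invented for the illustration.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Number of in-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

For 10 Gbps, a 100 ms RTT and 1500-byte segments, `window_segments(10e9, 0.1, 1500)` is roughly 83,333 and `required_loss_rate(10e9, 0.1, 1500)` is roughly 2·10^-10, matching the values quoted above.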

Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput (0 to R) plotted against the other connection's throughput (0 to R), with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
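The convergence argument can be tried out with a toy simulation. This is a sketch under strong assumptions: both sessions see every loss at the same time, have the same RTT, and rates are in arbitrary units; the function name `aimd` is made up for the example.

```python
def aimd(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one link; returns their rates after the given number of RTTs."""
    for _ in range(rounds):
        x1 += 1                     # additive increase for both flows
        x2 += 1
        if x1 + x2 > capacity:      # link overloaded: both flows see a loss
            x1 /= 2                 # multiplicative decrease
            x2 /= 2
    return x1, x2
```

Additive increase leaves the difference between the two rates unchanged, while every multiplicative decrease halves it, so `aimd(1, 50, 100, 200)` ends with the two rates nearly equal even though they started far apart.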

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do

TCP Sender Events
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if it acknowledges previously unacked segments
  - update what is known to be acked
  - start timer if there are outstanding segments

TCP sender (simplified)
    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number NextSeqNum
            if (timer currently not running)
                start timer
            pass segment to IP
            NextSeqNum = NextSeqNum + length(data)

        event: timer timeout
            retransmit not-yet-acknowledged segment with smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > SendBase) {
                SendBase = y
                if (there are currently not-yet-acknowledged segments)
                    start timer
            }
    } /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ack'ed byte
Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73, so the receiver acknowledges everything up to and including byte 72 and now wants 73+. y > SendBase, so new data is acked; set SendBase to 73.
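The simplified sender can also be written as a small runnable model. This is a sketch, not a real TCP: the class name `SimpleTcpSender`, the `sent` list standing in for "pass segment to IP", and the boolean `timer_running` are assumptions made for the illustration, and sequence number wrap-around is ignored.

```python
class SimpleTcpSender:
    """Event-driven model of the simplified sender (no flow or congestion control)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}            # seq number -> data not yet acknowledged
        self.timer_running = False
        self.sent = []               # (seq, data) pairs handed to IP, in order

    def on_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.next_seq_num += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # not-yet-acked segment with smallest seq number
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:       # cumulative ACK covering new data
            self.send_base = y
            done = [s for s in self.unacked if s + len(self.unacked[s]) <= y]
            for seq in done:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

Starting at sequence number 92 and sending 8 bytes followed by 20 bytes, an ACK with y = 100 moves `send_base` to 100 and leaves only the second segment outstanding; a timeout then resends that segment.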

TCP retransmission scenarios
[Figure: two Host A / Host B timelines. Lost ACK scenario: ACK lost (X), timeout, SendBase = 100. Premature timeout: Seq=92 timeout; SendBase = 100, then SendBase = 120.]

TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline. Cumulative ACK scenario: one ACK lost (X), timeout, SendBase = 120.]

TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver Action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap

Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments, the 20-byte TCP header plus the lower-layer headers cause a huge waste of network capacity
• Nagle
  - Transmit first byte
  - Buffer outgoing bytes until ack has been received, then send all at once
• You can turn this off
  - Use setsockopt() with level IPPROTO_TCP and option TCP_NODELAY
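Turning Nagle off looks like this from Python, whose `socket` module wraps the same `setsockopt` call. The snippet is a minimal sketch on an unconnected socket; the variable names are chosen for the example.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# TCP_NODELAY is a TCP-level option, so the level is IPPROTO_TCP
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
# Read the option back to confirm that Nagle's algorithm is now disabled
nagle_disabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Interactive protocols that send many small writes and need low latency are the usual reason to set this option; bulk transfers normally leave Nagle enabled.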

Delayed ACK vs Nagle
• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because acks are cumulative, this can reduce the number of acks sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can fit into a single segment)

TCP Timeout & Fast Retransmit

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

RTT Distributions
[Figure: two probability density plots]
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP

TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds (100 to 350) against time in seconds (1 to 106)]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first, estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  where $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
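The three formulas fit into a small estimator. This is a sketch with some assumptions labeled: the class name `RttEstimator` is invented, the deviation starts at zero, and DevRTT is updated before EstimatedRTT (so it uses the previous estimate), which is one reasonable ordering rather than something the slide specifies.

```python
class RttEstimator:
    ALPHA = 0.125   # weight of the newest sample in EstimatedRTT
    BETA = 0.25     # weight of the newest deviation in DevRTT

    def __init__(self, first_sample):
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0          # assumption: no deviation known at the start

    def update(self, sample_rtt):
        """Fold one SampleRTT into the averages and return the new TimeoutInterval."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.BETA) * self.dev_rtt + self.BETA * deviation
        self.estimated_rtt = (1 - self.ALPHA) * self.estimated_rtt + self.ALPHA * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from an estimate of 100 ms, a single 200 ms sample moves EstimatedRTT only to 112.5 ms but raises DevRTT to 25 ms, so the timeout jumps to 212.5 ms: the safety margin reacts faster than the average.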

RTT Measurement & Retransmissions
• How should you handle acks for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - Ok, but what if current timeout value is too small for network delay?
  - In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ack matches which packet
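Karn's rule and the backoff can be sketched together. The names here (`KarnTimer`, `max_timeout`, the `recompute` callback that turns a valid sample into a new timeout) are hypothetical and chosen only for this illustration.

```python
class KarnTimer:
    def __init__(self, timeout, max_timeout=64.0):
        self.timeout = timeout
        self.max_timeout = max_timeout   # assumed upper limit for the backoff

    def on_timeout(self):
        """Exponential backoff: double the timeout, up to the limit."""
        self.timeout = min(self.timeout * 2, self.max_timeout)
        return self.timeout

    def on_ack(self, sample_rtt, was_retransmitted, recompute):
        """Use the RTT sample only if the acked segment was sent exactly once."""
        if was_retransmitted:
            return self.timeout          # ambiguous sample: keep the backed-off value
        self.timeout = recompute(sample_rtt)
        return self.timeout
```

The backed-off value stays in force until an unambiguous sample arrives, at which point the normal formula takes over again through `recompute`.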

Fast Retransmit
• Time-out period often relatively long:
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

Fast retransmit algorithm:
    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {   /* a duplicate ACK for already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3)
                resend segment with sequence number y   /* fast retransmit */
        }
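The duplicate-ACK counting can be modeled in a few lines. This is a sketch only: the class name `FastRetransmitSender` and the `retransmitted` list, which records sequence numbers instead of actually resending anything, are assumptions for the example.

```python
class FastRetransmitSender:
    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:       # ACK for new data
            self.send_base = y
            self.dup_acks = 0
        else:                        # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment starting at y
```

Only the third duplicate triggers the resend; further duplicates for the same y do not resend it again.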

TCP Flow Control
• receive side of TCP connection has a receive buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
• app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
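The window arithmetic is easy to check with concrete numbers. The two helpers below are a sketch; the function names and the byte counts used afterwards are made up for the illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: buffer size minus unread bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, window):
    """Bytes the sender may still put in flight without exceeding the window."""
    return max(0, window - (last_byte_sent - last_byte_acked))
```

With a 4096-byte buffer, 3000 bytes received and 1000 read, the advertised window is 2096 bytes; a sender with 1000 unACKed bytes may then send 1096 more.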

Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, ...);
• server: contacted by client
  cl = accept(sv, &caddr, ...);
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
CS 4284 Spring 2013
TCP Sender Congestion Control
(Each entry lists: Event / State / TCP Sender Action / Commentary)

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP sender action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP sender action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP sender action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP sender action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
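The event table can be sketched as a small state machine. This is an illustration only, assuming MSS = 1000 bytes for round numbers; the class and method names are made up for the example, and it follows the simplified table (no window inflation during fast recovery).

```python
from dataclasses import dataclass

MSS = 1000  # bytes; assumed segment size, chosen only for round numbers


@dataclass
class CongestionState:
    cong_win: float = MSS          # CongWin, in bytes
    threshold: float = 64 * 1024   # Threshold, in bytes
    dup_acks: int = 0
    state: str = "SS"              # "SS" = slow start, "CA" = congestion avoidance

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        if self.state == "SS":
            self.cong_win += MSS                          # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_duplicate_ack(self) -> None:
        """Count duplicate ACKs; the third one triggers multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)      # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        """Timeout: halve the threshold and restart from 1 MSS in slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.dup_acks = 0
        self.state = "SS"
```

Driving the object with a sequence of events shows the sawtooth: new ACKs grow the window, a third duplicate ACK halves it, and a timeout drops it back to one segment.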
CS 4284 Spring 2013
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of the window between W/2 and W]
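The 0.75 factor follows from the sawtooth shape. A short derivation, assuming the window grows linearly from W/2 to W between losses, so that its time average is the midpoint:

```latex
\bar{w} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} \;\approx\; \frac{\bar{w}}{RTT} \;=\; 0.75\,\frac{W}{RTT}
```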
CS 4284 Spring 2013
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = 2·10^-10. Very low!
• Require almost perfect link
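The numbers in this example can be checked with a few lines of arithmetic. This is a sketch; the function names are invented here, and the loss-rate function simply inverts the throughput formula 1.22·MSS/(RTT·sqrt(L)).

```python
def required_window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def tolerable_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))  # about 83,333 segments
    print(tolerable_loss_rate(10e9, 0.1, 1500))       # about 2e-10
```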
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k.
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput of one connection plotted against the throughput of the other, both axes from 0 to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
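The convergence argument can be illustrated with a toy simulation. It assumes two flows with equal RTTs that both see a loss whenever their combined rate exceeds the link capacity; those assumptions, the units, and the function name are made up for this sketch and are much cleaner than a real network.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Simulate two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease halves both rates,
            # which also halves the difference between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: additive increase keeps the difference unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    # Start far from the fair share; the two rates end up nearly equal.
    print(aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=500))
```

Each loss halves the gap between the two rates while additive increase leaves it unchanged, so the flows drift toward the equal-share line.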
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
CS 4284 Spring 2013
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71, so SendBase = 72. Say an ACK is received with y = 73, so the receiver acknowledges up to and including 72 and now wants 73+. y > SendBase, so we know that new data is ACKed; set SendBase to 73.
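The simplified sender can be mirrored in a few lines of Python. This is a sketch under simplifying assumptions: the timer is a boolean flag, "pass segment to IP" appends to a list, and sequence-number wraparound is ignored. All names are illustrative.

```python
class SimplifiedSender:
    """Event handlers of the simplified TCP sender (no flow/congestion control)."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq
        self.send_base = initial_seq
        self.unacked = {}           # sequence number -> payload not yet ACKed
        self.timer_running = False
        self.wire = []              # stand-in for "pass segment to IP"

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)     # oldest not-yet-acknowledged segment
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True   # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:      # cumulative ACK for new data
            self.send_base = y
            for seq in [s for s in self.unacked if s + len(self.unacked[s]) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

With initial sequence number 92, an 8-byte write followed by a 20-byte write leaves NextSeqNum at 120; an ACK of 120 then acknowledges both segments at once.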
CS 4284 Spring 2013
TCP retransmission scenarios
[Figure: two Host A / Host B timelines. Lost ACK scenario: an ACK is lost (X) and Host A's timeout fires; SendBase = 100. Premature timeout scenario: the Seq=92 timeout fires; SendBase = 100, then SendBase = 120.]
CS 4284 Spring 2013
TCP retransmission scenarios (more)
[Figure: Host A / Host B timeline. Cumulative ACK scenario: an ACK is lost (X) within the timeout interval; SendBase = 120.]
CS 4284 Spring 2013
TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Event at receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: Arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: Arrival of segment that partially or completely fills gap.
TCP receiver action: Immediately send ACK, provided that segment starts at lower end of gap.
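The four rows can be read as a decision function. The sketch below is illustrative: the parameter names and returned strings are invented for the example, and the handling of a segment below the expected sequence number is an assumption, since the table does not cover that case.

```python
def ack_action(seg_seq: int, expected_seq: int, ack_pending: bool, gap_open: bool) -> str:
    """Pick the receiver's ACK behaviour for an arriving segment."""
    if seg_seq > expected_seq:
        # Out-of-order arrival: a gap is detected.
        return "send duplicate ACK immediately"
    if seg_seq == expected_seq and gap_open:
        # Segment starts at the lower end of an existing gap.
        return "send ACK immediately"
    if seg_seq == expected_seq and ack_pending:
        # Second in-order segment while an ACK is still being delayed.
        return "send cumulative ACK immediately"
    if seg_seq == expected_seq:
        return "delay ACK up to 500 ms"
    # Not in the table: an old segment (assumed here to be re-ACKed at once).
    return "send ACK immediately"
```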
CS 4284 Spring 2013
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  – If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  – Transmit first byte
  – Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off:
  – Use setsockopt(s, IPPROTO_TCP, TCP_NODELAY, ...) (the option lives at the IPPROTO_TCP level, not SOL_SOCKET)
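As a concrete illustration, Nagle's algorithm can be disabled on a socket through Python's standard socket module. A minimal sketch; the helper name is made up, and the socket is never connected.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY is a TCP-level option, hence level IPPROTO_TCP.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


if __name__ == "__main__":
    sock = make_nodelay_socket()
    # A non-zero value means small writes are sent without waiting for ACKs.
    print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    sock.close()
```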
CS 4284 Spring 2013
Delayed ACK vs. Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  – Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  – Delays sending actual data (so that more data can be fit into a single segment)
TCP
Timeout & Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
CS 4284 Spring 2013
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; curves: SampleRTT and EstimatedRTT.]
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
  (typically, $\alpha = 0.125$, $\beta = 0.25$)
  $\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
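The three formulas combine into a small estimator. This is a sketch: seeding DevRTT with half the first sample follows RFC 6298 and is an assumption beyond the slides, as is updating DevRTT before EstimatedRTT; the class name is illustrative.

```python
class RttEstimator:
    """EWMA-based RTT estimator with a deviation-based timeout."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumed initial value (as in RFC 6298)

    def update(self, sample_rtt: float) -> None:
        # Deviation is measured against the estimate from before this sample.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + \
            self.beta * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + \
            self.alpha * sample_rtt

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

For example, starting from a 100 ms sample and then observing 200 ms gives EstimatedRTT = 112.5 ms, DevRTT = 62.5 ms, and a timeout of 362.5 ms.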
CS 4284 Spring 2013
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT
CS 4284 Spring 2013
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if current timeout value is too small for network delay?
  – In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
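A sketch of how these rules might combine with the timeout formula. The 60-second cap, the initial timeout, and all names are assumptions made for this illustration.

```python
class KarnRto:
    """Retransmission timeout with Karn's rule and exponential backoff."""

    def __init__(self, initial_rto=1.0, max_rto=60.0, alpha=0.125, beta=0.25):
        self.rto = initial_rto
        self.max_rto = max_rto            # assumed upper limit for the backoff
        self.alpha, self.beta = alpha, beta
        self.est = None                   # EstimatedRTT
        self.dev = None                   # DevRTT

    def on_timeout(self) -> None:
        # No usable measurement: back off by a factor of 2, up to the limit.
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, sample_rtt: float, segment_was_retransmitted: bool) -> None:
        if segment_was_retransmitted:
            return                        # ambiguous sample: discard it
        if self.est is None:
            self.est, self.dev = sample_rtt, sample_rtt / 2
        else:
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample_rtt - self.est)
            self.est = (1 - self.alpha) * self.est + self.alpha * sample_rtt
        self.rto = self.est + 4 * self.dev  # resume normal computation
```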
CS 4284 Spring 2013
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires
CS 4284 Spring 2013
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {   /* a duplicate ACK for already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
      resend segment with sequence number y   /* fast retransmit */
  }
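The ACK handler can be sketched in Python as follows. The class name and the list that records retransmissions are illustrative; timers and the segments themselves are left out.

```python
from collections import Counter


class FastRetransmitSender:
    """ACK handling with duplicate-ACK counting and fast retransmit."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = Counter()   # ACK value y -> duplicates seen for y
        self.retransmitted = []     # sequence numbers resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data acknowledged: advance and forget old duplicate counts.
            self.send_base = y
            self.dup_acks.clear()
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # resend segment starting at y
```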
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
CS 4284 Spring 2013
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
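The bookkeeping on both sides is simple arithmetic. A minimal sketch; the function names are illustrative and all values are in bytes.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int,
                    advertised_window: int, nbytes: int) -> bool:
    """True if nbytes more keep the unACKed data within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised_window
```

For instance, a 4096-byte buffer holding 1000 unread bytes advertises 3096 bytes, and a sender with 3000 unACKed bytes may send 96 more but not 97.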
CS 4284 Spring 2013
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
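The slide leaves "a certain size" open. A commonly cited threshold is the smaller of one MSS and half the receive buffer; the sketch below assumes that rule, and the function name is hypothetical.

```python
def advertised_window(free_bytes: int, mss: int, rcv_buffer: int) -> int:
    """Advertise 0 until the free space is worth a full-sized segment.

    Assumed threshold: min(MSS, half the receive buffer).
    """
    threshold = min(mss, rcv_buffer // 2)
    return free_bytes if free_bytes >= threshold else 0
```

With a 64 KB buffer and MSS of 1460 bytes, a receiver that has freed only 100 bytes keeps advertising a zero window, which stops the sender from emitting tiny segments.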
CS 4284 Spring 2013
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, ...);
• server: contacted by client
  cl = accept(sv, &caddr, ...);
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
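The sequence and acknowledgement numbers exchanged in the three steps can be traced with a short model. Segments are plain dictionaries made up for the illustration; the one protocol fact it relies on is that a SYN consumes one sequence number.

```python
MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": (syn["seq"] + 1) % MOD}        # SYN consumes one seq #
    ack = {"flags": {"ACK"}, "seq": (client_isn + 1) % MOD,
           "ack": (synack["seq"] + 1) % MOD}
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(1000, 5000):
        print(segment)
```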
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
CS 4284 Spring 2013
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
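The formula can be sketched as follows. RFC 1948 suggests MD5 for F; SHA-256 is swapped in here, and the secret handling and function name are assumptions made for the illustration.

```python
import hashlib
import os

_SECRET = os.urandom(16)  # hypothetical per-boot secret (the random() input)


def initial_sequence_number(src: str, dst: str, sport: int, dport: int,
                            now_us: int, secret: bytes = _SECRET) -> int:
    """ISN = 4-microsecond clock + F(src, dst, sport, dport, secret), mod 2^32."""
    clock = (now_us // 4) & 0xFFFFFFFF            # ticks once every 4 µs
    ident = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(ident).digest()[:4], "big")
    return (clock + f) & 0xFFFFFFFF
```

For a fixed connection four-tuple the ISN still advances with the clock, while an attacker who does not know the secret cannot derive the offset used for a different four-tuple.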
CS 4284 Spring 2013
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  – SYN cookies
    • Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  – If client continues with ACK, check whether the acknowledged number matches a cookie the server could have sent, then allocate buffers if correct
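The idea of "compute, don't store" can be sketched with a keyed hash. This is a simplified illustration, not the cookie layout of any real TCP stack: the coarse time counter, field encoding, and function names are all assumptions.

```python
import hashlib
import hmac

MASK = 0xFFFFFFFF


def syn_cookie(secret: bytes, src: str, sport: int, dst: str, dport: int,
               client_isn: int, counter: int) -> int:
    """Server ISN derived from the connection identifiers; nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(secret: bytes, src: str, sport: int, dst: str, dport: int,
                 client_isn: int, ack_number: int, counter: int) -> bool:
    """Accept the final ACK if it acknowledges a cookie from this or the previous period."""
    for c in (counter, counter - 1):
        expected = (syn_cookie(secret, src, sport, dst, dport, client_isn, c) + 1) & MASK
        if ack_number == expected:
            return True
    return False
```

Only when ack_is_valid returns True would the server allocate buffers for the connection; the client's ISN is recovered from the sequence number of the ACK minus one.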
Sequence Number Summary
• Goals Set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
  – Don't allow hijacking of connections unless attacker can eavesdrop; use PRNG for initial seq number choice
  – Don't allow SYN attacks; compute, but don't store, initial sequence number
CS 4284 Spring 2013
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: close (client), close (server), timed wait, closed]
CS 4284 Spring 2013
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline: closing (client), closing (server), timed wait, closed]
CS 4284 Spring 2013
TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
CS 4284 Spring 2013
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
CS 4284 Spring 2013
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No. Famous two-army problem
CS 4284 Spring 2013
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrapped sequence numbers
• ...
CS 4284 Spring 2013
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!
CS 4284 Spring 2013
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B; $\lambda_{in}$: original data; unlimited shared output link buffers; $\lambda_{out}$]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; finite shared output link buffers; $\lambda_{out}$]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 2
Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ versus offered load, axes marked R/2; additional marks at R/3 and R/4]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; finite shared output link buffers; $\lambda_{out}$]
CS 4284 Spring 2013
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted!
• ultimately leads to congestion collapse
[Figure: Host A, Host B, $\lambda_{out}$]
CS 4284 Spring 2013
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
CS 4284 Spring 2013
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \dfrac{\text{CongWin}}{RTT}$ bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
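The sending limit and the rate estimate amount to two small functions. A sketch with illustrative names; window sizes are in bytes and RTT in seconds.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int,
                  cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still transmit under both window limits."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win: int, rtt_s: float) -> float:
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt_s
```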
CS 4284 Spring 2013
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
CS 4284 Spring 2013
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline; one RTT per round]
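Why a per-ACK increment doubles the window each RTT can be seen in a few lines. The sketch assumes no losses and one ACK per segment (no delayed ACKs); the function names are illustrative.

```python
def slow_start_rounds(mss: int, rounds: int) -> list:
    """CongWin (bytes) at the start of each RTT during loss-free slow start."""
    cong_win = mss
    history = [cong_win]
    for _ in range(rounds):
        acks = cong_win // mss      # one ACK per segment sent this RTT
        cong_win += acks * mss      # +1 MSS per ACK doubles the window
        history.append(cong_win)
    return history


def initial_rate_bps(mss_bytes: int, rtt_s: float) -> float:
    """Rate with a window of one segment per RTT, in bits per second."""
    return mss_bytes * 8 / rtt_s
```

With MSS = 500 bytes the window goes 500, 1000, 2000, 4000 bytes over three RTTs, and with RTT = 200 msec the initial rate works out to 20 kbps.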
CS 4284 Spring 2013
TCP retransmission scenarios Host A
loss
tim
eou
t
lost ACK scenario
Host B
X
time
SendBase = 100
Host A
time
premature timeout
Host B
Seq=
92
tim
eou
t S
eq=9
2 t
imeou
t
SendBase = 120
SendBase = 120
Sendbase = 100
CS 4284 Spring 2013
TCP retransmission scenarios
(more) Host A
loss
tim
eou
t
Cumulative ACK scenario
Host B
X
time
SendBase = 120
CS 4284 Spring 2013
TCP ACK generation [RFC 1122 RFC 2581]
Event at Receiver
Arrival of in-order segment with
expected seq All data up to
expected seq already ACKed
Arrival of in-order segment with
expected seq One other
segment has ACK pending
Arrival of out-of-order segment
higher-than-expect seq
Gap detected
Arrival of segment that
partially or completely fills gap
TCP Receiver action
Delayed ACK Wait up to 500ms
for next segment If no next segment
send ACK
Immediately send single cumulative
ACK ACKing both in-order segments
Immediately send duplicate ACK
indicating seq of next expected byte
Immediate send ACK provided that
segment starts at lower end of gap
CS 4284 Spring 2013
Naglersquos algorithm
bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Nagle ndash Transmit first byte
ndash Buffer outgoing bytes until ack has been received ndash then send all at once
bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Nagle
bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources
ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)
bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be
fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP
timeout value
bull longer than RTT
ndash but RTT varies
bull too short premature
timeout
ndash unnecessary
retransmissions
bull too long slow
reaction to segment
loss
Q how to estimate RTT
bull SampleRTT measured time
from segment transmission
until ACK receipt
bull SampleRTT will vary want
estimated RTT ldquosmootherrdquo
ndash average several recent
measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving average
bull influence of past sample decreases
exponentially fast
bull typical value = 0125
CS 4284 Spring 2013
Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin
bull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp
Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTT
ndash Note ACK could be delayed for original segment or early for retransmitted segment
bull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithm
bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network
delay
ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula
bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit
bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
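
Flow Control: Silly Window Syndrome
• Problem: if the receiving app reads only a few bytes at a time, the receiver keeps advertising tiny windows and the sender keeps sending tiny segments
• Clark's solution: don't advertise windows below a certain size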

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Connection Management

TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …)
• server: contacted by client
    cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
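
The three steps can be traced with a toy model of the sequence and acknowledgement numbers. The Segment class and the example initial sequence numbers are hypothetical. The sketch relies on the standard TCP rule that a SYN consumes one sequence number, so each side acknowledges the other's initial seq. # plus one.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2**32   # TCP sequence numbers are 32 bits wide


@dataclass(frozen=True)
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    # Step 1: client sends SYN carrying its initial sequence number, no data.
    syn = Segment("SYN", seq=client_isn)
    # Step 2: server replies with SYNACK: its own ISN, acknowledging client ISN + 1.
    synack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: client replies with ACK, acknowledging server ISN + 1.
    ack = Segment("ACK", seq=(syn.seq + 1) % SEQ_SPACE, ack=(synack.seq + 1) % SEQ_SPACE)
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```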

TCP 3-way Handshake
TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seq. #s?

3-way Handshake & Delayed Dups
[Figure: three handshake scenarios]
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
The 3-way handshake is required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
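
A hedged sketch of the ISN formula above. It is not the RFC's exact construction: SHA-256 stands in for the hash F, the secret is a fixed placeholder where a real stack would use a random per-boot value, and the function and parameter names are made up for this example.

```python
import hashlib

SEQ_SPACE = 2**32
SECRET = b"hypothetical-per-boot-secret"   # assumption: random in a real system


def isn(clock_us: int, src: str, dst: str, sport: int, dport: int,
        secret: bytes = SECRET) -> int:
    """ISN = (clock that ticks every 4 microseconds) + F(connection 4-tuple, secret)."""
    m = (clock_us // 4) % SEQ_SPACE
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (m + f) % SEQ_SPACE


a = isn(1_000_000, "10.0.0.1", "10.0.0.2", 40000, 80)
b = isn(1_000_004, "10.0.0.1", "10.0.0.2", 40000, 80)   # same connection, 4 µs later
```

For one connection the ISN still advances with the clock, which keeps the protection against old incarnations. An attacker who does not know the secret cannot derive the offset used for a different 4-tuple.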

When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
    • Server computes its initial seq. # (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether its ACK number matches a SYNACK the server could have sent, then allocate buffers if correct
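
The "compute but do not store" idea can be sketched as below. This is a simplification under stated assumptions: the cookie is just a keyed hash (HMAC-SHA-256) of the connection identifiers, whereas deployed SYN cookies also encode a time counter and the MSS. SERVER_KEY and both function names are hypothetical.

```python
import hashlib
import hmac

SEQ_SPACE = 2**32
SERVER_KEY = b"hypothetical-server-secret"


def make_cookie(src: str, sport: int, dst: str, dport: int, client_isn: int,
                key: bytes = SERVER_KEY) -> int:
    """Server ISN derived from the SYN itself, so no per-connection state is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(src: str, sport: int, dst: str, dport: int, client_isn: int,
                 ack_number: int, key: bytes = SERVER_KEY) -> bool:
    """Recompute the cookie when the ACK arrives; only then allocate buffers."""
    expected = (make_cookie(src, sport, dst, dport, client_isn, key) + 1) % SEQ_SPACE
    return ack_number == expected


cookie = make_cookie("10.0.0.1", 40000, "10.0.0.2", 80, client_isn=1000)
good = ack_is_valid("10.0.0.1", 40000, "10.0.0.2", 80, 1000, (cookie + 1) % SEQ_SPACE)
bad = ack_is_valid("10.0.0.1", 40000, "10.0.0.2", 80, 1000, (cookie + 2) % SEQ_SPACE)
```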

Sequence Number Summary
• Goals, Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq. number choice
  - Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN
[Figure: client/server timeline of the close exchange: close, close, timed wait, closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline: closing, closing, timed wait, closed]

TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window, to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send through a router with unlimited shared output link buffers. $\lambda_{in}$: original data; $\lambda_{out}$: delivered data]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send through a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered data]

Causes/costs of congestion: scenario 2
• Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ against the input rate, with axis marks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered data]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, $\lambda_{out}$]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput; roughly,
    rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window of a long-lived TCP connection over time, with axis marks at 8, 16 and 24 Kbytes]

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline over successive RTTs]
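
A small sketch of why "+1 MSS per ACK" doubles the window each RTT, together with the initial-rate example from the previous slide. It assumes one ACK per segment (no delayed ACKs) and no losses; the function names are illustrative only.

```python
def initial_rate_bps(mss_bytes: int, rtt_s: float) -> float:
    """Rate when CongWin = 1 MSS: one segment per round trip, in bits per second."""
    return mss_bytes * 8 / rtt_s


def slow_start_windows(rounds: int, mss: int = 1) -> list:
    """Window (in units of mss) at the start of each RTT during loss-free slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss          # one ACK comes back per segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # CongWin grows by 1 MSS for every ACK
        history.append(cwnd)
    return history


rate = initial_rate_bps(mss_bytes=500, rtt_s=0.2)   # about 20 kbps
windows = slow_start_windows(rounds=4)
```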

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance and fast recovery. Transitions are listed below as event / action; Λ marks an empty event or action.]
initial:
• Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• cwnd > ssthresh / Λ; go to congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
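
The four rules above can be traced with a per-RTT model. This is a coarse sketch, not an implementation of Reno: it updates the window once per round trip instead of once per ACK, measures the window in units of MSS, and treats CongWin equal to Threshold as congestion avoidance. The class name RenoSender and the starting threshold of 8 MSS are assumptions for the example.

```python
MSS = 1   # the window is measured in units of MSS


class RenoSender:
    def __init__(self, threshold: float) -> None:
        self.cong_win = 1.0 * MSS
        self.threshold = float(threshold)

    @property
    def phase(self) -> str:
        return "slow start" if self.cong_win < self.threshold else "congestion avoidance"

    def on_rtt_without_loss(self) -> None:
        if self.cong_win < self.threshold:
            self.cong_win *= 2          # slow start: exponential growth
        else:
            self.cong_win += MSS        # congestion avoidance: linear growth

    def on_triple_dup_ack(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.threshold

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = 1.0 * MSS


sender = RenoSender(threshold=8)
trace = [sender.cong_win]
for _ in range(5):
    sender.on_rtt_without_loss()
    trace.append(sender.cong_win)

sender.on_triple_dup_ack()
after_dup = (sender.threshold, sender.cong_win)

sender.on_rtt_without_loss()
sender.on_timeout()
after_timeout = (sender.threshold, sender.cong_win, sender.phase)
```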

TCP Sender Congestion Control
(table columns: Event, State, TCP Sender Action, Commentary)
1. Event: ACK receipt for previously unacked data
   State: Slow Start (SS)
   Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
   Commentary: Resulting in a doubling of CongWin every RTT
2. Event: ACK receipt for previously unacked data
   State: Congestion Avoidance (CA)
   Action: CongWin = CongWin + MSS · (MSS/CongWin)
   Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT
3. Event: Triple duplicate ACK
   State: SS or CA
   Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
   Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
4. Event: Timeout
   State: SS or CA
   Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
   Commentary: Enter slow start
5. Event: Duplicate ACK
   State: SS or CA
   Action: Increment duplicate ACK count for segment being acked
   Commentary: CongWin and Threshold not changed

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$. Very low
• Requires an almost perfect link
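
The numbers in this example can be checked with a few lines of arithmetic. The sketch assumes the loss-rate relation throughput = 1.22 · MSS / (RTT · sqrt(L)) and solves it for L; the function names are illustrative.

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(rate_bps=10e9, rtt_s=0.1, mss_bytes=1500)
loss = required_loss_rate(rate_bps=10e9, rtt_s=0.1, mss_bytes=1500)
```

Here w comes out a little above 83,333 segments and loss close to 2·10^-10, in line with the slide.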

Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: connection 1 throughput plotted against the other connection's throughput, both axes running to R, with the equal bandwidth share line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
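
The convergence argument can be checked numerically with an idealized model. The assumptions are strong ones: both flows share the same RTT, both see a loss whenever their combined rate exceeds the capacity, and rates are unitless numbers. The function name and starting rates are made up for the example.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, steps: int):
    """Synchronized AIMD: both flows add 1 per step, both halve when the link is overloaded."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1     # congestion avoidance: additive increase
    return x1, x2


# Start far from the fair share: one flow at 1, the other at 50, on a link of capacity 100.
x1, x2 = aimd_two_flows(1.0, 50.0, capacity=100.0, steps=500)
```

Additive increase leaves the gap between the two flows unchanged, while each multiplicative decrease halves it, so the rates approach the equal-share line.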

Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do

TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario between Host A and Host B over time, showing a loss (X), the timeout interval, and SendBase = 120]

TCP ACK generation [RFC 1122, RFC 2581]
(table columns: Event at Receiver, TCP Receiver action)
1. Event: Arrival of in-order segment with expected seq. #. All data up to expected seq. # already ACKed
   Action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
2. Event: Arrival of in-order segment with expected seq. #. One other segment has ACK pending
   Action: Immediately send single cumulative ACK, ACKing both in-order segments
3. Event: Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
   Action: Immediately send duplicate ACK, indicating seq. # of next expected byte
4. Event: Arrival of segment that partially or completely fills gap
   Action: Immediately send ACK, provided that segment starts at lower end of gap

Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower layer headers, you get a huge waste of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off via
  - setsockopt() with option TCP_NODELAY (at level IPPROTO_TCP, not SOL_SOCKET)
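
A minimal sketch of disabling Nagle's algorithm, written in Python for brevity; the constants map directly onto the C setsockopt() call. The helper name make_nodelay_socket is hypothetical, and the socket is never connected, so this only shows that the option is set on the socket.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY lives at the TCP protocol level, hence IPPROTO_TCP.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_nodelay_socket()
# Read the option back; a non-zero value means Nagle is disabled.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```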

Delayed ACK vs Nagle
• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle", not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because ACKs are cumulative, can reduce # of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)

TCP: Timeout & Fast Retransmit

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

RTT Distributions
[Figure: two probability density plots]
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP

TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT in milliseconds (axis 100 to 350) plotted against time in seconds (axis 1 to 106)]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
    $\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
    (typically, $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
    $\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
    with $\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
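
One update step of these formulas, as a sketch. The slide does not fix the order of the two updates; this version assumes DevRTT is updated first, using the old EstimatedRTT (the order RFC 6298 uses). The starting values and the function name are made up for the example, and all times are in milliseconds.

```python
ALPHA = 0.125   # weight of the newest sample in EstimatedRTT
BETA = 0.25     # weight of the newest deviation in DevRTT


def update_rtt(estimated_rtt: float, dev_rtt: float, sample_rtt: float):
    """Apply one SampleRTT and return (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # Assumption: DevRTT is computed against the previous EstimatedRTT.
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


est, dev, timeout = update_rtt(estimated_rtt=100.0, dev_rtt=10.0, sample_rtt=140.0)
```

A single sample 40 ms above the estimate moves EstimatedRTT by only 5 ms but raises the timeout considerably, because the deviation term is weighted by 4.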

RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, would keep timing out but couldn't adjust timeout according to formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988. The TCP timestamp option can be used to remove the ambiguity about which ACK matches which packet
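
A sketch of the two rules of Karn's algorithm: ignore RTT samples from retransmitted segments, and back off the timeout exponentially until a clean sample arrives. The class name KarnTimer, the cap of 4 seconds and the recomputed value 1.2 are assumptions for illustration; in a real sender the recomputed timeout would come from the EstimatedRTT/DevRTT formulas.

```python
class KarnTimer:
    def __init__(self, rto: float, max_rto: float) -> None:
        self.rto = rto            # current retransmission timeout, in seconds
        self.max_rto = max_rto    # upper limit for the backoff

    def on_timeout(self) -> None:
        # No usable measurement: double the timeout, up to the limit.
        self.rto = min(self.rto * 2, self.max_rto)

    def on_ack(self, was_retransmitted: bool, recomputed_rto: float) -> bool:
        """Return True if the sample was used to set the timeout."""
        if was_retransmitted:
            # Ambiguous ACK (original or retransmission?): discard the sample.
            return False
        # Clean sample: resume the normal estimator.
        self.rto = recomputed_rto
        return True


timer = KarnTimer(rto=1.0, max_rto=4.0)
for _ in range(3):
    timer.on_timeout()
backed_off = timer.rto                     # 1 -> 2 -> 4 -> capped at 4

ignored = timer.on_ack(was_retransmitted=True, recomputed_rto=1.2)
rto_after_ambiguous_ack = timer.rto

used = timer.on_ack(was_retransmitted=False, recomputed_rto=1.2)
```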
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control (state machine)
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
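The three-state machine can be sketched as a small class that tracks only the congestion-control variables. This is an illustration under simplifying assumptions: the class name `RenoSender` and its method names are hypothetical, windows are in bytes, and actual (re)transmission of segments is left out.

```python
class RenoSender:
    """Congestion-control state only; no segments are actually sent."""

    def __init__(self, mss=1000):
        self.mss = mss
        self.cwnd = mss
        self.ssthresh = 64 * 1024
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                     # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                         # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                         # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"                  # retransmit would happen here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"                         # retransmit would happen here
```

Feeding the object a sequence of new ACKs, three duplicate ACKs, and a timeout walks it through all three states.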
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window sawtooth between W/2 and W]
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate $L$:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
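The numbers in this example can be checked with a few lines of arithmetic. The function names are hypothetical; the second function simply solves throughput = 1.22 · MSS / (RTT · sqrt(L)) for L.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Loss rate L at which 1.22 * MSS / (RTT * sqrt(L)) equals rate_bps."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
```

Here `w` is about 83,333 segments and `loss` is about 2.1 · 10^-10, matching the slide.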
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other (Connection 1 throughput on one axis, both axes up to R) with the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
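The convergence argument can be tried out numerically. A minimal sketch under strong assumptions: both flows have the same RTT, increase in lockstep, and both see a loss whenever their combined rate exceeds the capacity. The function name `share_link` is hypothetical.

```python
def share_link(x1, x2, capacity, rounds):
    """Synchronized AIMD for two flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase keeps the gap unchanged
    return x1, x2


x1, x2 = share_link(80, 10, capacity=100, rounds=200)
```

Additive increase leaves the difference between the two rates untouched and every loss halves it, so the flows drift toward the equal-share line.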
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP connection, gets rate R/10
  - new app asks for 9 TCP connections, gets R/2
• That's what "download accelerators" do
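The arithmetic behind the parallel-connection example, assuming every connection on the link ends up with an equal share. The function name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing, opened):
    """Fraction of the link rate R that an app with `opened` connections receives."""
    return Fraction(opened, existing + opened)
```

With 9 existing connections, `app_share(9, 1)` is 1/10 and `app_share(9, 9)` is 1/2.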
TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #. Gap detected | Immediately send duplicate ACK, indicating seq # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
Nagle's algorithm
• Consider again apps writing byte-by-byte (or in small chunks)
  - If you send 1-byte segments with 20-byte TCP header overhead + overhead of lower-layer headers, you get a huge waste of network capacity
• Nagle:
  - Transmit first byte
  - Buffer outgoing bytes until ACK has been received, then send all at once
• You can turn this off:
  - Use setsockopt() with the TCP_NODELAY option (option level IPPROTO_TCP, not SOL_SOCKET)
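A minimal sketch of disabling Nagle's algorithm through Python's standard `socket` module. The socket is never connected here; the example only sets the option and reads it back.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# TCP_NODELAY lives at the TCP protocol level, not the generic socket level.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

A nonzero `nodelay` means small writes are sent immediately instead of being coalesced by the sender.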
Delayed ACK vs Nagle
• These two mechanisms have nothing to do with each other
  - often confused, especially by various online sources
  - TCP_NODELAY means "disable Nagle," not "disable delayed ACKs"
• Delayed ACK: implemented by TCP receiver
  - Delays sending an acknowledgement (because ACKs are cumulative, this can reduce the # of ACKs sent)
• Nagle's algorithm: implemented by TCP sender
  - Delays sending actual data (so that more data can be fit into a single segment)
TCP: Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
RTT Distributions
[Figure: (a) Probability density of ACK arrival times in the data link layer; (b) Probability density of ACK arrival times for TCP]
TCP Round Trip Time and Timeout
$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$
Example RTT estimation
[Figure: SampleRTT and EstimatedRTT for gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) vs time (seconds)]
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
  $DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
  (typically $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
  $TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$
  $EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$
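The two moving averages and the timeout formula fit into a small estimator class. A sketch only: the class name `RttEstimator` is hypothetical, and initializing DevRTT to half of the first sample is an assumption borrowed from the RFC 2988 style of initialization, not something stated on the slide.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumed initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # DevRTT is updated first, against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(100.0)
timeout = est.update(200.0)
```

After one outlier sample of 200 ms, EstimatedRTT moves only slightly (to 112.5 ms) while the safety margin grows, so the timeout rises to 362.5 ms.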
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until a new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
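Karn's rule and the backoff can be sketched as follows. The class name `KarnTimer`, the default limit of 64 seconds, and the `recompute` callback (standing in for the EstimatedRTT/DevRTT formula) are assumptions for illustration.

```python
class KarnTimer:
    def __init__(self, rto=1.0, max_rto=64.0):
        self.rto = rto
        self.max_rto = max_rto

    def on_timeout(self):
        """Exponential backoff: double the timeout, up to a limit."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, sample_rtt, was_retransmitted, recompute):
        """Use the sample only if the ACKed segment was never retransmitted."""
        if not was_retransmitted:
            self.rto = recompute(sample_rtt)  # resume normal sampling
        return self.rto                       # ambiguous sample: keep backed-off value
```

The backed-off timeout stays in force until an unambiguous sample arrives.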
Fast Retransmit
• Time-out period often relatively long
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend segment before timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
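The pseudocode above translates almost line for line into runnable form. In this sketch the class name `FastRetransmitSender` is hypothetical, the retransmission timer is omitted, and "resending" is modeled by recording the sequence number in a list.

```python
class FastRetransmitSender:
    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = 0
        self.resent = []  # sequence numbers that were fast-retransmitted

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y        # new data acknowledged
            self.dup_acks = 0         # a real sender would restart its timer here
        else:
            self.dup_acks += 1        # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.resent.append(y) # fast retransmit of the segment starting at y


sender = FastRetransmitSender()
for ack in (100, 100, 100, 100, 100):
    sender.on_ack(ack)
```

The first ACK for 100 advances SendBase; the third duplicate triggers exactly one retransmission, and the fourth duplicate does not trigger another.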
TCP: Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
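Both sides of the flow-control rule are one-line computations. The function names `rcv_window` and `may_send` are hypothetical; the variables follow the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def may_send(nbytes, last_byte_sent, last_byte_acked, window):
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```

With 2000 unread bytes in a 4096-byte buffer, the receiver advertises 2096 bytes.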
Flow Control: Persistence Timer
• Suppose sender is blocked because the receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP: Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
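The sequence and acknowledgement numbers exchanged in the three steps can be written out explicitly. A sketch in which segments are plain dictionaries and the function name `three_way_handshake` is hypothetical; it relies on the fact that a SYN consumes one sequence number.

```python
MOD = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    synack = {"flags": "SYN+ACK", "seq": server_isn,
              "ack": (syn["seq"] + 1) % MOD}      # acknowledges the client's SYN
    ack = {"flags": "ACK", "seq": (client_isn + 1) % MOD,
           "ack": (synack["seq"] + 1) % MOD}      # acknowledges the server's SYN
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

Each side confirms the other's initial sequence number by acknowledging it plus one.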
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
[Figure: (a) Normal operation; (b) Old SYN appearing out of nowhere; (c) Duplicate SYN and duplicate ACK following SYN]
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
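A sketch of an initial-sequence-number generator in the spirit of the formula above: a counter that ticks every 4 µs plus a keyed hash of the connection identifiers. The function name `initial_seq`, the use of SHA-256, and the per-process `SECRET` are assumptions for illustration, not the construction of any particular stack.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # hypothetical secret, chosen once at startup


def initial_seq(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret)."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4                              # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + offset) % 2 ** 32


a = initial_seq("10.0.0.1", "10.0.0.2", 40000, 80, now_us=1_000_000)
b = initial_seq("10.0.0.1", "10.0.0.2", 40000, 80, now_us=1_000_004)
```

For one connection tuple the ISN still advances with the clock, while the offset for a different tuple cannot be guessed without the secret.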
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  - SYN cookies
• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If client continues with an ACK, check whether the acknowledged number matches a SYNACK the server could have sent; allocate buffers only if correct
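The "compute but don't store" idea can be sketched with a keyed hash. This is a simplified illustration: the cookie layout, the function names `make_cookie` and `check_ack`, and the `counter` parameter (standing in for a coarse timestamp) are hypothetical and do not reproduce the bit layout of real SYN-cookie implementations.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN derived from the connection; nothing is stored per SYN."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def check_ack(secret, src, sport, dst, dport, client_isn, counter, ack_number):
    """Recompute the cookie and verify that the client ACKed cookie + 1."""
    cookie = make_cookie(secret, src, sport, dst, dport, client_isn, counter)
    return ack_number == (cookie + 1) % 2 ** 32


cookie = make_cookie(b"k", "1.2.3.4", 1111, "5.6.7.8", 80, 42, 7)
```

Only when `check_ack` succeeds would the server allocate buffers for the connection.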
Sequence Number Summary
• Goals Set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals Set 2:
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN
[Figure: client and server timelines labeled close, close, timed wait, closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed
Note: with small modification, can handle simultaneous FINs
[Figure: client and server timelines labeled closing, closing, timed wait, closed, closed]
TCP Connection FSM
[Figure: TCP connection finite state machine] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around of sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP: Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λ_in through a router with unlimited shared output link buffers; output rate λ_out]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B, finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 2
Always: λ_in = λ_out (goodput)
• a) if no loss, λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three panels (a), (b), (c) plotting λ_out against λ_in, axes marked R/2; additional levels R/3 and R/4 marked]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B, finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion Control Event State TCP Sender Action Commentary
ACK receipt for
previously
unacked data
Slow Start
(SS)
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to CA
Resulting in a doubling of
CongWin every RTT
ACK receipt for
previously
unacked data
Congestion
Avoidance
(CA)
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting in
increase of CongWin by 1
MSS every RTT
Triple duplicate
ACK
SS or CA Threshold = CongWin2
CongWin = Threshold
Set state to CA
Fast recovery implementing
multiplicative decrease
CongWin will not drop below
1 MSS
Timeout SS or CA Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK
count for segment being
acked
CongWin and Threshold not
changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTT
bull Just after loss window drops to W2 throughput to W2RTT
bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Loss
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull Throughput in terms of loss rate
bull L = 210-10 Very low
bull Require almost perfect link
LRTT
MSS221
Equation-Based Control
bull Note TCP congestion control forms
control loop
ndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)
bull Instead equation-based control uses an
equation to compute sending rate based
on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same
bottleneck link of bandwidth R each should
have average rate of Rk
TCP connection 1
bottleneck router
capacity R
TCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair
Consider two competing sessions
bull additive increase gives slope of 1 as throughput increases
bull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increase
loss decrease window by factor of 2
congestion avoidance additive increase
loss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
bull Multimedia apps often
do not use TCP
ndash do not want rate throttled
by congestion control
bull Instead use UDP
ndash pump audiovideo at
constant rate tolerate
packet loss
bull TCP friendliness
Fairness and parallel TCP
connections
bull nothing prevents app from
opening parallel connections
between 2 hosts
bull Example link of rate R
supporting 9 connections
ndash new app asks for 1 TCP gets
rate R10
ndash new app asks for 9 TCPs gets
R2
bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Naglersquos algorithm
bull Consider again apps writing byte-by-byte (or in small chunks) ndash If you send 1-byte segments with 20 byte TCP
header overhead + overhead of lower layer headers ndashyou get a huge waste of network capacity
bull Nagle ndash Transmit first byte
ndash Buffer outgoing bytes until ack has been received ndash then send all at once
bull You can turn this off via ndash Use setsockopt (SOL_SOCKET TCP_NODELAY)
CS 4284 Spring 2013
Delayed ACK vs Nagle
bull These two mechanisms have nothing to do with each other ndash often confused especially by various online sources
ndash TCP_NODELAY means ldquodisable Naglerdquo not ldquodisable delayed ACKsrdquo
bull Delayed ACK implemented by TCP Receiver ndash Delays sending an acknowledgement (because acks
are cumulative can reduce of acks sent)
bull Naglersquos algorithm implemented by TCP Sender ndash Delays sending actual data (so that more data can be
fit into a single segment)
TCP
Timeout amp Fast Retransmit
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP
timeout value
bull longer than RTT
ndash but RTT varies
bull too short premature
timeout
ndash unnecessary
retransmissions
bull too long slow
reaction to segment
loss
Q how to estimate RTT
bull SampleRTT measured time
from segment transmission
until ACK receipt
bull SampleRTT will vary want
estimated RTT ldquosmootherrdquo
ndash average several recent
measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving average
bull influence of past sample decreases
exponentially fast
bull typical value = 0125
CS 4284 Spring 2013
Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin
bull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp
Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTT
ndash Note ACK could be delayed for original segment or early for retransmitted segment
bull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithm
bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network
delay
ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula
bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit
bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …);
• server: contacted by client
  cl = accept(sv, &caddr, …);
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
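A minimal sketch of the sequence and acknowledgement numbers exchanged in the three steps above. The `Segment` class and the `three_way_handshake` function are hypothetical names for illustration; the sketch assumes the SYN consumes one sequence number, as in standard TCP, and leaves out options, windows, and data.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_no: int = 0


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments in the order they are sent."""
    # Step 1: client -> server, SYN carrying the client's initial sequence number
    syn = Segment(syn=True, seq=client_isn)
    # Step 2: server -> client, SYNACK with the server's own ISN, acknowledging client_isn + 1
    synack = Segment(syn=True, ack=True, seq=server_isn, ack_no=(syn.seq + 1) % MOD)
    # Step 3: client -> server, ACK acknowledging server_isn + 1
    ack = Segment(ack=True, seq=synack.ack_no, ack_no=(synack.seq + 1) % MOD)
    return [syn, synack, ack]
```

For client ISN 1000 and server ISN 5000, the SYNACK acknowledges 1001 and the final ACK acknowledges 5001.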
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: Tie initial TCP seq numbers to clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
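A hedged sketch of the ISN formula above. The use of SHA-256 for F, the `SECRET` value standing in for random(), and the function name `initial_seq_number` are assumptions made for illustration, not taken from the slides.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # per-host secret, stands in for random() on the slide


def initial_seq_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret), mod 2**32."""
    if now_us is None:
        now_us = time.time_ns() // 1000      # current time in microseconds
    clock = now_us // 4                      # counter that ticks every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    # F: keyed hash of the connection 4-tuple (SHA-256 is an assumption)
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + offset) % 2 ** 32
```

For a fixed 4-tuple the ISN still advances with the clock, which guards against old incarnations; without the secret, an off-path attacker cannot compute the per-connection offset.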
When Sequence Numbers Attack
• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C’s IP address
  – Attacker on A must suppress the RST packets C is likely to send – use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
    • Server computes its initial sequence number (the cookie), sends SYNACK – but does not allocate buffers
  – If client continues with ACK, check if the SYNACK it acknowledges could have been sent, then allocate buffers if correct
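A simplified sketch of the SYN cookie idea above: the server derives its initial sequence number from the connection identifiers and a secret, stores nothing, and recomputes the value when the final ACK arrives. The names `syn_cookie` and `ack_is_valid`, the HMAC-SHA-256 construction, the `SECRET` key, and the coarse time `counter` are assumptions for illustration; real implementations pack more information into the 32 bits.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical key; a real server would use random bytes


def syn_cookie(src, sport, dst, dport, client_isn, counter, secret=SECRET):
    """Server ISN computed from the connection identifiers instead of stored state."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(src, sport, dst, dport, client_isn, ack_no, counter, secret=SECRET):
    """Accept the final ACK if it acknowledges a cookie from this or the previous period."""
    for c in (counter, counter - 1):
        expected = (syn_cookie(src, sport, dst, dport, client_isn, c, secret) + 1) % 2 ** 32
        if ack_no == expected:
            return True   # only now would the server allocate buffers
    return False
```

The client's ISN needed for the check can be recovered from the ACK itself, since its sequence number is the client ISN plus one.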
Sequence Number Summary
• Goals Set 1:
  – Guard against old duplicates in one connection -> don’t reuse sequence numbers until after it’s reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don’t use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2:
  – Don’t allow hijacking of connections unless attacker can eavesdrop – use PRNG for initial seq number choice
  – Don’t allow SYN attacks – compute but don’t store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of the close exchange (close, close, timed wait, closed)]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  – Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline (closing, closing, timed wait, closed)]
TCP Connection FSM
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont’d)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle’s algorithm
  – Fast retransmit
• RTT estimation & Karn’s algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP’s connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size Option
  – Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK – selective acknowledgements
• WSCALE – scale factor for receive window to allow for LFN (“elephant”) – Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around for sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle’s algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A, Host B; λin: original data; λout; unlimited shared output link buffers]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A, Host B; λin: original data; λ’in: original data plus retransmitted data; λout; finite shared output link buffers]
Causes/costs of congestion: scenario 2
Always: λin = λout (goodput)
• a) if no loss, λ’in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ’in larger for same λout (every packet transmitted twice)
“costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λout versus λ’in, axes up to R/2; R/3 and R/4 are marked on the λout axis]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ’in increase?
[Figure: Host A, Host B; λin: original data; λ’in: original data plus retransmitted data; λout; finite shared output link buffers]
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λout]
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin / RTT  bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
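A minimal sketch of the AIMD rule above, tracing the window once per RTT. The function name `aimd_trace` and the idea of supplying the set of rounds in which a loss occurs are assumptions for illustration; real TCP detects loss from timeouts and duplicate ACKs.

```python
def aimd_trace(cwnd, mss, loss_rounds, rounds):
    """Congestion window (bytes) at the start of each RTT round under AIMD."""
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease, never below 1 MSS
        else:
            cwnd += mss                  # additive increase: probe for more bandwidth
    return trace
```

Starting at 8000 bytes with MSS 1000 and a loss in round 3, the window climbs to 11000, drops to 5500, and starts climbing again.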
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with one RTT marked on the time axis]
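A short sketch of slow start in numbers: adding one MSS per ACK doubles CongWin every RTT, and the rate is roughly CongWin/RTT. The function names are hypothetical, and the sketch assumes no loss and one ACK per full segment (no delayed ACKs).

```python
def slow_start_windows(mss, rounds):
    """CongWin (bytes) at the start of each RTT while no loss occurs."""
    cwnd = mss
    windows = []
    for _ in range(rounds):
        windows.append(cwnd)
        acks = cwnd // mss        # one ACK per full segment sent this round
        cwnd += acks * mss        # +1 MSS per ACK, so the window doubles
    return windows


def rate_bps(cwnd_bytes, rtt_ms):
    """Approximate send rate CongWin/RTT in bits per second."""
    return cwnd_bytes * 8 * 1000 / rtt_ms
```

With MSS = 500 bytes and RTT = 200 msec, the first round sends at 500 · 8 / 0.2 = 20,000 bits per second, the 20 kbps quoted for the initial rate.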
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ack event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, “more alarming” indicator of congestion
Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance, and fast recovery; each transition is written as event / actions]
start: Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
  – new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK / dupACKcount++
  – timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
  – cwnd > ssthresh / Λ; go to congestion avoidance
  – dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
congestion avoidance:
  – new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  – duplicate ACK / dupACKcount++
  – timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
  – dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
fast recovery:
  – duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
  – new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  – timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
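The event table above translates almost line by line into code. This is a minimal sketch under stated assumptions: the class name `RenoSender` is hypothetical, windows are integer byte counts, and the floor of 1 MSS on Threshold is one way to honor the note that CongWin will not drop below 1 MSS.

```python
class RenoSender:
    """Congestion-control state following the event table (window in bytes)."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss          # connection starts with CongWin = 1 MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                      # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # CongWin + MSS * (MSS / CongWin): about 1 MSS per RTT
            self.cong_win += self.mss * self.mss // self.cong_win

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                             # triple duplicate ACK
            self.threshold = max(self.cong_win // 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = max(self.cong_win // 2, self.mss)
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

With MSS 1000 and Threshold 4000, four new ACKs take the sender to CongWin 5000 in state CA; a triple duplicate ACK at CongWin 5200 then halves the window to 2600, while a timeout resets it to 1000 and re-enters slow start.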
TCP Throughput: Idealized
• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
  – window grows linearly from W/2 to W, so its mean is 3W/4
[Figure: window oscillating between W/2 and W]
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  throughput = 1.22 · MSS / (RTT · √L)
• L = 2·10^-10: very low
• Requires almost perfect link
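The two numbers in this example can be checked directly. A small sketch, with hypothetical function names, that computes the window in segments and inverts throughput = 1.22 · MSS / (RTT · √L) for the loss rate L:

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Loss rate L such that 1.22 * MSS / (RTT * sqrt(L)) equals rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

For 10 Gbps, 100 ms, and 1500-byte segments this gives about 83,333 segments and a loss rate of roughly 2·10^-10.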
Equation-Based Control
• Note: TCP congestion control forms control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: plot with Connection 1 throughput on one axis, both axes up to R, and the equal bandwidth share line; the trajectory alternates “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
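The convergence argument can be tried numerically. A rough sketch, assuming both flows see a loss whenever their combined rate exceeds the bottleneck capacity and that they react in lockstep; the function name `aimd_two_flows` and the unit step are illustrative choices.

```python
def aimd_two_flows(x1, x2, capacity, rounds, step=1.0):
    """Rates of two AIMD flows sharing one bottleneck of the given capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2          # loss: halving also halves the gap between them
        else:
            x1, x2 = x1 + step, x2 + step    # additive increase leaves the gap unchanged
    return x1, x2
```

Starting from rates 80 and 10 on a link of capacity 100, the difference between the two flows shrinks by half at every loss and is below 1 after 200 rounds.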
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do
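The parallel-connection example is simple arithmetic, assuming every connection on the link ends up with an equal share. The helper name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R a new app gets when all connections share equally."""
    return Fraction(new_connections, existing_connections + new_connections)
```

With 9 existing connections, one new connection gets 1/10 of R, while nine new connections get 9/18 = 1/2 of R.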
Delayed ACK vs Nagle
• These two mechanisms have nothing to do with each other
  – often confused, especially by various online sources
  – TCP_NODELAY means “disable Nagle”, not “disable delayed ACKs”
• Delayed ACK: implemented by TCP Receiver
  – Delays sending an acknowledgement (because acks are cumulative, can reduce # of acks sent)
• Nagle’s algorithm: implemented by TCP Sender
  – Delays sending actual data (so that more data can be fit into a single segment)
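For reference, this is how an application would disable Nagle’s algorithm through the sockets API, shown here with Python’s standard `socket` module. The helper name `make_low_latency_socket` is hypothetical; the option only affects how the local sender batches data.

```python
import socket


def make_low_latency_socket():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY disables Nagle's algorithm on the sending side;
    # it does not change the peer's delayed-ACK behaviour.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```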
TCP
Timeout & Fast Retransmit
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds, 100 to 350) versus time (seconds), showing SampleRTT and EstimatedRTT]
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  EstimatedRTT = (1 - α) · EstimatedRTT + α · SampleRTT
  DevRTT = (1 - β) · DevRTT + β · |SampleRTT - EstimatedRTT|
  (typically, α = 0.125, β = 0.25)
  TimeoutInterval = EstimatedRTT + 4 · DevRTT
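A minimal sketch of the estimator defined by these formulas. The class name `RttEstimator`, the initial deviation of half the first sample, and updating DevRTT before EstimatedRTT are assumptions chosen to mirror common practice rather than anything stated on the slide.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # initial deviation: an assumption

    def update(self, sample_rtt):
        # DevRTT is updated first, using the EstimatedRTT from before this sample
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

Starting from a 100 ms sample, a second sample of 200 ms moves EstimatedRTT to 112.5 ms and DevRTT to 62.5 ms, giving a timeout of 362.5 ms.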
RTT Measurement & Retransmissions
• How should you handle acks for retransmitted segments when measuring RTT?
  – Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT
Karn’s Algorithm
• Idea: don’t consider samples from retransmitted segments
  – Ok, but what if current timeout value is too small for network delay?
  – In this case, would keep timing out, but couldn’t adjust timeout according to formula
• Solution: If measurement can’t be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity which ack matches which packet
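A small sketch of the two rules above: ignore RTT samples from retransmitted segments, and double the timeout on each expiry up to a limit until a clean sample arrives. The class name `KarnTimer` and the 60-second default limit are assumptions for illustration.

```python
class KarnTimer:
    def __init__(self, base_timeout, max_timeout=60.0):
        self.base_timeout = base_timeout   # value from the RTT formula
        self.max_timeout = max_timeout     # the "limit" on backoff; 60 s is an assumption
        self.timeout = base_timeout

    def on_timeout(self):
        """Timer expired: back off by a factor of 2, up to the limit."""
        self.timeout = min(self.timeout * 2, self.max_timeout)

    def on_ack(self, was_retransmitted, new_base_timeout=None):
        """Return True if this ACK may be used as an RTT sample."""
        if was_retransmitted:
            return False   # ambiguous ACK: no sample, keep the backed-off timeout
        if new_base_timeout is not None:
            self.base_timeout = new_base_timeout
        self.timeout = self.base_timeout   # clean sample: resume normal operation
        return True
```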
Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {
        /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
        }
    }
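The pseudocode above can be turned into a small runnable sketch. The class name `FastRetransmitSender` is hypothetical, the retransmission timer is left out, and resending is modeled by recording the sequence number rather than transmitting anything.

```python
class FastRetransmitSender:
    def __init__(self, send_base=0):
        self.send_base = send_base   # oldest unacknowledged byte
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent via fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_acks = 0
        else:
            self.dup_acks += 1       # duplicate ACK for already-ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend the segment starting at byte y
```

One ACK for byte 100 followed by three duplicates triggers a single retransmission of the segment starting at 100.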
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event: ACK receipt for previously unacked data
  State: Slow Start (SS)
  TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
  Commentary: Resulting in a doubling of CongWin every RTT
Event: ACK receipt for previously unacked data
  State: Congestion Avoidance (CA)
  TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
  Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Event: Triple duplicate ACK
  State: SS or CA
  TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
  Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Event: Timeout
  State: SS or CA
  TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
  Commentary: Enter slow start
Event: Duplicate ACK
  State: SS or CA
  TCP Sender Action: Increment duplicate ACK count for segment being acked
  Commentary: CongWin and Threshold not changed
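
The state diagram and event table above can be sketched as a small event-driven class. This is only an illustration under stated assumptions: the class and method names are made up, MSS is fixed at 1460 bytes, and the fast-recovery window is inflated by 3 MSS (the diagram writes "+ 3" without a unit).

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes


@dataclass
class RenoSender:
    """Hypothetical sender tracking only the congestion-control variables."""
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "SS"  # "SS" slow start, "CA" congestion avoidance, "FR" fast recovery

    def on_new_ack(self):
        if self.state == "FR":
            self.cwnd = self.ssthresh          # deflate the window, leave fast recovery
            self.state = "CA"
        elif self.state == "SS":
            self.cwnd += MSS                   # one MSS per ACK: doubles every RTT
            if self.cwnd > self.ssthresh:
                self.state = "CA"
        else:
            self.cwnd += MSS * MSS / self.cwnd  # about one MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "FR":
            self.cwnd += MSS                   # each further duplicate inflates the window
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                 # triple duplicate ACK: fast retransmit
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "FR"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "SS"
```

Retransmission itself is left out; the sketch only shows how cwnd, ssthresh and the state move in response to the three events.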

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT (the window climbs linearly from W/2 back to W, so its mean is 3W/4)
[Figure: sawtooth of window size over time, between W/2 and W]

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
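
The numbers in this example can be checked with a few lines of arithmetic. The sketch below assumes the usual form of the loss relation, throughput = 1.22 · MSS / (RTT · sqrt(L)), and solves it for L; the function names are illustrative only.

```python
def required_window(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps: W = rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


window = required_window(10e9, 0.1, 1500)      # 10 Gbps, 100 ms RTT, 1500-byte segments
loss = required_loss_rate(10e9, 0.1, 1500)
print(f"W = {window:.0f} segments, L = {loss:.2e}")
```

The results should land close to the slide's figures of roughly 83,333 segments and a loss rate on the order of 2 · 10^-10.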

Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput plotted against the competing connection's throughput, both axes running to R, with the "equal bandwidth share" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
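
The convergence argument can be tried numerically. The following is a rough sketch, assuming both sessions have the same RTT and both see a loss in the same round whenever their combined rate exceeds the capacity; the function name and the starting rates are arbitrary.

```python
def aimd_two_flows(x1, x2, capacity, rounds=200, step=1.0):
    """Simulate two AIMD flows sharing one bottleneck with synchronized losses."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease multiplicatively, halving their difference
            x1, x2 = x1 / 2, x2 / 2
        else:
            # no loss: both flows increase additively, keeping their difference
            x1, x2 = x1 + step, x2 + step
    return x1, x2


a, b = aimd_two_flows(x1=80.0, x2=5.0, capacity=100.0)
print(f"rates after 200 rounds: {a:.2f} and {b:.2f}")
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in two, so the rates drift toward the equal-share line.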

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do

TCP: Timeout & Fast Retransmit

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout
  – unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
• SampleRTT will vary, want estimated RTT "smoother"
  – average several recent measurements, not just current SampleRTT

RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP

TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and EstimatedRTT in milliseconds (axis 100 to 350) versus time in seconds (axis 1 to 106).]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  – large variation in EstimatedRTT -> larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
(typically $\alpha = 0.125$, $\beta = 0.25$)
Then set the timeout interval:
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
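
A minimal sketch of these update rules, with the typical gains of 0.125 and 0.25. Two details are assumptions taken from RFC 6298 and not from the slides: the deviation starts at half the first sample, and it is updated before the smoothed estimate. The class name is made up.

```python
class RttEstimator:
    """Hypothetical holder for EstimatedRTT and DevRTT (any time unit)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumed initial value

    def update(self, sample_rtt):
        # Update the deviation against the old estimate, then the estimate itself.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        # EstimatedRTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt
```

A single outlier sample moves EstimatedRTT only by one eighth of the difference, but it widens the safety margin noticeably through DevRTT.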

RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  – Note: ACK could be delayed for original segment, or early for retransmitted segment
• Choices:
  – Associate with original packet: may overestimate true RTT
  – Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  – OK, but what if current timeout value is too small for network delay?
  – In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
  – By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  – Usual sampling period is once per RTT
• See also RFC 2988
  – TCP timestamp option can be used to remove ambiguity about which ACK matches which packet
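
The two rules (skip ambiguous samples, back off on timeout) could be sketched as follows. The class name, the 60 second cap and the `compute_rto` callback are all assumptions for illustration; the callback stands in for whatever formula turns a sample into a timeout.

```python
class KarnTimer:
    """Hypothetical retransmission timer following Karn's two rules."""

    def __init__(self, rto=1.0, max_rto=60.0):
        self.rto = rto
        self.max_rto = max_rto  # assumed upper limit for the backoff

    def on_timeout(self):
        # No usable measurement: back off by a factor of 2, up to the limit.
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, sample_rtt, was_retransmitted, compute_rto):
        # Samples from retransmitted segments are ambiguous, so they are ignored.
        if not was_retransmitted:
            self.rto = compute_rto(sample_rtt)
        return self.rto
```

The backed-off value stays in force until an ACK arrives for a segment that was sent only once, at which point normal sampling resumes.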

Fast Retransmit
• Time-out period often relatively long:
  – long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  – Sender often sends many segments back-to-back
  – If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
  – fast retransmit: resend segment before timer expires

Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase)
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    else                                  // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y    // fast retransmit
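
The ACK handler above translates fairly directly into runnable form. This sketch keeps only SendBase and a duplicate counter; the class name is made up, the timer is reduced to a comment, and resends are recorded in a list so the behavior can be inspected.

```python
class FastRetransmitSender:
    """Hypothetical sender state for the ACK event only."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []  # sequence numbers resent, kept for illustration

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase and reset the duplicate count.
            self.send_base = y
            self.dup_acks = 0
            # A real sender would restart its timer here if data is still unacked.
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)  # fast retransmit of segment y
```

Only the third duplicate triggers a resend; later duplicates for the same y do not resend again in this sketch.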

TCP: Flow Control

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  – guarantees receive buffer doesn't overflow
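
The window bookkeeping on both sides is simple arithmetic. A small sketch, with hypothetical function names and byte counters passed in as plain integers:

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: buffer size minus unread bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, window):
    """Bytes the sender may still put in flight without exceeding the advertised window."""
    return max(0, window - (last_byte_sent - last_byte_acked))


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window, sender_may_send(5000, 4000, window))
```

With a 4096-byte buffer holding 2000 unread bytes, the advertised window is 2096 bytes, and a sender with 1000 unACKed bytes may send 1096 more.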

Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, ...)
• server: contacted by client
  cl = accept(sv, &caddr, ...)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq. #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
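
The sequence and acknowledgement numbers exchanged in the three steps can be illustrated with a toy model. The dictionaries below are not a real packet format, and the function name is hypothetical; the one protocol fact relied on is that a SYN consumes one sequence number.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three segments as simple dicts (illustrative representation)."""
    syn = {"flags": "SYN", "seq": client_isn}
    # The server acknowledges the client's SYN, which occupies one sequence number.
    synack = {"flags": "SYN+ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # The client acknowledges the server's SYN in the same way.
    ack = {"flags": "ACK", "seq": syn["seq"] + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Each side thereby confirms the initial sequence number chosen by the other before any data is accepted.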

TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq. numbers to clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
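
A rough sketch of this ISN scheme, under several assumptions: SHA-256 stands in for F, a random per-boot secret plays the role of random(), and the clock is passed in as integer microseconds so the example is deterministic. None of these choices is prescribed by the slides.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # hypothetical per-boot secret standing in for random()


def initial_sequence_number(src, dst, sport, dport, now_us=None):
    """ISN = 4-microsecond clock value + F(connection 4-tuple, secret), modulo 2**32."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4  # advances by one every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + SECRET
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + offset) % 2**32
```

For a fixed 4-tuple the ISN still advances with the clock, which guards against old incarnations, while an off-path attacker who does not know the secret cannot predict the offset used for another 4-tuple.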

When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  – Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  – If client continues with ACK, check whether the acknowledged number could have been sent; allocate buffers only if correct
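
The idea of computing the server's initial sequence number without storing it can be sketched with a keyed hash. This is a simplification and not the layout real SYN-cookie implementations use (those also encode a time counter and an MSS index in specific bits); the secret, the function names and the `counter` parameter are hypothetical.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"


def make_cookie(conn, client_isn, counter):
    """Derive the server ISN from the 4-tuple, the client's ISN and a coarse time counter."""
    msg = repr((conn, client_isn, counter)).encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(conn, client_isn, counter, ack_number):
    """Recompute the cookie when the final ACK arrives; nothing was stored in between."""
    return ack_number == (make_cookie(conn, client_isn, counter) + 1) % 2**32


conn = ("198.51.100.7", 40000, "203.0.113.1", 80)
cookie = make_cookie(conn, client_isn=1000, counter=42)
print(ack_is_valid(conn, 1000, 42, (cookie + 1) % 2**32))
```

The server sends the cookie as the sequence number of its SYNACK and allocates buffers only once an ACK carrying cookie + 1 comes back, so forged SYNs cost it no memory.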

Sequence Number Summary
• Goals, set 1:
  – Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  – Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals, set 2:
  – Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq. number choice
  – Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline showing close on both sides, then the client's timed wait and closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline showing closing on both sides, the client's timed wait, and closed on both sides]

TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): long fat networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• ...

Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP: Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λ_in through a router with unlimited shared output link buffers; delivered rate is λ_out]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send through a router with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]

Causes/costs of congestion: scenario 2
Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)
"costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out versus λ'_in, with axes marked at R/2 and goodput levels R/3 and R/4 marked]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B among senders sharing routers with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin/RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (axis marked 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline, one RTT per round, with the number of segments sent doubling each round]
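
The doubling can be traced round by round. The sketch below assumes every segment in a window is ACKed within one RTT and no loss occurs; capping the window at a threshold is a simplification (a real sender would switch to linear growth there), and the function name is made up.

```python
def slow_start(mss, ssthresh, rtts):
    """Return the congestion window (bytes) at the start of each RTT."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss                       # one ACK per segment in the window
        cwnd = min(cwnd + acks * mss, ssthresh)  # +1 MSS per ACK doubles the window
        history.append(cwnd)
    return history


windows = slow_start(mss=500, ssthresh=16000, rtts=6)
initial_rate_bps = 500 * 8 / 0.2                 # MSS = 500 bytes, RTT = 200 msec
print(windows, initial_rate_bps)
```

With MSS = 500 bytes and RTT = 200 msec, the initial rate is 20 kbps, and the window reaches 16,000 bytes after five round trips.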

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion Control Event State TCP Sender Action Commentary
ACK receipt for
previously
unacked data
Slow Start
(SS)
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to CA
Resulting in a doubling of
CongWin every RTT
ACK receipt for
previously
unacked data
Congestion
Avoidance
(CA)
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting in
increase of CongWin by 1
MSS every RTT
Triple duplicate
ACK
SS or CA Threshold = CongWin2
CongWin = Threshold
Set state to CA
Fast recovery implementing
multiplicative decrease
CongWin will not drop below
1 MSS
Timeout SS or CA Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK
count for segment being
acked
CongWin and Threshold not
changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTT
bull Just after loss window drops to W2 throughput to W2RTT
bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Loss
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull Throughput in terms of loss rate
bull L = 210-10 Very low
bull Require almost perfect link
LRTT
MSS221
Equation-Based Control
bull Note TCP congestion control forms
control loop
ndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)
bull Instead equation-based control uses an
equation to compute sending rate based
on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same
bottleneck link of bandwidth R each should
have average rate of Rk
TCP connection 1
bottleneck router
capacity R
TCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair
Consider two competing sessions
bull additive increase gives slope of 1 as throughput increases
bull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increase
loss decrease window by factor of 2
congestion avoidance additive increase
loss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
bull Multimedia apps often
do not use TCP
ndash do not want rate throttled
by congestion control
bull Instead use UDP
ndash pump audiovideo at
constant rate tolerate
packet loss
bull TCP friendliness
Fairness and parallel TCP
connections
bull nothing prevents app from
opening parallel connections
between 2 hosts
bull Example link of rate R
supporting 9 connections
ndash new app asks for 1 TCP gets
rate R10
ndash new app asks for 9 TCPs gets
R2
bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Q how to set TCP
timeout value
bull longer than RTT
ndash but RTT varies
bull too short premature
timeout
ndash unnecessary
retransmissions
bull too long slow
reaction to segment
loss
Q how to estimate RTT
bull SampleRTT measured time
from segment transmission
until ACK receipt
bull SampleRTT will vary want
estimated RTT ldquosmootherrdquo
ndash average several recent
measurements not just current SampleRTT
CS 4284 Spring 2013
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving average
bull influence of past sample decreases
exponentially fast
bull typical value = 0125
CS 4284 Spring 2013
Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin
bull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp
Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTT
ndash Note ACK could be delayed for original segment or early for retransmitted segment
bull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithm
bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network
delay
ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula
bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit
bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send through a router with unlimited shared output link buffers; $\lambda_{in}$: original data, $\lambda_{out}$: delivered data]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send through a router with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered data]

Causes/costs of congestion: scenario 2
Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ versus offered load; axis ticks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: multihop topology with finite shared output link buffers; $\lambda_{in}$: original data; $\lambda'_{in}$: original data plus retransmitted data; $\lambda_{out}$: delivered data]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, $\lambda_{out}$]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
$$LastByteSent - LastByteAcked \le \min(CongWin, RcvWindow)$$
• CongWin and RTT influence throughput:
$$rate = \frac{CongWin}{RTT} \text{ bytes/sec}$$
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

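As a rough illustration of the sender limit and the rate estimate on this slide, here is a minimal sketch. The function names `usable_window` and `approx_rate` are made up for this example and all quantities are in bytes and seconds.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight under
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)

def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win / rtt

# 3000 bytes in flight, receiver window (4000) is the tighter limit.
print(usable_window(5000, 2000, cong_win=8000, rcv_window=4000))
print(approx_rate(8000, 0.2))
```

Whichever of the two windows is smaller decides how much more the sender may transmit.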
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: sawtooth plot of the congestion window (ticks at 8, 16, 24 Kbytes) over time for a long-lived TCP connection]

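The sawtooth can be reproduced with a few lines of simulation. This is a simplified sketch, assuming one window update per round trip and loss events at rounds chosen by hand; `aimd` and its parameters are hypothetical names.

```python
def aimd(cong_win, mss, loss_rounds, rounds):
    """Return CongWin (bytes) at the start of each round trip."""
    trace = []
    for r in range(rounds):
        trace.append(cong_win)
        if r in loss_rounds:
            cong_win = max(mss, cong_win // 2)   # multiplicative decrease
        else:
            cong_win += mss                      # additive increase: 1 MSS per RTT
    return trace

trace = aimd(cong_win=8000, mss=1000, loss_rounds={4}, rounds=8)
print(trace)
```

The window climbs by one MSS per round, is halved at the loss round, and then resumes the linear climb.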
TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: segment exchange between Host A and Host B; axes: RTT, time]

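To see why a per-ACK increment doubles the window every round trip, consider this small sketch. It assumes every segment in the window is ACKed individually and that nothing is lost; `slow_start_round` is a hypothetical helper.

```python
def slow_start_round(cong_win, mss):
    """One RTT of slow start: each full segment in the window is ACKed,
    and each ACK adds one MSS to CongWin."""
    acks = cong_win // mss
    for _ in range(acks):
        cong_win += mss
    return cong_win

MSS = 500   # bytes (slide example)
RTT = 0.2   # seconds (slide example)
initial_rate_bps = MSS * 8 / RTT   # bits per second

windows = [MSS]
for _ in range(4):
    windows.append(slow_start_round(windows[-1], MSS))
print(initial_rate_bps, windows)
```

With the slide's numbers the initial rate works out to 20 kbps, and the window doubles each round trip.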
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

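The difference between the two variants at a loss event can be sketched as follows. The Reno branch follows the bullets above; the Tahoe branch assumes the usual textbook description, in which Tahoe restarts slow start after any loss event. `on_loss` is a hypothetical name.

```python
def on_loss(cong_win, mss, event, variant):
    """Return (new CongWin, new Threshold) in bytes after a loss event.
    event is "timeout" or "triple_dup_ack"; variant is "tahoe" or "reno"."""
    threshold = cong_win // 2
    if event == "triple_dup_ack" and variant == "reno":
        # Fast recovery: resume at Threshold and grow linearly.
        return max(threshold, mss), threshold
    # Timeout (both variants) or any loss in Tahoe: restart slow start.
    return mss, threshold

print(on_loss(16000, 1000, "triple_dup_ack", "reno"))
print(on_loss(16000, 1000, "triple_dup_ack", "tahoe"))
print(on_loss(16000, 1000, "timeout", "reno"))
```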
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance and fast recovery. Transitions are listed below as event / action; Λ denotes "no event" or "no action".]
Initial transition:
• Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• cwnd > ssthresh / Λ; go to congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

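The state diagram on this slide translates fairly directly into code. The following is a minimal sketch, not a real TCP implementation: it tracks only cwnd, ssthresh and the duplicate ACK count (in bytes), leaves out actual transmission and retransmission, and uses the hypothetical class name `RenoSender`.

```python
class RenoSender:
    """Tracks cwnd/ssthresh (bytes) through slow start, congestion
    avoidance and fast recovery. Sending segments is left out."""

    def __init__(self, mss):
        self.mss = mss
        self.cwnd = mss
        self.ssthresh = 64 * 1024
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                  # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"           # missing segment would be resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"                  # missing segment would be resent here

s = RenoSender(mss=1000)
for _ in range(4):
    s.on_new_ack()          # slow start: 1000 -> 5000
for _ in range(3):
    s.on_dup_ack()          # third duplicate ACK enters fast recovery
print(s.state, s.cwnd, s.ssthresh)
```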
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of the window size between W/2 and W]

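The 0.75 factor follows from the shape of the sawtooth. Assuming the window grows linearly from W/2 to W between loss events, the average throughput is the midpoint of the two extremes:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
```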
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$
• L = 2 · 10^-10: very low
• Requires almost perfect link

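The numbers in this example can be checked with a short calculation. This sketch simply plugs the slide's values into W = throughput · RTT / MSS and inverts the throughput formula for L; the variable names are illustrative.

```python
MSS_BITS = 1500 * 8      # segment size in bits
RTT = 0.1                # seconds
TARGET = 10e9            # desired throughput in bits/sec

# Window needed to keep the pipe full: W = throughput * RTT / MSS
window_segments = TARGET * RTT / MSS_BITS

# Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to get the loss rate L
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2

print(round(window_segments), loss_rate)
```

The result is about 83,333 segments in flight and a tolerable loss rate of roughly 2 · 10^-10.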
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput of connection 1 plotted against throughput of the other connection, both axes up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

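The convergence argument can be illustrated with a toy simulation of two AIMD flows. This is a sketch under strong assumptions: both flows have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and `two_flow_aimd` is a hypothetical name.

```python
def two_flow_aimd(x1, x2, capacity, step, rounds):
    """Both flows add `step` per round; when their sum exceeds the link
    capacity both halve (synchronized loss is an assumption)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2        # multiplicative decrease
        else:
            x1, x2 = x1 + step, x2 + step  # additive increase
    return x1, x2

x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, step=1.0, rounds=500)
print(x1, x2, abs(x1 - x2))
```

Additive increase leaves the gap between the flows unchanged, while every multiplicative decrease halves it, so the two rates drift toward the equal-share line.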
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do

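The arithmetic behind the parallel-connection example is short enough to spell out. The sketch assumes the bottleneck is shared equally per connection; `app_share` is a hypothetical helper.

```python
from fractions import Fraction

def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app that opens `new_connections`
    parallel TCP connections, assuming every connection gets an equal share."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)

print(app_share(9, 1), app_share(9, 9))
```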
RTT Distributions
(a) Probability density of ACK arrival times in the data link layer
(b) Probability density of ACK arrival times for TCP
TCP Round Trip Time and Timeout
$$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$$
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: $\alpha = 0.125$

Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT, EstimatedRTT]

TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
– large variation in EstimatedRTT → larger safety margin
• first estimate of how much SampleRTT deviates from EstimatedRTT:
$$EstimatedRTT = (1 - \alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT$$
$$DevRTT = (1 - \beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$$
(typically, $\alpha = 0.125$, $\beta = 0.25$)
$$TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT$$

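The three formulas can be combined into a small estimator. This sketch makes two assumptions that the slide does not state: the estimator is seeded with the first sample and half of it as the initial deviation (as RFC 6298 does), and DevRTT is updated before EstimatedRTT. `RttEstimator` is a hypothetical name.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.dev_rtt = first_sample / 2     # assumption: initial deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Deviation is measured against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(first_sample=100.0)   # milliseconds
timeout = est.update(200.0)
print(est.estimated_rtt, est.dev_rtt, timeout)
```

A single outlier sample moves EstimatedRTT only slightly but widens the safety margin noticeably, which is the intended effect.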
RTT Measurement & Retransmissions
• How should you handle acks for retransmitted segments when measuring RTT?
– Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
– Associate with original packet: may overestimate true RTT
– Associate with retransmitted packet: may underestimate true RTT

Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
– OK, but what if current timeout value is too small for network delay?
– In this case, would keep timing out, but couldn't adjust timeout according to formula
• Solution: if measurement can't be made, use exponential backoff until new measurement can be made
– By factor of 2, with limit
• Resume normal sampling algorithm afterwards
– Usual sampling period is once per RTT
• See also RFC 2988
– TCP timestamp option can be used to remove ambiguity about which ack matches which packet

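A minimal sketch of the idea: back off on every timeout, ignore RTT samples from retransmitted segments, and recompute the timeout only from unambiguous samples. The class name `KarnTimer`, the 60-second cap and the `recompute_rto` callback are assumptions made for this illustration.

```python
class KarnTimer:
    def __init__(self, rto, max_rto=60.0):
        self.rto = rto            # current retransmission timeout (seconds)
        self.max_rto = max_rto    # assumption: upper limit for the backoff

    def on_timeout(self):
        """Retransmission timer fired: back off by a factor of 2, up to a limit."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, sample_rtt, was_retransmitted, recompute_rto):
        """Use the sample only if the ACKed segment was sent exactly once."""
        if was_retransmitted:
            return self.rto       # ambiguous sample: keep the backed-off value
        self.rto = recompute_rto(sample_rtt)
        return self.rto

t = KarnTimer(rto=1.0)
t.on_timeout()                    # 2.0
t.on_timeout()                    # 4.0
kept = t.on_ack(0.5, True, lambda s: 3 * s)       # sample ignored
updated = t.on_ack(0.5, False, lambda s: 3 * s)   # sample used
print(kept, updated)
```

In practice `recompute_rto` would be the EstimatedRTT/DevRTT formula; the lambda here is only a stand-in.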
Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost:
– fast retransmit: resend segment before timer expires

Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    // a duplicate ACK for already ACKed segment
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    // fast retransmit
        }
    }
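The ACK-handling logic can be made runnable with a few simplifications. In this sketch the retransmission timer is left out, resent sequence numbers are just recorded in a list, and the duplicate counter is reset when new data is acknowledged; `FastRetransmitSender` is a hypothetical name.

```python
class FastRetransmitSender:
    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent (for illustration)

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y       # new data acknowledged
            self.dup_acks = 0
        else:
            self.dup_acks += 1       # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y

sender = FastRetransmitSender(send_base=100)
for ack in [200, 200, 200, 200]:     # one new ACK followed by three duplicates
    sender.on_ack(ack)
print(sender.send_base, sender.retransmitted)
```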

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast

TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow

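Both sides of the flow-control rule fit in two small functions. This is a sketch using the slide's variable names as parameters; the function names `rcv_window` and `sender_may_send` are made up for the example.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises:
    RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still transmit without exceeding RcvWindow."""
    return max(0, advertised_window - (last_byte_sent - last_byte_acked))

window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window, sender_may_send(3000, 2500, window))
```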
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
– Solution: persistence timer: sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
connect(s, &dstaddr, …);
• server: contacted by client
cl = accept(sv, &caddr, …);
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
[Figure: three handshake scenarios]
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
– Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())

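The RFC 1948 formula can be sketched as follows. This is an illustration, not a production generator: RFC 1948 suggests MD5 for F, whereas this sketch swaps in SHA-256, and the `isn` function, the `SECRET` value and the `now` parameter are assumptions made for the example.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # per-host secret playing the role of random() (assumption)

def isn(src, dst, sport, dport, now=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret), mod 2**32."""
    if now is None:
        now = time.time()
    clock = int(now * 1_000_000) // 4            # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2**32

a = isn("10.0.0.1", "10.0.0.2", 1234, 80, now=1000.0)
b = isn("10.0.0.1", "10.0.0.2", 1234, 80, now=1001.0)
print(a, b, (b - a) % 2**32)
```

For a fixed connection 4-tuple the ISN still advances with the clock, while the offset F differs per 4-tuple and cannot be predicted without the secret.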
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C's IP address
– Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
– Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
– SYN cookies
• Server computes its initial sequence number (the cookie), sends SYNACK, but does not allocate buffers
– If client continues with ACK, check whether it acknowledges a SYNACK the server could have sent, then allocate buffers if correct
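The stateless check behind SYN cookies can be sketched with a keyed hash. This is a simplification: real implementations also encode the MSS and a time counter into specific bits of the cookie. The function names, the `time_slot` parameter and the use of HMAC-SHA256 are assumptions made for this illustration.

```python
import hashlib
import hmac

def make_cookie(secret, src, sport, dst, dport, client_isn, time_slot):
    """Server ISN derived from the 4-tuple, the client's ISN and a coarse time slot."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def ack_is_valid(secret, src, sport, dst, dport, client_isn, time_slot, ack_number):
    """The final ACK must acknowledge cookie + 1; nothing was stored after the SYN."""
    cookie = make_cookie(secret, src, sport, dst, dport, client_isn, time_slot)
    return ack_number == (cookie + 1) % 2**32

secret = b"k" * 16
conn = ("10.0.0.1", 1234, "10.0.0.2", 80)
cookie = make_cookie(secret, *conn, client_isn=42, time_slot=7)
print(ack_is_valid(secret, *conn, client_isn=42, time_slot=7,
                   ack_number=(cookie + 1) % 2**32))
```

Because the cookie is recomputed from the incoming ACK, the server needs no per-connection state until the handshake completes.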
Sequence Number Summary
• Goals Set 1
– Guard against old duplicates in one connection → don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
– Even after a crash/restart → hence tie to time → but don't use global clock → hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
– Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
– Don't allow SYN attacks: compute but don't store initial sequence number

CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion Control Event State TCP Sender Action Commentary
ACK receipt for
previously
unacked data
Slow Start
(SS)
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to CA
Resulting in a doubling of
CongWin every RTT
ACK receipt for
previously
unacked data
Congestion
Avoidance
(CA)
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting in
increase of CongWin by 1
MSS every RTT
Triple duplicate
ACK
SS or CA Threshold = CongWin2
CongWin = Threshold
Set state to CA
Fast recovery implementing
multiplicative decrease
CongWin will not drop below
1 MSS
Timeout SS or CA Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK
count for segment being
acked
CongWin and Threshold not
changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTT
bull Just after loss window drops to W2 throughput to W2RTT
bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Loss
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull Throughput in terms of loss rate
bull L = 210-10 Very low
bull Require almost perfect link
LRTT
MSS221
Equation-Based Control
bull Note TCP congestion control forms
control loop
ndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)
bull Instead equation-based control uses an
equation to compute sending rate based
on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same
bottleneck link of bandwidth R each should
have average rate of Rk
TCP connection 1
bottleneck router
capacity R
TCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair
Consider two competing sessions
bull additive increase gives slope of 1 as throughput increases
bull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increase
loss decrease window by factor of 2
congestion avoidance additive increase
loss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
bull Multimedia apps often
do not use TCP
ndash do not want rate throttled
by congestion control
bull Instead use UDP
ndash pump audiovideo at
constant rate tolerate
packet loss
bull TCP friendliness
Fairness and parallel TCP
connections
bull nothing prevents app from
opening parallel connections
between 2 hosts
bull Example link of rate R
supporting 9 connections
ndash new app asks for 1 TCP gets
rate R10
ndash new app asks for 9 TCPs gets
R2
bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - ) EstimatedRTT + SampleRTT
bull exponential weighted moving average
bull influence of past sample decreases
exponentially fast
bull typical value = 0125
CS 4284 Spring 2013
Example RTT estimation RTT gaiacsumassedu to fantasiaeurecomfr
100
150
200
250
300
350
1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106
time (seconnds)
RT
T (
mil
lise
con
ds)
SampleRTT Estimated RTT
CS 4284 Spring 2013
TCP Round Trip Time and Timeout
Setting the timeout
bull EstimatedRTT plus ldquosafety marginrdquo ndash large variation in EstimatedRTT larger safety margin
bull first estimate of how much SampleRTT deviates from EstimatedRTT
TimeoutInterval = EstimatedRTT + 4DevRTT
DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|
(typically = 0125 = 025)
EstimatedRTT = (1-)EstimatedRTT + SampleRTT
CS 4284 Spring 2013
RTT Measurement amp
Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTT
ndash Note ACK could be delayed for original segment or early for retransmitted segment
bull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithm
bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network
delay
ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula
bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit
bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
- Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP: Outline
• segment structure
• reliable data transfer
- delayed ACKs
- Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
- seq. #s
- buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
connect(s, &dstaddr, …)
• server: contacted by client
cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
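The three steps can be traced with a short Python sketch that only computes the sequence and acknowledgement numbers of a loss-free handshake. The `Segment` type and `three_way_handshake` function are hypothetical names for illustration; no real sockets are involved.

```python
from typing import NamedTuple, Optional

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    syn: bool
    ack: bool
    seq: int
    ack_no: Optional[int]


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SYN, SYNACK, and ACK segments of a loss-free handshake."""
    syn = Segment(syn=True, ack=False, seq=client_isn, ack_no=None)
    # The SYN consumes one sequence number, so the server acknowledges ISN + 1.
    synack = Segment(syn=True, ack=True, seq=server_isn,
                     ack_no=(syn.seq + 1) % SEQ_MOD)
    # The client's ACK acknowledges the server's SYN in the same way.
    ack = Segment(syn=False, ack=True, seq=synack.ack_no,
                  ack_no=(synack.seq + 1) % SEQ_MOD)
    return [syn, synack, ack]
```

Each side's final acknowledgement number echoes the other side's initial sequence number plus one, which is how each side verifies the number chosen by its peer.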
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake is required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
- Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
- Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
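A minimal Python sketch of the RFC 1948 formula follows. It makes several assumptions not in the slides: SHA-256 stands in for F, the fixed `SECRET` plays the role of the random() input (a real stack would pick random bytes at boot), and the function name and the `now_us` parameter are hypothetical.

```python
import hashlib
import time
from typing import Optional

# Hypothetical per-host secret; stands in for the random() input to F.
SECRET = b"example-secret"


def initial_sequence_number(src: str, sport: int, dst: str, dport: int,
                            now_us: Optional[int] = None,
                            secret: bytes = SECRET) -> int:
    """ISN = 4-microsecond clock + F(src, dst, sport, dport, secret), mod 2**32."""
    if now_us is None:
        now_us = time.time_ns() // 1000          # current time in microseconds
    clock = now_us // 4                          # ticks once every 4 microseconds
    conn_id = f"{src}:{sport}-{dst}:{dport}".encode()
    digest = hashlib.sha256(conn_id + secret).digest()
    offset = int.from_bytes(digest[:4], "big")   # F(...) truncated to 32 bits
    return (clock + offset) % 2 ** 32
```

For one connection 4-tuple the ISN still advances with the clock, while an attacker who does not know the secret cannot derive the offset used for a different 4-tuple.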
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
- B believes it is talking to C, might grant permissions based on C's IP address
- Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
- Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
- SYN cookies
- Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
- If client continues with ACK, check whether the acknowledged sequence number is one the server could have sent; allocate buffers only if correct
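The "compute but don't store" idea behind SYN cookies can be sketched in Python. This is a simplified illustration under stated assumptions, not the encoding real stacks use (those also pack an MSS index and counter bits into the cookie): `make_cookie`, `check_ack`, the `counter` time bucket, and `SECRET` are all hypothetical.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical; a real server would use a random key


def make_cookie(src: str, sport: int, dst: str, dport: int,
                client_isn: int, counter: int, secret: bytes = SECRET) -> int:
    """Server ISN derived only from the SYN's fields, a secret, and a coarse time counter."""
    msg = f"{src}:{sport}-{dst}:{dport}|{client_isn}|{counter}".encode()
    mac = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")


def check_ack(src: str, sport: int, dst: str, dport: int, client_isn: int,
              ack_no: int, counter: int, secret: bytes = SECRET) -> bool:
    """Accept the final ACK only if it acknowledges a cookie we could have sent recently."""
    for c in (counter, counter - 1):   # tolerate one counter tick of delay
        cookie = make_cookie(src, sport, dst, dport, client_isn, c, secret)
        if ack_no == (cookie + 1) % 2 ** 32:
            return True
    return False
```

The server keeps no per-connection state between the SYNACK and the ACK; it recomputes the cookie from the fields of the arriving ACK and allocates buffers only on a match.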
Sequence Number Summary
• Goals, Set 1
- Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
- Even after a crash or restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, Set 2
- Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
- Don't allow SYN attacks: compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline; labels: close (client), close (server), timed wait, closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
- Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline; labels: closing (client), closing (server), timed wait, closed]
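The four closing steps map onto a small transition table. The Python sketch below uses the state names from RFC 793 (FIN_WAIT_1, CLOSE_WAIT, and so on), which the slides do not spell out; the event names and the `run` helper are hypothetical, and simultaneous close is left out.

```python
# (state, event) -> (next_state, action); state names follow RFC 793.
CLOSE_TRANSITIONS = {
    # Active closer (the client in the slides)
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "send FIN"),
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "send ACK"),
    ("TIME_WAIT", "recv_FIN"): ("TIME_WAIT", "send ACK"),  # re-ACK a retransmitted FIN
    ("TIME_WAIT", "timeout"): ("CLOSED", None),
    # Passive closer (the server in the slides)
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "send ACK"),
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "send FIN"),
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),
}


def run(state: str, events: list) -> tuple:
    """Feed events through the table; return the final state and the actions taken."""
    actions = []
    for event in events:
        state, action = CLOSE_TRANSITIONS[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions
```

The self-loop in TIME_WAIT is the "will respond with ACK to received FINs" behaviour: if the last ACK is lost, the server's retransmitted FIN is acknowledged again.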
TCP Connection FSM
The heavy solid line is the normal path for a client.
The heavy dashed line is the normal path for a server.
The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
- No: the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
- Delayed ACKs, Nagle's algorithm
- Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
- SYN attack
- Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
- Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrapped sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
- delayed ACKs
- Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
- long delays (queueing in router buffers)
- lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A sends original data at rate $\lambda_{in}$; Host B receives at rate $\lambda_{out}$; unlimited shared output link buffers]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data); Host B receives $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 2
Always: $\lambda_{in} = \lambda_{out}$ (goodput)
• a) if no loss: $\lambda'_{in} = \lambda_{in}$
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes $\lambda'_{in}$ larger for same $\lambda_{out}$ (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of $\lambda_{out}$ versus $\lambda'_{in}$; axes extend to R/2, with R/3 and R/4 marked]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A offers $\lambda_{in}$ (original data) and $\lambda'_{in}$ (original data plus retransmitted data); Host B receives $\lambda_{out}$; finite shared output link buffers]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, $\lambda_{out}$]
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput: $\text{rate} = \text{CongWin}/\text{RTT}$ bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection over time; y-axis ticks at 8, 16, and 24 Kbytes]
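The sawtooth that AIMD produces can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: one window update per RTT round, losses supplied as a set of round numbers, and a hypothetical function name `aimd_trace`.

```python
def aimd_trace(mss: int, start: int, loss_rounds: set, rounds: int) -> list:
    """Congestion window (bytes) at the start of each RTT round."""
    cwnd = start
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease, floor of 1 MSS
        else:
            cwnd += mss                  # additive increase: +1 MSS per RTT
    return trace


if __name__ == "__main__":
    # Loss in round 3: the window climbs by 1000 bytes per round, then halves.
    print(aimd_trace(mss=1000, start=8000, loss_rounds={3}, rounds=6))
```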
TCP Slow Start
• When connection begins, CongWin = 1 MSS
- Example: MSS = 500 bytes & RTT = 200 msec
- initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
- desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
- double CongWin every RTT
- done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline showing one RTT; time runs downward]
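Both the 20 kbps starting rate and the per-RTT doubling can be checked with a short Python sketch. It assumes one ACK per full segment and no losses; the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes: int, rtt_s: float) -> float:
    """Rate with CongWin = 1 MSS: one segment per round trip, in bits per second."""
    return mss_bytes * 8 / rtt_s


def slow_start_trace(mss: int, rounds: int) -> list:
    """CongWin (bytes) at the start of each RTT during loss-free slow start."""
    cwnd = mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        acks = cwnd // mss        # one ACK per full segment sent this round
        cwnd += acks * mss        # +1 MSS per ACK doubles CongWin each RTT
    return trace


if __name__ == "__main__":
    print(initial_rate_bps(500, 0.2))   # MSS = 500 bytes, RTT = 200 msec
    print(slow_start_trace(500, 4))
```

Incrementing by one MSS per ACK is what turns a per-ACK rule into exponential growth per RTT: a window of n segments yields n ACKs, hence n more segments.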
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
- CongWin is cut in half
- window then grows linearly
• But after timeout event:
- CongWin instead set to 1 MSS
- window then grows exponentially
- to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance, and fast recovery. Transitions are listed below as event: actions; Λ marks a transition with no triggering event or no action.]
Initial transition into slow start
- Λ: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0
Slow start
- new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
- cwnd > ssthresh: Λ; go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Congestion avoidance
- new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Fast recovery
- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
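The state machine on this slide can be sketched as a Python class with one method per event. It is an illustration under assumptions rather than a faithful TCP Reno: the class name `RenoSender` and the value of `MSS` are hypothetical, transmission and retransmission are omitted, and the slow-start exit uses the slide's `cwnd > ssthresh` test.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; illustrative value, not taken from the slides


@dataclass
class RenoSender:
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                         # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)     # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS                         # inflate for each further dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"             # missing segment is resent here

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                    # missing segment is resent here
```

A triple duplicate ACK therefore halves the window and continues linearly, while a timeout collapses it to 1 MSS and restarts slow start.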
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
| Event | State | TCP Sender Action | Commentary |
|---|---|---|---|
| ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT |
| ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT |
| Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. |
| Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start |
| Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed |
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
- Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size over time, oscillating between W/2 and W]
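The 0.75 factor follows from averaging the window over one cycle, assuming the window climbs linearly from W/2 back to W between losses (additive increase) and that RTT stays constant:

```latex
\[
\overline{\text{window}} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{0.75\,W}{RTT}
\]
```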
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$$
• L = 2·10^-10. Very low
• Requires an almost perfect link
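The two numbers in this example can be reproduced with a short Python sketch. The function names are hypothetical; the second function simply inverts the throughput formula above for L.

```python
def window_segments(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def required_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(int(window_segments(10e9, 0.1, 1500)))      # in-flight segments
    print(required_loss_rate(10e9, 0.1, 1500))        # tolerable loss rate
```

With 1500-byte segments, a 100 ms RTT, and a 10 Gbps target, the loss rate comes out at roughly 2·10^-10, that is, about one loss per five billion segments.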
Equation-Based Control
• Note: TCP congestion control forms a control loop
- Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput on the horizontal axis, both axes running to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
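The convergence argument can be checked numerically. The Python sketch below is an idealized model with assumptions beyond the slides: both flows have the same RTT, see every loss at the same time, and a loss occurs exactly when their combined rate exceeds the capacity; `share_bottleneck` is a hypothetical name.

```python
def share_bottleneck(x1: float, x2: float, capacity: float,
                     rounds: int, step: float = 1.0) -> tuple:
    """Two AIMD flows on one link; returns their rates after the given rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add the same amount, so the gap is unchanged.
            x1, x2 = x1 + step, x2 + step
    return x1, x2


if __name__ == "__main__":
    # Start far from the fair share: 80 versus 10 on a link of capacity 100.
    print(share_bottleneck(80.0, 10.0, capacity=100.0, rounds=1000))
```

Additive increase preserves the difference between the two rates and every multiplicative decrease halves it, so the rates approach the equal-share line.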
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
- do not want rate throttled by congestion control
• Instead use UDP:
- pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
- new app asks for 1 TCP, gets rate R/10
- new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
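The arithmetic behind the parallel-connection example, assuming every connection on the link gets an equal share (the function name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing: int, mine: int) -> Fraction:
    """Fraction of link rate R an app gets when it opens `mine` connections
    next to `existing` ones and all connections share the link equally."""
    return Fraction(mine, existing + mine)


if __name__ == "__main__":
    print(app_share(9, 1))   # one new connection among nine existing ones
    print(app_share(9, 9))   # nine new connections among nine existing ones
```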
Example RTT estimation
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and EstimatedRTT]
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
- large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
$$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$$
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
(typically, $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$
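The estimator can be sketched in Python as an exponentially weighted moving average with weights 0.125 and 0.25. The class name `RttEstimator` is hypothetical, and the initial deviation of half the first sample follows RFC 6298 rather than anything stated on the slide.

```python
class RttEstimator:
    """EWMA estimate of RTT and its deviation, with the derived timeout."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # initial value per RFC 6298 (assumption)

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

    def update(self, sample_rtt: float) -> float:
        # DevRTT is updated first, against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval
```

A single outlier sample moves EstimatedRTT by only one eighth of the difference but widens the safety margin noticeably, which is the intended effect of the 4·DevRTT term.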
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
- Note: the ACK could be a delayed ACK for the original segment or an early ACK for the retransmitted segment
• Choices:
- Associate with original packet: may overestimate true RTT
- Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
- OK, but what if the current timeout value is too small for the network delay?
- In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
- By a factor of 2, with a limit
• Resume the normal sampling algorithm afterwards
- Usual sampling period is once per RTT
• See also RFC 2988
- TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
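Karn's rule and the backoff can be sketched in Python as follows. The class name `KarnTimer`, the 60-second cap, and the `recompute` callback (standing in for the normal EstimatedRTT/DevRTT formula) are assumptions made for illustration.

```python
from typing import Callable


class KarnTimer:
    """Retransmission timeout with Karn's sampling rule and exponential backoff."""

    def __init__(self, rto: float, max_rto: float = 60.0):
        self.rto = rto
        self.max_rto = max_rto   # the limit on the backoff (assumed value)

    def on_timeout(self) -> float:
        # No usable measurement: double the timeout, up to the limit.
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, sample_rtt: float, was_retransmitted: bool,
               recompute: Callable[[float], float]) -> float:
        if was_retransmitted:
            # Ambiguous sample: the ACK may match either transmission, so skip it
            # and keep the backed-off timeout.
            return self.rto
        # Clean sample: resume the normal formula.
        self.rto = recompute(sample_rtt)
        return self.rto
```

The backed-off value stays in force until an ACK arrives for a segment that was sent only once, at which point normal sampling resumes.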
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
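The event/action table above can be sketched as a small state machine. This is a minimal illustration, not an actual TCP implementation: the class name `RenoSender`, the MSS value of 1460 bytes, and the floor of 1 MSS applied to Threshold are assumptions made for the example.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed segment size, chosen only for illustration


@dataclass
class RenoSender:
    """Toy model of the sender-side congestion state (hypothetical class)."""
    cwnd: float = MSS            # CongWin
    ssthresh: float = 64 * 1024  # Threshold
    state: str = "SS"            # "SS" (slow start) or "CA" (congestion avoidance)
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cwnd += MSS                      # doubles CongWin every RTT
            if self.cwnd > self.ssthresh:
                self.state = "CA"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT

    def on_dup_ack(self) -> None:
        """Duplicate ACK; the third one triggers multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Assumption: keep Threshold at or above 1 MSS.
            self.ssthresh = max(self.cwnd / 2, MSS)
            self.cwnd = self.ssthresh
            self.state = "CA"

    def on_timeout(self) -> None:
        """Timeout: halve Threshold, restart from 1 MSS in slow start."""
        self.ssthresh = max(self.cwnd / 2, MSS)
        self.cwnd = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Like the table, the sketch omits the window inflation that the FSM performs during fast recovery.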
TCP Throughput (Idealized)
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of window size over time, oscillating between W/2 and W]
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
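The numbers in this example can be checked with a few lines of arithmetic. The sketch below assumes the throughput formula 1.22 * MSS / (RTT * sqrt(L)) and simply inverts it for L; the function names are made up for the illustration.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps over one RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def max_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))  # about 83,333 segments
    print(max_loss_rate(10e9, 0.1, 1500))             # about 2e-10
```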
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R (x-axis: Connection 1 throughput) and the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
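The convergence argument can be illustrated with a toy simulation. The assumptions are strong and purely illustrative: both sessions see every loss at the same time, rates are in arbitrary units, and each round adds exactly one unit. The function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing a link of the given capacity.

    Each round, both flows add 1 unit (additive increase) unless their
    combined rate exceeds the capacity, in which case both halve
    (multiplicative decrease).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: the gap between flows halves
        else:
            x1, x2 = x1 + 1, x2 + 1   # increase: the gap stays the same
    return x1, x2


if __name__ == "__main__":
    # Start far from the fair share (80 vs 10 on a link of capacity 100).
    print(aimd_two_flows(80, 10, 100, 1000))
```

Additive increase leaves the difference between the two rates unchanged, and every loss halves it, so the rates drift toward the equal-share line.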
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
• smoothed estimate of the round-trip time:
  $\text{EstimatedRTT} = (1-\alpha) \cdot \text{EstimatedRTT} + \alpha \cdot \text{SampleRTT}$
• first estimate of how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$
  (typically, $\alpha = 0.125$, $\beta = 0.25$)
• then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$
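The three formulas can be combined into a small estimator. This is a sketch under stated assumptions: the class name `RttEstimator` is made up, and the initial DevRTT of half the first sample follows the initialization suggested in the retransmission-timer RFCs rather than anything on the slide.

```python
class RttEstimator:
    """Exponentially weighted RTT estimator (illustrative sketch)."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        # Assumption: start the deviation at half of the first sample.
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        """Fold in a new SampleRTT and return the new TimeoutInterval."""
        # DevRTT uses the EstimatedRTT from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt
```

For example, starting from a 100 ms sample and then observing 200 ms moves EstimatedRTT to 112.5 ms and DevRTT to 62.5 ms.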
RTT Measurement & Retransmissions
• How should you handle ACKs for retransmitted segments when measuring RTT?
  - Note: ACK could be delayed for original segment or early for retransmitted segment
• Choices:
  - Associate with original packet: may overestimate true RTT
  - Associate with retransmitted packet: may underestimate true RTT
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
  - OK, but what if the current timeout value is too small for the network delay?
  - In this case, sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
  - By factor of 2, with limit
• Resume normal sampling algorithm afterwards
  - Usual sampling period is once per RTT
• See also RFC 2988
  - TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
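A minimal sketch of the two rules (skip ambiguous samples, back off on timeout) might look as follows. The class name `KarnTimer`, the 64-second cap, and the way a fresh timeout value is passed in are assumptions for the illustration.

```python
class KarnTimer:
    """Retransmission timer with Karn-style sampling and backoff (sketch)."""

    def __init__(self, rto=1.0, max_rto=64.0):
        self.rto = rto          # current timeout in seconds
        self.max_rto = max_rto  # assumed upper limit for the backoff

    def on_timeout(self):
        """Exponential backoff: double the timeout, up to the limit."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto

    def on_ack(self, was_retransmitted, new_rto=None):
        """Return True if this ACK yields a usable RTT sample.

        ACKs for retransmitted segments are ambiguous, so they are
        ignored and the backed-off timeout stays in effect.
        """
        if was_retransmitted:
            return False
        if new_rto is not None:
            self.rto = new_rto  # value recomputed from the RTT formula
        return True
```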
Fast Retransmit
• Time-out period often relatively long
  - long delay before resending lost packet
• Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If a segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend segment before timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
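The ACK handler can be turned into a runnable sketch. Timer handling is left out, and the class name `FastRetransmitSender` and the `retransmitted` list (which stands in for actually resending a segment) are assumptions for the example.

```python
class FastRetransmitSender:
    """Tracks SendBase and duplicate ACKs; records fast retransmits."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []  # sequence numbers that were resent

    def on_ack(self, y):
        if y > self.send_base:
            # New data acknowledged: advance SendBase, reset the count.
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Fast retransmit: resend segment starting at y.
                self.retransmitted.append(y)
```

In this sketch a new ACK followed by three duplicates triggers exactly one retransmission of the segment starting at the acknowledged byte.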
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
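Both sides of this rule fit in a few lines. The sketch below uses the slide's variable names; the function names `rcv_window` and `can_send` and the sample numbers are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room in the receive buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, window):
    """Sender side: would sending nbytes keep unACKed data within window?"""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


if __name__ == "__main__":
    window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(window)                                # 2096 bytes of spare room
    print(can_send(1000, 5000, 4000, window))    # True: 2000 <= 2096
    print(can_send(1200, 5000, 4000, window))    # False: 2200 > 2096
```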
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, ...)
• server: contacted by client
  cl = accept(sv, &caddr, ...)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq. #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP 3-way handshake
TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake is required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
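The ISN formula can be sketched as below. Several details are assumptions made for the illustration: SHA-256 stands in for the hash F, the random component is a per-boot secret held in `SECRET`, and the clock is passed in as integer microseconds so the example is deterministic.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)  # assumed per-boot secret feeding F


def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret)."""
    if now_us is None:
        now_us = time.time_ns() // 1000
    clock = now_us // 4  # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + offset) % 2**32  # sequence numbers are 32 bits wide
```

For a fixed 4-tuple the ISN still advances with the clock, which guards against old incarnations, while the keyed offset makes the value hard to predict for other connections.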
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send; use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged sequence number could have been sent by the server; allocate buffers only if it checks out
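The "compute but don't store" idea can be sketched with a keyed hash. This is a simplified illustration, not the encoding used by any real stack: the function names, the HMAC-SHA-256 construction, the coarse time `counter`, and the key in `KEY` are all assumptions.

```python
import hashlib
import hmac
import os

KEY = os.urandom(16)  # assumed server-side secret


def make_cookie(src, sport, dst, dport, client_isn, counter, key=KEY):
    """Derive the server's initial sequence number from the SYN itself."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(key, msg, digestmod=hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_ack(ack_number, src, sport, dst, dport, client_isn, counter, key=KEY):
    """Validate the final ACK without any stored per-connection state.

    A genuine ACK acknowledges cookie + 1. The current and the previous
    counter value are accepted to tolerate a counter tick in between.
    """
    for c in (counter, counter - 1):
        expected = (make_cookie(src, sport, dst, dport, client_isn, c, key) + 1) % 2**32
        if ack_number == expected:
            return True
    return False
```

Only after `check_ack` succeeds would the server allocate buffers for the connection.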
Sequence Number Summary
• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash and restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline showing close on each side, the client's timed wait, and closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline showing closing on each side, the client's timed wait, and closed on both sides]
TCP Connection FSM
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• ...
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send through unlimited shared output link buffers; λ_in: original data; λ_out: output rate]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send through finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: output rate]
Causes/costs of congestion: scenario 2
Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)
"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure, panels a, b, c: λ_out versus λ_in, axes marked at R/2; levels R/3 and R/4 are also marked]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B send through finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: output rate]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window of a long-lived TCP connection over time; sawtooth with y-axis ticks at 8, 16, and 24 Kbytes]
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, with RTT marked on the time axis]
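The ramp-up can be made concrete with the slide's example numbers (MSS = 500 bytes, RTT = 200 msec). The sketch assumes no loss and that every segment is ACKed within one RTT; the function name `slow_start_rates` is hypothetical.

```python
def slow_start_rates(mss_bytes, rtt_s, rtts):
    """Sending rate in bits/sec for each of the first `rtts` round trips."""
    cwnd = mss_bytes  # CongWin starts at 1 MSS
    rates = []
    for _ in range(rtts):
        rates.append(cwnd * 8 / rtt_s)
        cwnd *= 2     # one MSS added per ACK doubles CongWin per RTT
    return rates


if __name__ == "__main__":
    # Starts at about 20 kbps and doubles every RTT.
    for i, rate in enumerate(slow_start_rates(500, 0.2, 5)):
        print(f"RTT {i}: {rate / 1000:.0f} kbps")
```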
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
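The difference between the two variants on a triple duplicate ACK can be shown side by side. This is a sketch of the rule as stated above; the function name and the string labels are made up for the example.

```python
def react_to_triple_dup_ack(cong_win, mss, variant):
    """Return (new CongWin, new Threshold) after a triple duplicate ACK."""
    threshold = cong_win / 2
    if variant == "tahoe":
        # Tahoe treats the event like a timeout: back to 1 MSS, slow start.
        return mss, threshold
    if variant == "reno":
        # Reno (fast recovery): continue from Threshold, growing linearly.
        return threshold, threshold
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    print(react_to_triple_dup_ack(16000, 1000, "tahoe"))  # (1000, 8000.0)
    print(react_to_triple_dup_ack(16000, 1000, "reno"))   # (8000.0, 8000.0)
```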
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
RTT Measurement amp
Retransmissions
bull How should you handle acks for retransmitted segments when measuring RTT
ndash Note ACK could be delayed for original segment or early for retransmitted segment
bull Choices
ndash Associate with original packet may overestimate true RTT
ndash Associate with retransmitted packet may underestimate true RTT
CS 4284 Spring 2013
Karnrsquos Algorithm
bull Idea donrsquot consider samples from retransmitted segments ndash Ok but what if current timeout value is too small for network
delay
ndash In this case would keep timing out but couldnrsquot adjust timeout according to formula
bull Solution If measurement canrsquot be made use exponential backoff until new measurement can be made ndash By factor of 2 with limit
bull Resume normal sampling algorithm afterwards ndash Usual sampling period is once per RTT
bull See also RFC 2988 ndash TCP timestamp option can be used to remove ambiguity which ack matches which packet
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
$\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput; roughly:
$\text{rate} = \frac{\text{CongWin}}{RTT}$ bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
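The sender-side limit and the rough rate estimate can be written down directly. This is a minimal sketch; the function names are made up for illustration, and all quantities are in bytes and seconds.

```python
def bytes_sendable(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)   # the tighter of congestion and flow control
    return max(0, limit - in_flight)

def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cong_win / rtt

print(bytes_sendable(5000, 3000, cong_win=4000, rcv_window=8000))
print(approx_rate(4000, 0.1))
```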
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (ticks at 8, 16 and 24 Kbytes) versus time for a long-lived TCP connection.]
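The AIMD rule can be traced round by round. The sketch below is a simplification that assumes one decision per RTT and takes the set of RTT rounds with a loss event as an input; `aimd_trace` and its parameters are illustrative names.

```python
def aimd_trace(rounds, mss, start, loss_rounds):
    """Return CongWin (bytes) at the start of each RTT round."""
    cong_win = start
    trace = []
    for r in range(rounds):
        trace.append(cong_win)
        if r in loss_rounds:
            cong_win = max(mss, cong_win // 2)   # multiplicative decrease
        else:
            cong_win += mss                      # additive increase: +1 MSS per RTT
    return trace

print(aimd_trace(8, mss=1000, start=8000, loss_rounds={3}))
# [8000, 9000, 10000, 11000, 5500, 6500, 7500, 8500]
```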
TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, with one RTT marked, over time.]
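Both slow-start claims can be checked with a few lines: the 20 kbps initial rate for MSS = 500 bytes and RTT = 200 msec, and the doubling that results from adding one MSS per ACK. The sketch assumes one ACK per full segment and no loss or threshold; the function names are illustrative.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate with a window of one segment: MSS bits per RTT."""
    return mss_bytes * 8 / rtt_seconds

def slow_start_windows(mss, rtts):
    """CongWin per RTT when each ACK adds one MSS (no loss, no threshold)."""
    cong_win = mss
    windows = []
    for _ in range(rtts):
        windows.append(cong_win)
        acks = cong_win // mss      # one ACK per full segment sent this round
        cong_win += acks * mss      # +1 MSS per ACK doubles the window
    return windows

print(initial_rate_bps(500, 0.2))      # 20000.0 bits/sec, i.e. 20 kbps
print(slow_start_windows(500, 4))      # [500, 1000, 2000, 4000]
```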
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event: set CongWin to 1 MSS
• If triple-duplicate-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance and fast recovery. Transitions are listed below as event / actions; Λ means "no event" or "no action".]
Initial transition:
• Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• cwnd > ssthresh / Λ; go to congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
Congestion avoidance:
• new ACK / cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
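The event table translates almost line for line into a small state machine. The following is a sketch of that table only, not a full TCP sender: the class and method names are invented for illustration, the 64 KB initial Threshold is an assumed default, and retransmission itself is left out.

```python
class CongestionControl:
    """Sender-side congestion state following the event table (simplified)."""

    def __init__(self, mss, threshold=64 * 1024):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold   # assumed initial value
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_dup_ack(self):
        """Duplicate ACK; the third one triggers multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = max(self.cong_win / 2, self.mss)  # not below 1 MSS
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Feeding it a sequence of events (for example four new ACKs, then three duplicate ACKs, then a timeout) shows the window growing by one MSS per ACK, being halved, and finally collapsing to 1 MSS.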
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size over time, oscillating between W/2 and W.]
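The 0.75 factor follows from averaging the window over one cycle. As a short derivation, assuming the window climbs linearly from W/2 back to W between consecutive loss events:

```latex
\bar{w} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} \;\approx\; \frac{\bar{w}}{RTT} \;=\; \frac{0.75\,W}{RTT}
```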
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = $2 \cdot 10^{-10}$: very low
• Requires an almost perfect link
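The two numbers in this example can be reproduced with a few lines. The sketch assumes the loss/throughput relation throughput = 1.22 · MSS / (RTT · sqrt(L)) and solves it for L; the function names are illustrative.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2

print(round(window_segments(10e9, 0.1, 1500)))   # 83333
print(loss_rate_for(10e9, 0.1, 1500))            # roughly 2e-10
```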
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + triple-duplicate-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with Connection 1 throughput on one axis, both axes running to R, and the equal-bandwidth-share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
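The convergence argument can be simulated. The sketch below is a strong idealization: two flows with equal RTT that both see every loss in the same round. Additive increase keeps the gap between the two rates constant, and each halving cuts the gap in half, so the rates drift toward the equal share. The function name and the numbers are illustrative.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Two synchronized AIMD flows sharing one bottleneck; returns the final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:          # bottleneck overloaded: both flows see a loss
            x1, x2 = x1 / 2, x2 / 2     # multiplicative decrease halves the gap
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase keeps the gap unchanged
    return x1, x2

a, b = two_flow_aimd(80.0, 10.0, capacity=100.0, rounds=500)
print(abs(a - b))   # far smaller than the initial gap of 70
```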
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
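The R/10 and R/2 figures follow from per-connection fairness. A minimal sketch, assuming every TCP connection on the link ends up with an equal share; `app_share` is an illustrative name.

```python
from fractions import Fraction

def app_share(existing_connections, app_connections):
    """Fraction of the link rate R that goes to an app opening app_connections."""
    total = existing_connections + app_connections
    return Fraction(app_connections, total)

print(app_share(9, 1))   # 1/10
print(app_share(9, 9))   # 1/2
```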
Karn's Algorithm
• Idea: don't consider samples from retransmitted segments
– OK, but what if the current timeout value is too small for the network delay?
– In this case, the sender would keep timing out but couldn't adjust the timeout according to the formula
• Solution: if a measurement can't be made, use exponential backoff until a new measurement can be made
– By factor of 2, with limit
• Resume normal sampling algorithm afterwards
– Usual sampling period is once per RTT
• See also RFC 2988
– TCP timestamp option can be used to remove the ambiguity of which ACK matches which packet
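A compact sketch of how Karn's rule and the backoff fit together with RTT estimation. The smoothing formula used here (SRTT and RTTVAR with gains 1/8 and 1/4, RTO = SRTT + 4 · RTTVAR) is the one given in RFC 2988 and is an assumption about which "formula" the slide means; the class name, the 64-second cap, and the omission of the RFC's minimum RTO are simplifications made for illustration.

```python
class RtoEstimator:
    """Retransmission timeout with Karn's rule and exponential backoff (sketch)."""

    def __init__(self, initial_rto=3.0, max_rto=64.0):
        self.srtt = None
        self.rttvar = None
        self.rto = initial_rto
        self.max_rto = max_rto      # assumed cap for the backoff

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return                  # Karn: ambiguous sample, ignore it
        if self.srtt is None:       # first usable measurement
            self.srtt = sample_rtt
            self.rttvar = sample_rtt / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample_rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * sample_rtt
        self.rto = min(self.max_rto, self.srtt + 4 * self.rttvar)

    def on_timeout(self):
        self.rto = min(self.max_rto, self.rto * 2)   # back off by factor 2, with limit
```

The backed-off value stays in force until an ACK for a segment that was not retransmitted yields a fresh sample, at which point the timeout is recomputed from the estimate.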
Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend segment before timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {  /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3)
            resend segment with sequence number y  /* fast retransmit */
    }
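The same ACK-handling logic as a runnable sketch. The class is hypothetical: instead of touching the network it records which sequence numbers it would resend, and the timer is reduced to a flag.

```python
class FastRetransmitSender:
    """ACK handling with duplicate-ACK counting (no real network I/O)."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = 0
        self.timer_running = False
        self.retransmitted = []     # sequence numbers we would resend

    def on_ack(self, y, unacked_remaining=True):
        if y > self.send_base:
            self.send_base = y      # new data acknowledged
            self.dup_acks = 0
            self.timer_running = unacked_remaining
        else:
            self.dup_acks += 1      # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y

s = FastRetransmitSender(send_base=100)
s.on_ack(200)               # new ACK
for _ in range(3):
    s.on_ack(200)           # three duplicates
print(s.retransmitted)      # [200]
```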
TCP
Flow Control
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
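The receiver-side computation is a one-liner. A minimal sketch with byte counts as plain integers; the function name is illustrative.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    buffered = last_byte_rcvd - last_byte_read   # received but not yet read by the app
    return rcv_buffer - buffered

print(rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000))   # 2096
print(rcv_window(4096, last_byte_rcvd=5096, last_byte_read=1000))   # 0: buffer full
```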
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
– Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Connection Management
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
connect(s, &dstaddr, …)
• server: contacted by client
cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
– Incrementing every 4 µs guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
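A sketch of the RFC 1948 idea: a clock that ticks every 4 microseconds plus a keyed hash of the connection's addresses and ports. Several choices here are assumptions made for illustration: SHA-256 stands in for the hash F, the secret is a fixed byte string rather than a random per-boot value, and `initial_seq` is a made-up name.

```python
import hashlib
import time

SECRET = b"per-boot random secret"   # assumption: a real stack would draw this at boot

def initial_seq(src, dst, sport, dport, now=None, secret=SECRET):
    """ISN = 4-microsecond clock + F(connection 4-tuple, secret), modulo 2**32."""
    now = time.time() if now is None else now
    clock = int(now * 1_000_000 / 4)                      # one tick every 4 µs
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2**32

print(initial_seq("10.0.0.1", "10.0.0.2", 1234, 80))
```

Because F depends on the 4-tuple and a secret, an off-path attacker who observes the ISN of its own connection learns nothing useful about the ISN another client would be given.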
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C's IP address
– Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
– Opens up possibility of denial-of-service attack, where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
– SYN cookies
• Server computes its initial sequence number and sends SYNACK, but does not allocate buffers
– If client continues with ACK, check whether that ACK matches a SYNACK that could have been sent, then allocate buffers if correct
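The "compute but don't store" idea can be sketched with a keyed hash. This is a simplified illustration, not the bit layout any real stack uses: real SYN cookies also encode an MSS index and a coarse timestamp in specific bits. The function names, the key, and the `time_slot` parameter are assumptions.

```python
import hashlib
import hmac

SERVER_KEY = b"server secret"    # assumption: random and private in practice

def make_cookie(src, sport, dst, dport, client_isn, time_slot, key=SERVER_KEY):
    """Server initial sequence number derived from the SYN, so nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")

def ack_is_valid(ack_number, src, sport, dst, dport, client_isn, time_slot, key=SERVER_KEY):
    """The client's final ACK must acknowledge cookie + 1."""
    cookie = make_cookie(src, sport, dst, dport, client_isn, time_slot, key)
    return ack_number == (cookie + 1) % 2**32
```

Only when `ack_is_valid` succeeds does the server allocate buffers, so a flood of SYNs with forged source addresses costs it computation but no per-connection memory.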
Sequence Number Summary
• Goals, Set 1:
– Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
– Even after a crash or restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before the connection is established
• Goals, Set 2:
– Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
– Don't allow SYN attacks: compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline showing close on each side, the client's timed wait, and closed.]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
– Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, can handle simultaneous FINs
[Figure: client/server timeline showing closing on each side, the client's timed wait, and closed on both sides.]
TCP Connection FSM
• The heavy solid line is the normal path for a client
• The heavy dashed line is the normal path for a server
• The light lines are unusual events
• Each transition is labeled by the event causing it and the action resulting from it, separated by a slash
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle; TCP server lifecycle.]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
– No: famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
– Delayed ACKs, Nagle's algorithm
– Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
– SYN attack
– Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
– Client/server agree on an MSS larger than the default (536 outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion Control Event State TCP Sender Action Commentary
ACK receipt for
previously
unacked data
Slow Start
(SS)
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to CA
Resulting in a doubling of
CongWin every RTT
ACK receipt for
previously
unacked data
Congestion
Avoidance
(CA)
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting in
increase of CongWin by 1
MSS every RTT
Triple duplicate
ACK
SS or CA Threshold = CongWin2
CongWin = Threshold
Set state to CA
Fast recovery implementing
multiplicative decrease
CongWin will not drop below
1 MSS
Timeout SS or CA Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK
count for segment being
acked
CongWin and Threshold not
changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTT
bull Just after loss window drops to W2 throughput to W2RTT
bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Loss
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull Throughput in terms of loss rate
bull L = 210-10 Very low
bull Require almost perfect link
LRTT
MSS221
Equation-Based Control
bull Note TCP congestion control forms
control loop
ndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)
bull Instead equation-based control uses an
equation to compute sending rate based
on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same
bottleneck link of bandwidth R each should
have average rate of Rk
TCP connection 1
bottleneck router
capacity R
TCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair
Consider two competing sessions
bull additive increase gives slope of 1 as throughput increases
bull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increase
loss decrease window by factor of 2
congestion avoidance additive increase
loss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
bull Multimedia apps often
do not use TCP
ndash do not want rate throttled
by congestion control
bull Instead use UDP
ndash pump audiovideo at
constant rate tolerate
packet loss
bull TCP friendliness
Fairness and parallel TCP
connections
bull nothing prevents app from
opening parallel connections
between 2 hosts
bull Example link of rate R
supporting 9 connections
ndash new app asks for 1 TCP gets
rate R10
ndash new app asks for 9 TCPs gets
R2
bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Fast Retransmit
bull Time-out period often
relatively long
ndash long delay before
resending lost packet
bull Detect lost segments via
duplicate ACKs
ndash Sender often sends many
segments back-to-back
ndash If segment is lost there will
likely be many duplicate
ACKs
bull If sender receives 3
ACKs for the same data
it supposes that segment
after ACKed data was
lost
ndash fast retransmit resend
segment before timer
expires
CS 4284 Spring 2013
event ACK received with ACK field value of y
if (y gt SendBase)
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
else
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3)
resend segment with sequence number y
Fast retransmit algorithm
a duplicate ACK for already ACKed segment
fast retransmit
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip

Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
– long delays (queueing in router buffers)
– lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send through a router with unlimited shared output link buffers. λin: original data; λout: delivered data]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send through a router with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered data]

Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for given "goodput"
• unneeded retransmissions: link carries multiple copies of packet
[Figure: three plots (a), (b), (c) of λout against λ'in, axes running to R/2; tick labels R/3 and R/4 mark the reduced goodput levels]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B with finite shared output link buffers. λin: original data; λ'in: original data plus retransmitted data; λout: delivered data]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: plot with labels Host A, Host B, λout]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate = CongWin / RTT bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
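The two relations on this slide can be tried out with a few lines of Python. This is only a sketch: the function names and the sample numbers are assumptions made for the illustration, not part of the slides.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit without exceeding
    min(CongWin, RcvWindow) bytes of unacknowledged data."""
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)
    return max(0, limit - in_flight)


def approx_rate(cong_win, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win / rtt_seconds


if __name__ == "__main__":
    # Hypothetical numbers: 8000 bytes in flight, CongWin 12000, RcvWindow 16000.
    print(usable_window(20000, 12000, 12000, 16000))
    print(approx_rate(12000, 0.1))
```

The smaller of the two windows always wins, so a large CongWin does not help when the receiver advertises a small RcvWindow.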

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection over time, with axis ticks at 8, 16 and 24 Kbytes]
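A minimal sketch of the AIMD sawtooth follows. It assumes, purely for illustration, that a loss event happens whenever the window exceeds a fixed capacity; a real network gives no such clean signal, and the function name and parameters are hypothetical.

```python
def aimd_trace(initial_kb, capacity_kb, rounds, mss_kb=1):
    """Return the congestion window (in KB) at the start of each RTT.

    Assumption for this sketch: a loss event occurs exactly when the
    window exceeds capacity_kb.
    """
    window = initial_kb
    trace = []
    for _ in range(rounds):
        trace.append(window)
        if window > capacity_kb:
            window = window / 2        # multiplicative decrease
        else:
            window = window + mss_kb   # additive increase: 1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(8, 10, 6))
```

Plotting the trace for a larger number of rounds gives the familiar sawtooth.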

TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A to Host B segment timeline, with RTT and time labels]
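The slow-start numbers can be checked with a short sketch. It assumes no loss and that every segment sent in a round is ACKed within that round; the function names are made up for this example.

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate when exactly one MSS is sent per RTT, in bits per second."""
    return mss_bytes * 8 / rtt_seconds


def slow_start_windows(mss_bytes, rtts):
    """Congestion window (bytes) at the start of each RTT, assuming no loss
    and that every segment sent in a round is ACKed within that round."""
    cong_win = mss_bytes
    windows = []
    for _ in range(rtts):
        windows.append(cong_win)
        acks = cong_win // mss_bytes   # one ACK per segment in flight
        cong_win += acks * mss_bytes   # +1 MSS per ACK doubles the window
    return windows


if __name__ == "__main__":
    print(initial_rate_bps(500, 0.2))   # MSS = 500 bytes, RTT = 200 msec
    print(slow_start_windows(500, 5))
```

With MSS = 500 bytes and RTT = 200 msec the first function gives 20,000 bits per second, the 20 kbps of the slide, and the second shows the window doubling each round.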

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
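The difference between the two variants can be sketched as one function. Windows are in units of MSS; the function name, the event strings and the variant strings are assumptions of this example. Tahoe is modelled as treating every loss event like a timeout.

```python
def on_loss(cong_win, event, variant="reno", mss=1):
    """Return (new_cong_win, new_threshold) after a loss event.

    event is "timeout" or "triple_dup_ack"; variant is "tahoe" or "reno".
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError(f"unknown event: {event}")
    if variant not in ("tahoe", "reno"):
        raise ValueError(f"unknown variant: {variant}")

    threshold = cong_win / 2              # half of CongWin just before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, threshold             # restart from 1 MSS (slow start)
    return threshold, threshold           # Reno: continue linearly from Threshold


if __name__ == "__main__":
    print(on_loss(16, "timeout"))
    print(on_loss(16, "triple_dup_ack"))
    print(on_loss(16, "triple_dup_ack", variant="tahoe"))
```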

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance and fast recovery. Transitions below are written as event / actions; Λ marks a transition with no triggering event or no action.]
initial:
– Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 (enter slow start)
slow start:
– new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK / dupACKcount++
– cwnd > ssthresh / Λ (go to congestion avoidance)
– dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment (go to fast recovery)
– timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
congestion avoidance:
– new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
– duplicate ACK / dupACKcount++
– dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment (go to fast recovery)
– timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment (go to slow start)
fast recovery:
– duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
– new ACK / cwnd = ssthresh, dupACKcount = 0 (go to congestion avoidance)
– timeout / ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment (go to slow start)

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
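The event/state/action table translates almost line by line into a small state machine. The class below is a sketch under that reading of the table; the class and method names are assumptions, and real TCP stacks track considerably more state.

```python
class CongestionControl:
    """Sender state following the event/state/action table (units: bytes)."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                  # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # about +1 MSS per RTT

    def on_duplicate_ack(self):
        """Count duplicates; the third one triggers the multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Driving an instance with a sequence of events, for example several new ACKs followed by three duplicates and then a timeout, reproduces the transitions in the table.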

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window sawtooth between W/2 and W]
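The 0.75 factor comes from averaging a window that climbs linearly from W/2 to W. A quick numeric check, assuming an idealized sawtooth that grows by one segment per RTT (the function name is hypothetical):

```python
def average_window(w_max):
    """Average window over one sawtooth cycle that climbs by 1 segment per
    RTT from w_max/2 up to w_max (w_max assumed even, in segments)."""
    cycle = range(w_max // 2, w_max + 1)
    return sum(cycle) / len(cycle)


if __name__ == "__main__":
    w = 100
    print(average_window(w) / w)   # fraction of W achieved on average
```

Dividing the average window by RTT gives the average throughput of 0.75 W/RTT.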

TCP Throughput & Loss
• Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = $2 \cdot 10^{-10}$. Very low
• Require almost perfect link
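The numbers in this example can be reproduced with a short calculation. The sketch assumes the slide's formula is the usual approximation throughput = 1.22 · MSS / (RTT · sqrt(L)), with MSS expressed in bits; the function names are made up for the illustration.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain throughput_bps."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    print(int(window_segments(10e9, 0.1, 1500)))   # in-flight segments
    print(required_loss_rate(10e9, 0.1, 1500))     # tolerable loss rate
```

For 1500-byte segments, 100 ms RTT and 10 Gbps this yields about 83,333 segments and a loss rate of roughly 2 · 10^-10.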

Equation-Based Control
• Note: TCP congestion control forms control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with Connection 1 throughput on one axis, axes running to R, and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
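The convergence argument can be watched in a small simulation. The sketch assumes both sessions see a loss in the same round whenever their combined rate exceeds the link capacity, which is a simplification; the function name and the starting rates are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one link; returns their final rates.

    Assumption of this sketch: both flows see a loss in the same round
    whenever their combined rate exceeds capacity.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # multiplicative decrease for both
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase, slope 1
    return x1, x2


if __name__ == "__main__":
    print(aimd_two_flows(80, 10, 100, 500))
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the rates drift toward the equal-share line.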

Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
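The arithmetic behind the parallel-connection example, assuming every TCP connection receives an equal share of the bottleneck (the function name is an assumption of this sketch):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app that opens new_connections,
    assuming every TCP connection gets an equal share of the bottleneck."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(app_share(9, 1))   # one new connection among ten
    print(app_share(9, 9))   # nine new connections among eighteen
```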

Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
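The pseudocode above can be turned into a runnable sketch. The class name, the `retransmitted` list and the boolean timer flag are stand-ins invented for this example; an actual sender would resend the segment and manage a real timer.

```python
from collections import defaultdict


class FastRetransmitSender:
    """Tracks SendBase and duplicate ACK counts, as in the pseudocode."""

    def __init__(self, send_base=0):
        self.send_base = send_base
        self.dup_acks = defaultdict(int)   # duplicate ACK count per ACK value
        self.retransmitted = []            # sequence numbers resent so far
        self.timer_running = False

    def on_ack(self, y, unacked_segments_remain):
        if y > self.send_base:
            # New data acknowledged: advance SendBase.
            self.send_base = y
            # The timer runs only while segments are still unacknowledged.
            self.timer_running = unacked_segments_remain
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # fast retransmit of segment y
```

Feeding it one new ACK for y = 200 followed by three duplicates records exactly one retransmission of 200; a fourth duplicate does not trigger another.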

TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn't overflow
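A small sketch of both sides of flow control. The function names and sample byte counts are assumptions for the illustration; the formulas are the ones on the slide.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending nbytes keeps unACKed data within the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised_window


if __name__ == "__main__":
    window = rcv_window(4096, 3000, 1000)   # 2000 bytes still unread in the buffer
    print(window)
    print(sender_may_send(1000, 5000, 4000, window))
    print(sender_may_send(1000, 5200, 4000, window))
```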

Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
– Solution: persistence timer. Sender sends probe after a period of inactivity to provoke window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size

Study of TCP: Outline
• segment structure
• reliable data transfer
– delayed ACKs
– Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
– seq. #s
– buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three way handshake:
Step 1: client host sends TCP SYN segment to server
– specifies initial seq #
– no data
Step 2: server host receives SYN, replies with SYNACK segment
– server allocates buffers
– specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
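The connect()/accept() pair can be exercised on the loopback interface. The sketch below uses Python's standard socket module, which wraps the same calls; the helper names `run_server` and `demo` are invented for this example. The handshake itself is carried out by the operating system inside connect() and accept(), so the SYN, SYNACK and ACK segments are not visible to the program.

```python
import socket
import threading


def run_server(listener, received):
    """Accept one client (the server side of the handshake) and read its data."""
    conn, _client_addr = listener.accept()
    with conn:
        received.append(conn.recv(1024))


def demo():
    # Listening socket on the loopback interface; port 0 lets the OS pick a port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    received = []
    server = threading.Thread(target=run_server, args=(listener, received))
    server.start()

    # The connection is established once the three-way handshake completes.
    client = socket.create_connection(listener.getsockname())
    client.sendall(b"hello")
    client.close()

    server.join()
    listener.close()
    return received[0]


if __name__ == "__main__":
    print(demo())
```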

TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
– Increment every 4 µs: guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
– Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
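A sketch of the ISN formula follows. Several things here are assumptions made for the illustration: the function and parameter names, the use of SHA-256 for F (chosen only because it is in the Python standard library), and a random per-host secret standing in for the random() input.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # per-host secret; hypothetical stand-in for random()


def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret),
    reduced modulo 2**32 because TCP sequence numbers are 32 bits wide."""
    if now_us is None:
        now_us = int(time.time() * 1_000_000)   # current time in microseconds
    clock = now_us // 4                          # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f_value = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f_value) % 2**32


if __name__ == "__main__":
    print(initial_sequence_number("10.0.0.1", "10.0.0.2", 40000, 80))
```

For a fixed connection 4-tuple the ISN advances with the clock, while an outsider who does not know the secret cannot compute the offset F for someone else's connection.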

When Sequence Numbers Attack
• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
– B believes it is talking to C, might grant permissions based on C's IP address
– Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
– Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
– SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
– If client continues with ACK, check whether the acknowledged number matches a cookie the server could have sent, then allocate buffers if correct
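The SYN cookie idea, computing the server's initial sequence number instead of storing it, can be sketched with a keyed hash. This is a simplified illustration and not the scheme used by any particular operating system: deployed SYN cookies also encode a time counter and an MSS value, which are left out here, and all names are hypothetical.

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(16)   # hypothetical per-server secret


def make_cookie(src, sport, dst, dport, client_isn, secret=SERVER_SECRET):
    """Derive the server's initial sequence number from the connection
    identifiers instead of storing per-connection state."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(src, sport, dst, dport, client_isn, ack_number,
                 secret=SERVER_SECRET):
    """The final ACK must acknowledge cookie + 1; recompute and compare."""
    cookie = make_cookie(src, sport, dst, dport, client_isn, secret)
    return ack_number == (cookie + 1) % 2**32


if __name__ == "__main__":
    cookie = make_cookie("10.0.0.1", 40000, "10.0.0.2", 80, 12345)
    print(ack_is_valid("10.0.0.1", 40000, "10.0.0.2", 80, 12345,
                       (cookie + 1) % 2**32))
```

Only when the check succeeds does the server allocate buffers, so a flood of SYNs with forged source addresses consumes no per-connection memory.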

Sequence Number Summary
• Goals Set 1
– Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
– Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
– Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
– Don't allow SYN attacks: compute but don't store initial sequence number

TCP Connection Management (cont)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server message timeline with labels close, close, timed wait, closed]
TCP
Flow Control
CS 4284 Spring 2013
TCP Flow Control
bull receive side of TCP connection has a receive buffer
bull speed-matching
service matching
the send rate to the
receiving apprsquos drain
rate bull app process may be
slow at reading from
buffer
sender wonrsquot overflow receiverrsquos buffer by
transmitting too much too fast
flow control
CS 4284 Spring 2013
TCP Flow Control (contrsquod)
(Suppose TCP receiver discards
out-of-order segments)
bull spare room in buffer
= RcvWindow
= RcvBuffer-[LastByteRcvd -
LastByteRead]
bull Rcvr advertises spare room by including value of RcvWindow in segments
bull Sender limits unACKed data to RcvWindow
ndash guarantees receive buffer doesnrsquot overflow
CS 4284 Spring 2013
Flow Control Persistence Timer
bull Suppose sender is blocked cause receiver
application hasnrsquot picked up data
bull Then receiver app reads n bytes
bull TCP receiver advertises new window of size
RcvWindow = n
bull But suppose this advertisement is lost
bull Sender would be stuck
ndash Solution persistence timer sender sends probe after
a period of inactivity to provoke window update
CS 4284 Spring 2013
Flow Control Silly Window Syndrome
Clarkrsquos solution donrsquot advertise windows below a certain size
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: this is the famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) via MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send original data at rate λ_in into a router with unlimited shared output link buffers; λ_out is the delivered rate]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send into a router with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]
Causes/costs of congestion: scenario 2
Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for the same λ_out (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: panels (a), (b), (c) plotting λ_out against offered load, axes marked at R/2, with levels R/3 and R/4 marked]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B send over paths with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
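The sender-side limit and the rough rate formula on this slide can be written as two small functions. This is a sketch under the slide's simplifications; the function names and the `nbytes` parameter are hypothetical, chosen only for illustration.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps unACKed data within min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)

def approx_rate(cong_win, rtt_sec):
    """Rough sending rate in bytes/sec: one window's worth of data per round trip."""
    return cong_win / rtt_sec

# 2000 bytes in flight, congestion window is the tighter limit here.
print(can_send(3000, 1000, cong_win=4000, rcv_window=8000, nbytes=1000))
print(approx_rate(8000, 0.5))
```

Whichever window is smaller governs: a small RcvWindow means flow control is the limit, a small CongWin means congestion control is.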
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window of a long-lived TCP connection versus time, with ticks at 8, 16, and 24 Kbytes]
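The sawtooth that AIMD produces can be reproduced with a few lines of simulation. This is a minimal sketch: the MSS value, the `aimd` function, and the idea of naming the rounds in which a loss occurs are all assumptions for illustration.

```python
MSS = 1000  # bytes; illustrative value, not taken from the slides

def aimd(cong_win, loss_rounds, rounds):
    """Return CongWin at the start of each RTT round."""
    trace = []
    for r in range(rounds):
        trace.append(cong_win)
        if r in loss_rounds:
            cong_win = max(MSS, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += MSS                     # additive increase: 1 MSS per RTT
    return trace

trace = aimd(8 * MSS, loss_rounds={3}, rounds=6)
print(trace)  # [8000, 9000, 10000, 11000, 5500, 6500]
```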
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline over time, with one RTT marked]
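Both the 20 kbps figure and the doubling per RTT follow from simple arithmetic. The sketch below assumes that every segment in a window is ACKed within one RTT and that no loss occurs; the function names are hypothetical.

```python
def initial_rate_bps(mss_bytes, rtt_ms):
    """Rate in bits/sec when one MSS is sent per round trip."""
    return mss_bytes * 8 * 1000 / rtt_ms

def slow_start(mss, rounds):
    """CongWin at the start of each RTT, adding 1 MSS per ACK received."""
    cong_win, trace = mss, []
    for _ in range(rounds):
        trace.append(cong_win)
        acks = cong_win // mss   # one ACK per full segment sent this round
        cong_win += acks * mss   # each ACK adds 1 MSS, so the window doubles
    return trace

print(initial_rate_bps(500, 200))  # 20000.0 bits/sec, i.e. 20 kbps
print(slow_start(500, 4))          # [500, 1000, 2000, 4000]
```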
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
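The difference between the two variants comes down to how a triple duplicate ACK is handled. The sketch below assumes the usual textbook description in which Tahoe, lacking fast recovery, falls back to 1 MSS on any loss event; `on_loss` and its parameters are hypothetical names, and windows are counted in MSS units.

```python
def on_loss(cong_win, event, variant, mss=1):
    """Return (threshold, cong_win) after a loss event.

    event is "timeout" or "3dupack"; variant is "tahoe" or "reno".
    """
    threshold = cong_win / 2
    if event == "3dupack" and variant == "reno":
        return threshold, threshold  # fast recovery: continue linearly from Threshold
    return threshold, mss            # timeout, or any loss in Tahoe: back to slow start

print(on_loss(16, "3dupack", "reno"))   # (8.0, 8.0)
print(on_loss(16, "3dupack", "tahoe"))  # (8.0, 1)
print(on_loss(16, "timeout", "reno"))   # (8.0, 1)
```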
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance, and fast recovery; transitions are listed below as event / action]
Initially (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• cwnd > ssthresh / Λ; enter congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; enter fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
Congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; enter fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start
Fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; enter congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; enter slow start
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
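The event table maps almost directly onto a small class with one method per event. This is a simplified sketch of the table only (it omits retransmission and the window inflation used during fast recovery); the class and method names are hypothetical.

```python
class Sender:
    """Congestion state following the event table; all sizes are in bytes."""

    def __init__(self, mss, threshold):
        self.mss, self.threshold = mss, threshold
        self.cong_win, self.state, self.dup_acks = mss, "SS", 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss                          # doubles per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win  # about +1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one triggers multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)      # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win, self.state, self.dup_acks = self.mss, "SS", 0

s = Sender(mss=1000, threshold=4000)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cong_win)  # CA 5000
```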
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]
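The 0.75 factor follows from the sawtooth shape. As a sketch, assuming the window grows linearly from W/2 to W between losses, the average window is the midpoint of the two extremes:

```latex
\text{average throughput}
  = \frac{1}{RTT}\cdot\frac{1}{2}\left(\frac{W}{2} + W\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
```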
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• L = 2·10^-10: very low
• Requires an almost perfect link
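The numbers in this example can be checked by solving the throughput formula 1.22 · MSS / (RTT · sqrt(L)) for L. The sketch below is plain arithmetic; the function names are hypothetical.

```python
def window_segments(throughput_bps, rtt_sec, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_sec / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_sec, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_sec * throughput_bps)) ** 2

w = window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(w), loss)  # about 83333 segments and a loss rate near 2e-10
```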
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on these inputs
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes up to R (Connection 1 throughput labeled) and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
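The convergence argument can be checked numerically: additive increase keeps the gap between the two sessions constant, while each multiplicative decrease halves it. The sketch below assumes both sessions see every loss at the same time and have the same RTT, which is an idealization; `two_flows` and its parameters are hypothetical.

```python
def two_flows(x1, x2, capacity, rounds):
    """Both flows add 1 per round; both halve when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1  # congestion avoidance: additive increase
    return x1, x2

a, b = two_flows(80.0, 10.0, capacity=100, rounds=500)
print(abs(a - b))  # the initial gap of 70 has shrunk to nearly zero
```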
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
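The R/10 and R/2 figures follow from per-connection fairness. As a sketch, assuming every connection on the link gets an equal share, the new app's share of R is its connection count divided by the total; `app_share` is a hypothetical helper.

```python
from fractions import Fraction

def app_share(existing, new_conns):
    """Fraction of link rate R the new app gets with equal per-connection shares."""
    return Fraction(new_conns, existing + new_conns)

print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```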
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• speed-matching service: matching the send rate to the receiving app's drain rate
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
TCP Flow Control (cont'd)
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow
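The RcvWindow bookkeeping can be modeled with two counters. This is a minimal sketch of the receiver side only, assuming in-order arrival as the slide does; the class and method names are hypothetical.

```python
class ReceiveBuffer:
    """Tracks LastByteRcvd and LastByteRead to compute the advertised window."""

    def __init__(self, size):
        self.size, self.last_byte_rcvd, self.last_byte_read = size, 0, 0

    def rcv_window(self):
        # RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
        return self.size - (self.last_byte_rcvd - self.last_byte_read)

    def segment_arrives(self, nbytes):
        if nbytes > self.rcv_window():
            raise ValueError("sender exceeded the advertised window")
        self.last_byte_rcvd += nbytes

    def app_reads(self, nbytes):
        nbytes = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += nbytes
        return nbytes

buf = ReceiveBuffer(4096)
buf.segment_arrives(3000)
print(buf.rcv_window())  # 1096
buf.app_reads(1000)
print(buf.rcv_window())  # 2096
```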
Flow Control: Persistence Timer
• Suppose sender is blocked because receiver application hasn't picked up data
• Then receiver app reads n bytes
• TCP receiver advertises new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  - Solution: persistence timer; sender sends probe after a period of inactivity to provoke window update
Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Connection Management
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …);
• server: contacted by client
  cl = accept(sv, &caddr, …);
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
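The sequence and acknowledgement numbers exchanged in the three steps can be sketched as follows. The dictionary layout and the function name are assumptions for illustration; the one protocol fact relied on is that a SYN consumes one sequence number, so each side acknowledges the other's initial sequence number plus one (modulo 2^32).

```python
MOD = 2**32  # sequence numbers are 32 bits wide and wrap around

def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK, and ACK segments as dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    synack = {"flags": "SYN+ACK", "seq": server_isn,
              "ack": (client_isn + 1) % MOD}
    ack = {"flags": "ACK", "seq": (client_isn + 1) % MOD,
           "ack": (server_isn + 1) % MOD}
    return [syn, synack, ack]

segs = three_way_handshake(100, 300)
for seg in segs:
    print(seg)
```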
TCP 3-way handshake
TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?
3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
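The ISN formula can be sketched in a few lines. This is an illustration only: SHA-256 stands in for the hash F, and the `secret` argument plays the role of the random() input; both choices, like the function name and its parameters, are assumptions rather than what any particular TCP stack does.

```python
import hashlib

def isn(clock_us, src, dst, sport, dport, secret):
    """ISN = 4 µs clock value + F(src, dst, sport, dport, secret), modulo 2**32."""
    m = clock_us // 4                                  # counter ticking every 4 µs
    key = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return (m + f) % 2**32

a = isn(1_000_000, "10.0.0.1", "10.0.0.2", 12345, 80, b"per-boot secret")
b = isn(1_000_004, "10.0.0.1", "10.0.0.2", 12345, 80, b"per-boot secret")
print(a, b)
```

For a fixed connection identity the ISN still advances with the clock, which protects against old duplicates, while an attacker who does not know the secret cannot predict the offset used for a different address and port combination.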
When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
• Server computes its initial sequence number, sends SYNACK, but does not allocate buffers
  - If the client continues with an ACK, check whether the acknowledged sequence number could have been sent by the server, then allocate buffers if correct
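The "compute but don't store" idea behind SYN cookies can be sketched with a keyed hash. This is a simplified illustration, not the encoding used by any real TCP stack: the secret, the `time_slot` parameter, and both function names are hypothetical, and a real implementation also has to fit options such as the MSS into the cookie.

```python
import hashlib
import hmac

SECRET = b"server-secret"  # hypothetical key; a real server would use a random one

def make_cookie(src, sport, dst, dport, client_isn, time_slot):
    """Server ISN derived from the connection identity; nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{time_slot}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")

def check_ack(src, sport, dst, dport, client_isn, time_slot, ack_number):
    """Accept the final ACK only if it acknowledges cookie + 1."""
    cookie = make_cookie(src, sport, dst, dport, client_isn, time_slot)
    return ack_number == (cookie + 1) % 2**32

conn = ("192.0.2.7", 40000, "198.51.100.1", 80, 1234, 42)
cookie = make_cookie(*conn)
print(check_ack(*conn, (cookie + 1) % 2**32))  # True: allocate buffers now
print(check_ack(*conn, (cookie + 2) % 2**32))  # False: drop the segment
```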
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle

Closing a Connection
• Note: the previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No: this is the famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow control & silly window syndrome
• Connection management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  – Client/server agree on a larger-than-default MSS (the default is 536 outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for the receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …

Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B; unlimited shared output link buffers; λin: original data; λout]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmits a lost packet after a timeout
[Figure: Host A and Host B; finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

Causes/costs of congestion: scenario 2
Always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume a clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: the link carries multiple copies of a packet
[Figure: panels (a), (b), (c) plotting λout against λin, with axis marks at R/2, R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B; finite shared output link buffers; λin: original data; λ'in: original data plus retransmitted data; λout]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λout]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate the sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent − LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does the sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
• (assumes congestion is the primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
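
The sender-side limit and the rate estimate on this slide can be sketched in a few lines. This is a minimal illustration, not how a real TCP stack is structured; the function names and the `nbytes` parameter are assumptions made for the example.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps unacked data within min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: one congestion window per round-trip time."""
    return cong_win / rtt
```

With 2000 bytes in flight, a 4000-byte CongWin and an 8000-byte RcvWindow, another 1000-byte segment may go out; if the receiver advertises only 2500 bytes, it may not, because the smaller of the two windows governs.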

TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window of a long-lived TCP connection over time, with axis marks at 8, 16 and 24 Kbytes]
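
The two AIMD rules can be traced with a small simulation. This is a sketch under simplifying assumptions: time advances in whole RTTs, and the set of rounds in which a loss occurs (`loss_rounds`) is supplied by the caller rather than produced by a network model.

```python
def aimd(cong_win, mss, loss_rounds, rounds):
    """Return CongWin (bytes) after each RTT under AIMD."""
    trace = []
    for r in range(rounds):
        if r in loss_rounds:
            cong_win = max(mss, cong_win // 2)  # multiplicative decrease
        else:
            cong_win += mss                     # additive increase: +1 MSS per RTT
        trace.append(cong_win)
    return trace
```

Starting from 8000 bytes with a 1000-byte MSS and a single loss in round 3, the window climbs by 1000 bytes per round, is halved once, and then resumes climbing, which is the sawtooth pattern the figure depicts.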

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with RTT marked on the time axis]
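
To see why one increment per ACK doubles the window every RTT, here is a minimal sketch. It assumes every segment sent in a round is acknowledged exactly once and that no loss occurs; the function name is chosen for this example.

```python
def slow_start(mss, rtt, rounds):
    """Return (CongWin in bytes, rate in bits/sec) at the start of each RTT."""
    cong_win = mss
    out = []
    for _ in range(rounds):
        out.append((cong_win, cong_win * 8 / rtt))
        acks = cong_win // mss      # one ACK per segment sent this round
        cong_win += acks * mss      # +1 MSS per ACK, so the window doubles
    return out
```

With MSS = 500 bytes and RTT = 0.2 s, the first entry works out to 500 × 8 / 0.2 = 20,000 bits/sec, matching the 20 kbps initial rate in the slide example, and the window then grows 500, 1000, 2000, 4000 bytes.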

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
• If timeout event: set CongWin to 1 MSS
• If triple-ACK event: set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
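
The difference between the two variants comes down to how a triple duplicate ACK is handled. A minimal sketch follows, assuming byte-valued windows; the function name and the string-valued `event` and `variant` arguments are illustrative choices, not part of any real API.

```python
def on_loss(cong_win, mss, event, variant="reno"):
    """Return (Threshold, new CongWin) after a loss event.

    event is "timeout" or "triple_dup_ack"; variant is "tahoe" or "reno".
    """
    threshold = cong_win // 2
    if event == "triple_dup_ack" and variant == "reno":
        return threshold, threshold   # fast recovery: continue linearly from Threshold
    return threshold, mss             # otherwise restart from 1 MSS in slow start
```

For a 16,000-byte window and 1000-byte MSS, Reno resumes at 8000 bytes after a triple duplicate ACK, whereas Tahoe (and either variant after a timeout) drops back to 1000 bytes.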

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after a timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance and fast recovery; transitions are listed below as event / action, where Λ means "no event" or "no action"]
Initial transition:
• Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 -> slow start
Slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• cwnd > ssthresh / Λ -> congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment -> fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
Congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment -> fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start
Fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0 -> congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment -> slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially.
• When CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control

Event: ACK receipt for previously unacked data
State: Slow Start (SS)
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA
Commentary: Resulting in a doubling of CongWin every RTT

Event: ACK receipt for previously unacked data
State: Congestion Avoidance (CA)
TCP Sender Action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: Triple duplicate ACK
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = Threshold; set state to CA
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: Timeout
State: SS or CA
TCP Sender Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter slow start

Event: Duplicate ACK
State: SS or CA
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
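
The event table translates fairly directly into a small state holder. The class below is a simplified sketch of that table only (two states, no actual retransmission or timers); the class and method names are assumptions for illustration, and the 64 KB default threshold is borrowed from the FSM summary slide.

```python
class Sender:
    """Simplified congestion-control state from the event table (sizes in bytes)."""

    def __init__(self, mss, threshold=64 * 1024):
        self.mss, self.cong_win, self.threshold = mss, mss, threshold
        self.state, self.dup_acks = "SS", 0

    def new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss // self.cong_win

    def dup_ack(self):
        """Duplicate ACK; the third one triggers the multiplicative decrease."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = max(self.cong_win // 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def timeout(self):
        self.threshold = self.cong_win // 2
        self.cong_win, self.state, self.dup_acks = self.mss, "SS", 0
```

The first two duplicate ACKs only bump the counter, as in the last table row; the window and threshold change on the third.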

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When the window is W, throughput is W/RTT
• Just after a loss, the window drops to W/2 and throughput to (W/2)/RTT
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]
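
The 0.75 factor is just the midpoint of a window that ramps linearly from W/2 to W. A one-function sketch, with a name chosen for this example:

```python
def average_throughput(w, rtt):
    """Mean rate of an idealized sawtooth that ramps linearly from W/2 to W."""
    low = (w / 2) / rtt     # rate just after a loss
    high = w / rtt          # rate just before the next loss
    return (low + high) / 2  # equals 0.75 * W / RTT
```

For W = 100 units and RTT = 1 s, the rate swings between 50 and 100 units/sec and averages 75.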

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$
• L = 2 · 10^-10: very low
• Requires an almost perfect link
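
The numbers in this example can be checked by inverting the throughput formula for L. The helper names below are made up for this sketch; rates are in bits/sec, RTT in seconds, MSS in bytes.

```python
def window_segments(throughput_bps, rtt, mss_bytes):
    """In-flight segments needed to sustain the target rate: W = rate * RTT / MSS."""
    return throughput_bps * rtt / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt * throughput_bps)) ** 2
```

For 10 Gbps, 100 ms and 1500-byte segments, W = 10^10 × 0.1 / 12,000 ≈ 83,333 segments, and √L = 1.22 × 12,000 / 10^9 ≈ 1.46 · 10^-5, so L ≈ 2.1 · 10^-10, in line with the slide's figure.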

Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with Connection 1 throughput on one axis, both axes running to R, and the equal-bandwidth-share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
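
The convergence argument can be illustrated numerically. The sketch below assumes an idealized setting that the slide does not spell out: both flows have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and rates change by one unit per round.

```python
def share_link(x1, x2, capacity, rounds):
    """Two AIMD flows on one bottleneck; both halve when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # synchronized loss: multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase, slope 1
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts the gap in half, so flows that start at 80 and 10 on a link of capacity 100 end up with nearly identical rates after a few hundred rounds.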

Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents an app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
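
The arithmetic behind the parallel-connection example is a simple per-connection split. A tiny sketch, assuming the link is divided equally among all connections (the function name is illustrative):

```python
from fractions import Fraction


def app_share(existing, mine):
    """Fraction of link rate R an app gets when it opens `mine` connections
    next to `existing` ones, assuming every connection gets an equal share."""
    return Fraction(mine, existing + mine)
```

With 9 existing connections, one new connection gets 1/10 of R, while nine new connections get 9/18 = 1/2 of R.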

Flow Control: Persistence Timer
• Suppose the sender is blocked because the receiver application hasn't picked up data
• Then the receiver app reads n bytes
• TCP receiver advertises a new window of size RcvWindow = n
• But suppose this advertisement is lost
• Sender would be stuck
  – Solution: persistence timer; the sender sends a probe after a period of inactivity to provoke a window update

Flow Control: Silly Window Syndrome
Clark's solution: don't advertise windows below a certain size

TCP Connection Management

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  – seq. #s
  – buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …);
• server: contacted by client
  cl = accept(sv, &caddr, …);
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  – specifies initial seq #
  – no data
Step 2: server host receives SYN, replies with SYNACK segment
  – server allocates buffers
  – specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
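
The connect/accept pair above can be tried out over the loopback interface. The sketch below uses Python's standard `socket` module rather than the C calls named on the slide; the function name, the use of port 0 and the 5-byte payload are choices made for this example. The handshake itself is carried out by the kernel and is not visible in the code.

```python
import socket
import threading


def handshake_demo():
    """Open a loopback TCP connection; the kernel performs the 3-way handshake."""
    sv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sv.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    sv.listen(1)
    dstaddr = sv.getsockname()
    received = []

    def server():
        cl, caddr = sv.accept()     # returns once a handshake has completed
        with cl:
            received.append(cl.recv(16))

    t = threading.Thread(target=server)
    t.start()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect(dstaddr)          # SYN out, SYNACK in, ACK out
        s.sendall(b"hello")
    t.join()
    sv.close()
    return received[0]
```

`connect` returns only after the SYNACK has arrived, and `accept` hands back a socket only for a connection whose handshake has finished, so by the time either call returns the sequence numbers and buffers from Steps 1 to 3 are already set up.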

TCP 3-way handshake
TCP connection establishment
• Q1: why a 3-way and not a 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
The 3-way handshake is required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq numbers to a clock
  – Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  – Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
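
The ISN formula can be sketched as follows. This is an illustration of the idea rather than a faithful RFC 1948 implementation: SHA-256 is substituted here as the function F, a caller-supplied `secret` plays the role of the random input, and the function and parameter names are assumptions.

```python
import hashlib
import time


def initial_seq(src, dst, sport, dport, secret, now=None):
    """ISN = 4-microsecond clock value + F(connection 4-tuple, secret), mod 2**32."""
    now = time.time() if now is None else now
    clock = int(now * 1_000_000) // 4          # ticks once every 4 microseconds
    material = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock + f) % 2**32
```

For a fixed 4-tuple the offset F is constant, so successive ISNs still advance with the clock and old duplicates from a previous incarnation are unlikely to fall in the new window; an off-path attacker who does not know the secret cannot compute the offset for someone else's 4-tuple.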

When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  – B believes it is talking to C, might grant permissions based on C's IP address
  – Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving a SYN must allocate resources
  – Opens up the possibility of a denial-of-service attack in which the server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  – SYN cookies
    • Server encodes the connection state in the initial sequence number it sends in the SYNACK, but does not allocate buffers
  – If the client continues with an ACK, check whether its acknowledgement number matches a cookie the server could have sent; allocate buffers only if it does
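
The "compute but don't store" idea behind SYN cookies can be sketched like this. It is a deliberately simplified illustration: real implementations also pack an MSS index and the time counter into specific bits of the cookie, which is omitted here, and the function names, the HMAC-SHA-256 construction and the `counter` parameter are assumptions for the example.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN derived from the 4-tuple, the client's ISN and a coarse time counter."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, counter, ack_number):
    """Accept the final ACK only if it acknowledges a cookie from this or the previous counter."""
    for c in (counter, counter - 1):
        cookie = make_cookie(secret, src, sport, dst, dport, client_isn, c)
        if ack_number == (cookie + 1) % 2**32:
            return True
    return False
```

The server sends the cookie as its sequence number in the SYNACK and keeps no per-connection state; when an ACK arrives it recomputes the cookie from the packet's own fields and allocates buffers only if the acknowledgement number is cookie + 1.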
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B send through one router with unlimited shared output link buffers; labels λin (original data) and λout]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B send through one router with finite shared output link buffers; labels λin (original data), λ'in (original data plus retransmitted data) and λout]

Causes/costs of congestion: scenario 2 (cont.)
• always: λin = λout (goodput)
• a) if no loss: λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss is certain
• c) retransmission of both delayed and lost packets makes λ'in larger for the same λout (every packet transmitted twice)
"Costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
[Figure: three plots (a, b, c) of λout versus λ'in, each with axes marked at R/2; further marks at R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A and Host B with finite shared output link buffers; labels λin (original data), λ'in (original data plus retransmitted data) and λout]

Causes/costs of congestion: scenario 3 (cont.)
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: plot of λout for Host A and Host B]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate ≈ CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
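
The window limit and the rate estimate on this slide can be sketched in a few lines. This is only an illustration: the function names are made up here, all sizes are in bytes, and RTT is in seconds.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight while keeping
    LastByteSent - LastByteAcked <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cong_win / rtt
```

For example, with 3000 bytes in flight, CongWin = 8000 and RcvWindow = 4000, the receiver's window is the tighter limit and the sender may send 1000 more bytes.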

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window versus time for a long-lived TCP connection; sawtooth with y-axis marks at 8, 16 and 24 Kbytes]
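
The sawtooth can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the window is counted in whole MSS units, one step is one RTT, and the caller says in which rounds a loss event is detected (`loss_rounds` is a hypothetical parameter, not something TCP exposes).

```python
def aimd_trace(start_mss, rounds, loss_rounds):
    """Return the congestion window (in MSS) at the start of each RTT."""
    cwnd = start_mss
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(8, 8, {3}))
```

The window climbs by one MSS per round, is halved after the loss in round 3, and then starts probing upward again.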

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B; the number of segments sent doubles with each RTT]
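
A short sketch of why a per-ACK increment doubles the window each RTT, plus the initial-rate arithmetic from the slow start example (MSS = 500 bytes, RTT = 200 msec). It assumes every segment is acknowledged individually (no delayed ACKs) and that no loss occurs; the function names are illustrative.

```python
def slow_start_rounds(mss, rounds):
    """Congestion window (bytes) at the start of each RTT during slow start."""
    cwnd = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cwnd)
        acks = cwnd // mss          # one ACK per segment sent this round
        for _ in range(acks):
            cwnd += mss             # CongWin grows by 1 MSS per ACK
    return sizes


def initial_rate_bps(mss, rtt):
    """Sending rate in bits/sec with a window of a single MSS."""
    return mss * 8 / rtt


print(slow_start_rounds(500, 4))
print(initial_rate_bps(500, 0.2))
```

With one MSS per RTT the rate is 500 * 8 / 0.2 = 20,000 bits/sec, the 20 kbps quoted in the example.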

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
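
The difference between the two versions comes down to how a loss event is handled. A minimal sketch, with windows counted in MSS; the `variant` and `event` strings are labels invented for this example, and Tahoe is modeled as restarting from 1 MSS on every loss event since it has no fast recovery.

```python
def on_loss(variant, event, cong_win):
    """Return (threshold, cong_win) after a loss event.

    variant: "tahoe" or "reno"
    event:   "timeout" or "triple_dup_ack"
    """
    threshold = cong_win / 2
    if variant == "reno" and event == "triple_dup_ack":
        return threshold, threshold    # fast recovery: continue linearly
    return threshold, 1                # restart slow start from 1 MSS


print(on_loss("reno", "triple_dup_ack", 16))
print(on_loss("tahoe", "triple_dup_ack", 16))
```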

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[Figure: sender FSM with states slow start, congestion avoidance and fast recovery. Transitions are listed below as event / actions; Λ means no event or no action.]
Initialization:
• Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• cwnd > ssthresh / Λ; go to congestion avoidance
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
Congestion avoidance:
• new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK / dupACKcount++
• dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Fast recovery:
• duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK / cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Columns: Event | State | TCP Sender Action | Commentary
• ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
• ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
• Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
• Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
• Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
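
The event table translates almost line by line into a small state machine. The sketch below is illustrative only: the class and method names are invented, windows are measured in MSS (so MSS = 1), and it models just the two-state SS/CA view of the table, not a full TCP sender.

```python
MSS = 1  # count the window in units of one MSS


class Sender:
    def __init__(self, threshold=64):
        self.state = "SS"
        self.cong_win = 1.0
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                        # doubles per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # ~1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one in a row is a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)    # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Feeding a sequence of ACK, duplicate-ACK and timeout events into a `Sender` makes it easy to trace how CongWin and Threshold evolve.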

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of window size oscillating between W/2 and W]
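
The 0.75 factor follows from the window growing linearly from W/2 to W between losses, so its time average sits halfway between the two. A quick numeric check, assuming an even W counted in whole segments and one sample per RTT:

```python
def average_window(w):
    """Mean window over one sawtooth cycle that ramps from w/2 up to w."""
    samples = range(w // 2, w + 1)   # one sample per RTT, +1 segment each RTT
    return sum(samples) / len(samples)


print(average_window(100))
```

Dividing the average window by RTT gives the average throughput of 0.75 W/RTT.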

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate $L$:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low!
• Requires an almost perfect link
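
Both numbers in this example can be checked by solving throughput = 1.22 · MSS / (RTT · sqrt(L)) for L. The helper names below are illustrative; rates are in bits/sec, RTT in seconds and MSS in bytes.

```python
def window_segments(rate_bps, rtt, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt / (8 * mss_bytes)


def required_loss_rate(rate_bps, rtt, mss_bytes):
    """Loss rate L at which 1.22 * MSS / (RTT * sqrt(L)) equals rate_bps."""
    return (1.22 * mss_bytes * 8 / (rtt * rate_bps)) ** 2


print(window_segments(10e9, 0.1, 1500))
print(required_loss_rate(10e9, 0.1, 1500))
```

The first value is about 83,333 segments and the second about 2 · 10^-10, matching the figures quoted above.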

Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput of the two connections plotted against each other (x-axis: Connection 1 throughput, both axes up to R) with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
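
The convergence argument can be checked with a toy model of two AIMD flows. The assumptions are loud ones: both flows have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units.

```python
def share_after(x1, x2, capacity, rounds):
    """Rates of two synchronized AIMD flows after a number of rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: both halve, gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1    # additive increase keeps the gap
    return x1, x2


a, b = share_after(80, 10, 100, 500)
print(abs(a - b))
```

Additive increase leaves the difference between the two rates unchanged, while every loss halves it, so the flows drift toward the equal-share line.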

Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
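
The parallel-connection arithmetic assumes the bottleneck is split evenly per connection, not per application. Under that assumption the share an app obtains is simply its fraction of all connections on the link:

```python
from fractions import Fraction


def app_share(existing_connections, my_connections):
    """Fraction of link rate R an app gets with equal per-connection shares."""
    return Fraction(my_connections, existing_connections + my_connections)


print(app_share(9, 1))   # one new connection next to 9 existing ones
print(app_share(9, 9))   # nine new connections next to 9 existing ones
```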

TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
  connect(s, &dstaddr, …)
• server: contacted by client
  cl = accept(sv, &caddr, …)
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
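
The sequence and acknowledgment numbers exchanged in the three steps can be written out as a tiny model. This is not socket code: segments are plain dictionaries, the field names are illustrative, and the only protocol fact relied on is that a SYN consumes one sequence number.

```python
MOD = 2 ** 32  # sequence numbers are 32 bits wide


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments of a connection setup."""
    syn = {"flags": {"SYN"}, "seq": client_isn}
    synack = {"flags": {"SYN", "ACK"},
              "seq": server_isn,
              "ack": (syn["seq"] + 1) % MOD}       # acknowledges the SYN
    ack = {"flags": {"ACK"},
           "seq": (syn["seq"] + 1) % MOD,
           "ack": (synack["seq"] + 1) % MOD}       # acknowledges the SYNACK
    return [syn, synack, ack]


for segment in three_way_handshake(100, 300):
    print(segment)
```

Each side thereby confirms the initial sequence number chosen by the other before any data is accepted.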

TCP 3-way handshake
[Figure: TCP connection establishment]
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
[Figure: three handshake scenarios]
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG; see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock value + F(src, dst, sport, dport, random())
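
A sketch of the RFC 1948 idea: a clock that ticks every 4 microseconds plus a keyed hash F of the connection identifiers. Everything concrete here is an assumption made for illustration: the function and parameter names, the use of SHA-256 as F, and a caller-supplied `secret` standing in for the random() input.

```python
import hashlib
import time


def initial_sequence_number(src, dst, sport, dport, secret, now_us=None):
    """ISN = 4-microsecond clock value + F(src, dst, sport, dport, secret)."""
    if now_us is None:
        now_us = time.time_ns() // 1000           # current time in microseconds
    clock = now_us // 4                           # ticks once every 4 us
    data = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    f = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return (clock + f) % 2 ** 32
```

For a fixed connection four-tuple the ISN still advances with the clock, while an outsider who does not know `secret` cannot derive the offset used for a different four-tuple.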

When Sequence Numbers Attack
• Suppose attacker A can predict the sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in a successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  - SYN cookies
• Server computes its initial sequence number (the cookie) and sends the SYNACK, but does not allocate buffers
  - If client continues with an ACK, check whether the acknowledged sequence number could have been sent by this server; allocate buffers only if correct
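
The "compute but do not store" idea behind SYN cookies can be sketched with a keyed hash. This is a simplification and not the encoding real stacks use (those also pack a coarse timestamp and MSS information into the cookie); the function names, the `counter` parameter and the use of HMAC-SHA-256 are assumptions for the example.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN derived from the connection identifiers; nothing is stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, counter, ack_number):
    """Recompute the cookie and check that the client's ACK acknowledges it."""
    expected = (make_cookie(secret, src, sport, dst, dport,
                            client_isn, counter) + 1) % 2 ** 32
    return ack_number == expected
```

Only when `ack_is_valid` returns True would the server allocate buffers; a flood of SYNs with forged source addresses never gets that far because no matching ACK arrives.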

Sequence Number Summary
• Goals, set 1:
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use a global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before a successful connection occurs
• Goals, set 2:
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline: client close, server close, client in timed wait, closed]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline: both sides closing, client in timed wait, both closed]

TCP Connection FSM
[Figure: TCP connection state diagram]
• The heavy solid line is the normal path for a client
• The heavy dashed line is the normal path for a server
• The light lines are unusual events
• Each transition is labeled by the event causing it and the action resulting from it, separated by a slash

TCP Connection Management (cont'd)
[Figure: TCP client lifecycle and TCP server lifecycle]

Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No: the famous two-army problem

Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on a larger-than-default MSS (default is 536 outside the same subnet) via the MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrap-around of sequence numbers
• …
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion Control Event State TCP Sender Action Commentary
ACK receipt for
previously
unacked data
Slow Start
(SS)
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to CA
Resulting in a doubling of
CongWin every RTT
ACK receipt for
previously
unacked data
Congestion
Avoidance
(CA)
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting in
increase of CongWin by 1
MSS every RTT
Triple duplicate
ACK
SS or CA Threshold = CongWin2
CongWin = Threshold
Set state to CA
Fast recovery implementing
multiplicative decrease
CongWin will not drop below
1 MSS
Timeout SS or CA Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK
count for segment being
acked
CongWin and Threshold not
changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTT
bull Just after loss window drops to W2 throughput to W2RTT
bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Loss
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull Throughput in terms of loss rate
bull L = 210-10 Very low
bull Require almost perfect link
LRTT
MSS221
Equation-Based Control
bull Note TCP congestion control forms
control loop
ndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)
bull Instead equation-based control uses an
equation to compute sending rate based
on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same
bottleneck link of bandwidth R each should
have average rate of Rk
TCP connection 1
bottleneck router
capacity R
TCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair
Consider two competing sessions
bull additive increase gives slope of 1 as throughput increases
bull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increase
loss decrease window by factor of 2
congestion avoidance additive increase
loss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
bull Multimedia apps often
do not use TCP
ndash do not want rate throttled
by congestion control
bull Instead use UDP
ndash pump audiovideo at
constant rate tolerate
packet loss
bull TCP friendliness
Fairness and parallel TCP
connections
bull nothing prevents app from
opening parallel connections
between 2 hosts
bull Example link of rate R
supporting 9 connections
ndash new app asks for 1 TCP gets
rate R10
ndash new app asks for 9 TCPs gets
R2
bull Thatrsquos what ldquodownload
acceleratorsrdquo do
TCP
Connection Management
CS 4284 Spring 2013
TCP Connection Management Recall TCP sender receiver
establish ldquoconnectionrdquo before
exchanging data segments
bull initialize TCP variables
ndash seq s
ndash buffers flow control info (eg RcvWindow)
bull client connection initiator
connect(s ampdstaddr hellip)
bull server contacted by client
cl=accept(sv
ampcaddrhellip)
Three way handshake
Step 1 client host sends TCP
SYN segment to server
ndash specifies initial seq
ndash no data
Step 2 server host receives SYN
replies with SYNACK segment
ndash server allocates buffers
ndash specifies server initial seq
Step 3 client receives SYNACK
replies with ACK segment
which may contain data
CS 4284 Spring 2013
TCP 3-way handshake
TCP connection establishment bull Q1 why 3-way and not 2-way handshake
bull Q2 how do sender amp receiver determine initial seqnums
CS 4284 Spring 2013
3-way Handshake amp Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)
CS 4284 Spring 2013
Sequence Number Reuse
bull Idea Tie initial TCP seq numbers to clock ndash Increment every 4s guards against previous incarnations of a
connection with identical sequence numbers
bull Must also guard against sequence number prediction attack ndash Use PRNG see [RFC 1948] [CERT 2001-09]
bull RFC 1948 ISN = 4s clock val + F(src dst sport dport random())
CS 4284 Spring 2013
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
bull Servers receiving SYN must allocate resources
ndash Opens up possibility of denial-of-service attack where
server is flooded with bogus SYN packets with forged
IP source addresses
bull Solution
ndash SYN cookies
bull Server creates ACK number sends ACK ndash but does not
allocate buffers
ndash If client continues with SYNACK check if ACK could
have been sent then allocate buffers if correct
Sequence Number Summary
bull Goals Set 1 ndash Guard against old duplicates in one connection -gt donrsquot
reuse sequence numbers until after itrsquos reasonably certain that old duplicates have disappeared
ndash Even after a crash restart -gt hence tie to time -gt but donrsquot use global clock -gt hence need 3-way handshake were each side verifies the sequence number chosen by the other side before successful connection occurs
bull Goals Set 2 ndash Donrsquot allow high-jacking of connections unless attacker
can eavesdrop ndash use PRNG for initial seq number choice
ndash Donrsquot allow SYN attacks ndash compute but donrsquot store initial sequence number
CS 4284 Spring 2013
CS 4284 Spring 2013
TCP Connection Management (cont)
Closing a connection
client closes socket close(s)
Step 1 client end system
sends TCP FIN control
segment to server
Step 2 server receives FIN
replies with ACK Closes
connection sends FIN
client server
close
close
closed
tim
ed w
ait
CS 4284 Spring 2013
TCP Connection Management (cont)
Step 3 client receives FIN
replies with ACK
ndash Enters ldquotimed waitrdquo - will
respond with ACK to
received FINs
Step 4 server receives ACK
Connection closed
Note with small modification can
handle simultaneous FINs
client server
closing
closing
closed
tim
ed w
ait
closed
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
• Congested networks increase delay, even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
    LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
    rate = CongWin / RTT  bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

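The sender-side limit and the rate estimate can be sketched in a few lines. This is a minimal illustration, not code from any TCP stack; the function names usable_window and approx_rate are made up for this example, and all quantities are in bytes and seconds.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight (hypothetical helper).

    Enforces LastByteSent - LastByteAcked <= min(CongWin, RcvWindow).
    """
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)
    return max(0, limit - in_flight)


def approx_rate(cong_win, rtt):
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cong_win / rtt
```

With 2000 bytes in flight, CongWin = 4000 and RcvWindow = 8000, the sender may send 2000 more bytes; if CongWin drops to 1000 it has to wait for ACKs.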
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: sawtooth plot of congestion window (axis marked 8, 16, 24 Kbytes) over time for a long-lived TCP connection.]

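The sawtooth can be reproduced with a toy simulation. This is a sketch under simplifying assumptions: time advances in whole RTTs, losses occur in the rounds listed in loss_rounds (an input invented for this example), and the window is never cut below 1 MSS.

```python
def aimd_trace(rounds, mss, start, loss_rounds):
    """Return the congestion window (bytes) at the start of each RTT."""
    cong_win = start
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            # multiplicative decrease: cut the window in half
            cong_win = max(mss, cong_win // 2)
        else:
            # additive increase: probe for bandwidth, 1 MSS per RTT
            cong_win += mss
    return trace
```

For example, aimd_trace(6, 1000, 8000, {2}) climbs 8000, 9000, 10000, then halves to 5000 and resumes climbing.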
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline showing the number of segments sent doubling each RTT.]

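Why a per-ACK increment doubles the window each RTT can be checked with a short sketch. It assumes an idealized round: every segment of the current window is acknowledged once, with no loss and no delayed ACKs. The function names are invented for this example.

```python
def slow_start_rounds(mss, rounds):
    """Window size (bytes) at the start of each RTT during slow start."""
    cong_win = mss
    sizes = []
    for _ in range(rounds):
        sizes.append(cong_win)
        acks = cong_win // mss        # one ACK per segment sent this round
        for _ in range(acks):
            cong_win += mss           # +1 MSS per ACK, so the window doubles
    return sizes


def initial_rate_bps(mss_bytes, rtt_seconds):
    """Rate in bits/sec when only one MSS is sent per RTT."""
    return mss_bytes * 8 / rtt_seconds
```

With MSS = 500 bytes and RTT = 200 msec, initial_rate_bps gives the 20 kbps quoted in the slow start example, and the window grows 500, 1000, 2000, 4000, ... bytes.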
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

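The difference between the two variants comes down to how CongWin is reset after a loss event. A minimal sketch follows; it assumes the usual description of Tahoe (always restart from 1 MSS, since it has no fast recovery), and the function name and event strings are made up for this example.

```python
def react_to_loss(variant, event, cong_win, mss):
    """Return (threshold, new_cong_win) after a loss event.

    variant: "tahoe" or "reno"; event: "timeout" or "triple_dup_ack".
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError("unknown loss event: " + event)
    # Threshold is half the window just before the loss
    # (kept at >= 1 MSS here, an assumption of this sketch).
    threshold = max(cong_win // 2, mss)
    if event == "timeout" or variant == "tahoe":
        return threshold, mss         # restart with slow start
    return threshold, threshold       # Reno fast recovery: continue linearly
```

For a 16000-byte window and MSS = 1000, Reno continues from 8000 bytes after a triple duplicate ACK, while Tahoe (and either variant after a timeout) restarts from 1000 bytes.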
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion

Summary: TCP Congestion Control
[State diagram; each transition is written as "event: actions". Λ marks a transition with no triggering event.]
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

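The state diagram translates fairly directly into an event-driven class. The following is a sketch only: the class and method names are invented, retransmission and actual segment transmission are not modeled, and cwnd is tracked in bytes.

```python
class RenoSender:
    """Toy model of the slow start / congestion avoidance / fast recovery FSM."""

    def __init__(self, mss, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = mss               # cwnd = 1 MSS
        self.ssthresh = ssthresh      # 64 KB by default
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh             # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                 # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # ~1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                 # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                    # loss inferred
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding the object a sequence of on_new_ack, on_dup_ack, and on_timeout calls and printing state and cwnd after each one gives a trace that can be compared against the diagram.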
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start.
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W.]

TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
    $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = $2 \cdot 10^{-10}$: very low!
• Requires almost perfect link

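The two numbers in this example can be checked with a few lines of arithmetic. The sketch assumes the throughput relation throughput = 1.22 * MSS / (RTT * sqrt(L)) and solves it for L; the function names are invented for this example.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    sqrt_l = 1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)
    return sqrt_l ** 2
```

For 1500-byte segments, 100 ms RTT, and 10 Gbps, window_segments gives about 83,333 segments and required_loss_rate gives roughly 2.1e-10, in line with the figures above.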
Equation-Based Control
• Note: TCP congestion control forms control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput of one connection plotted against Connection 1 throughput, both axes up to R, with the "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

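The convergence argument can be tried out numerically. This sketch assumes both sessions see the same RTT, increase by one unit per round, and both halve their rate in any round where their combined rate exceeds the link capacity; the function name is made up for this example.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck; return final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # loss: both flows decrease multiplicatively,
            # which halves the gap between them
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: both move along a slope-1 line,
            # so the gap stays the same
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from an unequal split such as (80, 10) on a link of capacity 100, the gap shrinks by half at every loss, so after a few hundred rounds the two rates are nearly equal.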
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2!
• That's what "download accelerators" do

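The parallel-connection arithmetic can be written down directly. The sketch assumes the bottleneck is shared equally per connection, as the fairness goal states; app_share is a name invented for this example.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)
```

With 9 existing connections, app_share(9, 1) is 1/10 of R and app_share(9, 9) is 1/2 of R.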
TCP Connection Management
Recall: TCP sender, receiver establish "connection" before exchanging data segments
• initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
• client: connection initiator
    connect(s, &dstaddr, …);
• server: contacted by client
    cl = accept(sv, &caddr, …);
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

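The sequence and acknowledgement numbers exchanged in the three steps can be illustrated with a small model. This is not socket code: segments are plain dictionaries, the function name is invented, and the only protocol fact relied on is that a SYN consumes one sequence number.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries."""
    # Step 1: client sends SYN carrying its initial sequence number
    syn = {"flags": {"SYN"}, "seq": client_isn}
    # Step 2: server replies with SYNACK, acknowledging client_isn + 1
    synack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
              "ack": syn["seq"] + 1}
    # Step 3: client acknowledges server_isn + 1 (may also carry data)
    ack = {"flags": {"ACK"}, "seq": syn["seq"] + 1,
           "ack": synack["seq"] + 1}
    return [syn, synack, ack]
```

For client ISN 100 and server ISN 300, the SYNACK carries ack = 101 and the final ACK carries seq = 101, ack = 301.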
TCP 3-way handshake
TCP connection establishment
• Q1: why 3-way and not 2-way handshake?
• Q2: how do sender & receiver determine initial seqnums?

3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())

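The RFC 1948 idea (clock value plus a keyed hash of the connection identifiers) can be sketched as follows. Assumptions of this example: the secret plays the role of random(), SHA-256 is used as a stand-in for the hash F, the time is passed in as integer microseconds, and the function name is invented.

```python
import hashlib


def isn_rfc1948_sketch(src, dst, sport, dport, secret, now_us):
    """Initial sequence number = 4-microsecond clock + F(connection, secret)."""
    clock = now_us // 4                          # ticks once every 4 µs
    material = f"{src}|{dst}|{sport}|{dport}|".encode() + secret
    digest = hashlib.sha256(material).digest()   # stand-in for F
    offset = int.from_bytes(digest[:4], "big")
    return (clock + offset) % 2**32              # sequence numbers are 32 bits
```

For a fixed connection the value still advances with the clock, which guards against old incarnations, while an outsider who does not know the secret cannot predict the per-connection offset.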
When Sequence Numbers Attack
• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send: use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
• Server computes its initial sequence number (the cookie) and sends SYNACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged number matches a cookie that could have been sent, then allocate buffers if correct
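The "compute but do not store" idea behind SYN cookies can be sketched with a keyed hash. This is a simplified illustration, not the encoding used by real implementations (which also pack a time counter and MSS information into the cookie); the function names, the counter parameter, and the use of HMAC-SHA256 are assumptions of this example.

```python
import hashlib
import hmac


def syn_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN derived from the SYN itself, so nothing has to be stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, counter,
                 ack_number):
    """Recompute the cookie and check that the client's ACK acknowledges it."""
    cookie = syn_cookie(secret, src, sport, dst, dport, client_isn, counter)
    return ack_number == (cookie + 1) % 2**32
```

Only when ack_is_valid returns True would the server allocate buffers for the connection; a flood of SYNs with forged source addresses never reaches that point.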
Sequence Number Summary
• Goals, Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals, Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop: use PRNG for initial seq number choice
  - Don't allow SYN attacks: compute, but don't store, initial sequence number

TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s);
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline of the close exchange; the client passes through timed wait before closed.]

TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline; both sides closing, client in timed wait, then both closed.]

TCP Connection FSM
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figure: TCP client lifecycle and TCP server lifecycle.]

Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem.

Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window, to allow for LFN ("elephant"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around for sequence numbers
• …

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale

Summary: TCP Congestion Control
[FSM diagram with states slow start, congestion avoidance, and fast recovery; transitions listed as event: actions]
Start (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
slow start:
  - new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
  - duplicate ACK: dupACKcount++
  - cwnd > ssthresh (Λ): go to congestion avoidance
  - dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  - timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
congestion avoidance:
  - new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
  - duplicate ACK: dupACKcount++
  - dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
  - timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
fast recovery:
  - duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
  - new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
  - timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
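
The state machine on this slide can be written down as a small event-driven class. This is a sketch, not a TCP implementation: the class and method names are hypothetical, windows are plain byte counts, retransmission and actual sending are left out, and the "+ 3" in the diagram is read as 3 MSS.

```python
class RenoSender:
    """Congestion-control state only; no segments are actually sent."""

    def __init__(self, mss=1000):
        self.mss = mss
        self.cwnd = mss               # cwnd = 1 MSS
        self.ssthresh = 64 * 1024     # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh             # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                 # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss // self.cwnd   # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                 # each dup ACK: a segment left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                    # triple duplicate ACK
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"


sender = RenoSender(mss=1000)
for _ in range(3):
    sender.on_new_ack()
print(sender.state, sender.cwnd)
```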

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of window size over time, between W/2 and W]
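
The 0.75 factor follows from the sawtooth shape. Assuming the window climbs linearly from W/2 to W between two losses (the idealization used on this slide), the time average is the midpoint of the two extremes:

```latex
\[
\overline{\text{throughput}}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
\]
```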

TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate $L$:
  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low!
• Requires an almost perfect link
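
The two numbers in this example can be reproduced directly from the formula throughput = 1.22 · MSS / (RTT · sqrt(L)). A small sketch; the function names are made up for illustration.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


# Slide example: 1500-byte segments, 100 ms RTT, 10 Gbps target
print(int(window_segments(10e9, 0.1, 1500)))
print(required_loss_rate(10e9, 0.1, 1500))
```

The loss rate comes out at roughly 2 · 10^-10, i.e. about one lost segment in five billion.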

Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

TCP Fairness
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plane with both axes running up to R (one labeled "Connection 1 throughput") and an "equal bandwidth share" line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
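
The convergence argument can be tried numerically. The sketch below assumes two flows with identical RTTs that both see a loss whenever their combined rate exceeds the link capacity; the function name, units, and starting rates are made up for illustration.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2     # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1     # additive increase: the gap is unchanged
    return x1, x2


a, b = aimd_two_flows(10, 80, capacity=100, rounds=200)
print(a, b, abs(a - b))
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease cuts it in half, so the pair drifts toward the equal-share line.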

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
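
The arithmetic behind the parallel-connection example, assuming the link is split evenly per connection (the idealized fairness goal). The helper name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of the link rate R that the new app's connections receive."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))   # one new connection next to 9 existing ones
print(app_share(9, 9))   # nine new connections next to 9 existing ones
```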

3-way Handshake & Delayed Dups
(a) Normal operation
(b) Old SYN appearing out of nowhere
(c) Duplicate SYN and duplicate ACK following SYN
3-way handshake required to deal with scenarios (b) and (c)

Sequence Number Reuse
• Idea: Tie initial TCP seq numbers to clock
  - Increment every 4 µs; guards against previous incarnations of a connection with identical sequence numbers
• Must also guard against sequence number prediction attack
  - Use PRNG, see [RFC 1948], [CERT 2001-09]
• RFC 1948: ISN = 4 µs clock val + F(src, dst, sport, dport, random())
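
A sketch of the ISN formula above. Everything beyond the formula is an assumption of this example: SHA-256 stands in for F (RFC 1948 itself suggests MD5), a per-host random secret plays the role of random(), and the function and parameter names are hypothetical.

```python
import hashlib
import os
import time

SECRET = os.urandom(16)   # per-host secret, chosen once at start-up (assumption)


def initial_sequence_number(src, dst, sport, dport, now_us=None, secret=SECRET):
    """ISN = 4 µs clock value + F(src, dst, sport, dport, secret), mod 2^32."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000
    clock = (now_us // 4) & 0xFFFFFFFF               # ticks once every 4 µs
    data = f"{src}|{dst}|{sport}|{dport}".encode() + secret
    offset = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return (clock + offset) & 0xFFFFFFFF


print(initial_sequence_number("10.0.0.1", "10.0.0.2", 40000, 80))
```

For a fixed connection 4-tuple the ISN still advances with the clock, which guards against old incarnations; across different 4-tuples the offset is unpredictable without the secret.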

When Sequence Numbers Attack
• Suppose attacker A can predict sequence number a host B is going to use next
• By using spoofed source IP C, A can engage in successful 3-way handshake with B
  - B believes it is talking to C, might grant permissions based on C's IP address
  - Attacker on A must suppress the RST packets C is likely to send - use a denial-of-service attack for that
• A sends message to compromise B

When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution:
  - SYN cookies
• Server computes its initial sequence number (the cookie), sends SYN-ACK - but does not allocate buffers
  - If client continues with ACK, check if the acknowledged sequence number is one the server could have sent, then allocate buffers if correct
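
A simplified sketch of the SYN-cookie idea: the server derives its initial sequence number from the connection parameters and a secret, so it can recompute it when the final ACK arrives instead of storing per-connection state. Real implementations also pack a timestamp and an MSS index into the cookie bits; that is omitted here, and all names and the coarse `counter` argument are hypothetical.

```python
import hashlib
import hmac


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Server ISN computed (not stored) from the SYN and a slowly changing counter."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{counter}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(secret, src, sport, dst, dport, client_isn, counter, ack_number):
    """Accept the ACK if it acknowledges a cookie from this or the previous period."""
    for c in (counter, counter - 1):
        cookie = make_cookie(secret, src, sport, dst, dport, client_isn, c)
        if ack_number == (cookie + 1) & 0xFFFFFFFF:
            return True     # only now would the server allocate buffers
    return False


conn = (b"server-secret", "10.0.0.1", 40000, "10.0.0.2", 80, 12345)
cookie = make_cookie(*conn, 7)
print(ack_is_valid(*conn, 7, (cookie + 1) & 0xFFFFFFFF))
```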

Sequence Number Summary
• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake, where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop - use PRNG for initial seq number choice
  - Don't allow SYN attacks - compute but don't store initial sequence number

TCP Connection Management (cont)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline; labels "close" (client), "close" (server), "timed wait", "closed"]

TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK.
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline; labels "closing" (client), "closing" (server), "timed wait", "closed" on both sides]
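
The four closing steps can be traced with two small transition tables. The state names follow the standard TCP state names (FIN_WAIT_1, CLOSE_WAIT, and so on); the event strings and the `run` helper are hypothetical, and only the normal, loss-free close is covered.

```python
# (state, event) -> (next state, segment sent or None)
CLIENT = {
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),      # step 1
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "ACK"),     # step 3
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "ACK"),      # re-ACK a retransmitted FIN
    ("TIME_WAIT", "timer expires"): ("CLOSED", None),
}
SERVER = {
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "ACK"),   # step 2
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),           # step 4
}


def run(table, events, state="ESTABLISHED"):
    """Feed events through a transition table; return final state and segments sent."""
    sent = []
    for event in events:
        state, segment = table[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent


print(run(CLIENT, ["close", "recv ACK", "recv FIN", "timer expires"]))
print(run(SERVER, ["recv FIN", "close", "recv ACK"]))
```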

TCP Connection FSM
[Figure: TCP connection finite state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.

TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]

Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem.

Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks

TCP Miscellaneous
• MSS: Maximum Segment Size Option
  - Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK - selective acknowledgements
• WSCALE - scale factor for receive window to allow for LFN ("elephant") - Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around for sequence numbers
• …
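
To see why a scale factor is needed on long fat networks, compare the 16-bit receive window (at most 65,535 bytes) with the bandwidth-delay product. The sketch below picks the smallest shift that covers the path; the function name is made up, and the cap of 14 is the maximum shift allowed by the window scale option.

```python
def window_scale_shift(rate_bps, rtt_s, max_shift=14):
    """Smallest shift s such that (65535 << s) covers the bandwidth-delay product."""
    bdp_bytes = rate_bps * rtt_s / 8
    shift = 0
    while (65535 << shift) < bdp_bytes and shift < max_shift:
        shift += 1
    return shift


print(window_scale_shift(10e9, 0.1))   # 10 Gbps path with 100 ms RTT
print(window_scale_shift(1e6, 0.1))    # 1 Mbps path with 100 ms RTT
```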

Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control

TCP Congestion Control

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control!
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem!

Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B; router with unlimited shared output link buffers; λ_in: original data; λ_out]

Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B; router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 2
Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots a), b), c) of λ_out against λ_in, both axes up to R/2; the plots also mark the values R/3 and R/4]

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B; finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \min(\text{CongWin}, \text{RcvWindow})$
• CongWin and RTT influence throughput:
  $\text{rate} = \dfrac{\text{CongWin}}{RTT}$ bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
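
The sending rule on this slide says that unacknowledged bytes in flight may not exceed the smaller of the two windows. A minimal sketch of that check and of the rough rate estimate CongWin/RTT; the function names are made up for illustration.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if nbytes more may be sent without exceeding min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def approx_rate(cong_win_bytes, rtt_s):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win_bytes / rtt_s


print(can_send(5000, 3000, cong_win=4000, rcv_window=8000, nbytes=1000))
print(can_send(5000, 3000, cong_win=4000, rcv_window=2500, nbytes=1000))
print(approx_rate(8000, 0.2))
```

In the second call the receiver's window, not the congestion window, is the limiting factor: that is flow control rather than congestion control.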

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (axis marks at 8, 16, 24 Kbytes) vs. time for a long-lived TCP connection]
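
The sawtooth in this figure can be generated with a few lines. The sketch assumes an idealized path where a loss occurs exactly when the window reaches a fixed size `w_max`; the window is counted in MSS units, and the names are hypothetical.

```python
def aimd_trace(w_max, rtts):
    """Congestion window (in MSS) per RTT for an idealized long-lived connection."""
    cong_win = w_max / 2
    trace = []
    for _ in range(rtts):
        trace.append(cong_win)
        if cong_win >= w_max:
            cong_win = cong_win / 2   # multiplicative decrease after a loss event
        else:
            cong_win += 1             # additive increase: +1 MSS per RTT (probing)
    return trace


trace = aimd_trace(16, 10)
print(trace)
print(sum(trace[:9]) / 9)   # average over one full sawtooth period
```

Averaged over one period, the window sits three quarters of the way up to `w_max`, halfway between the two extremes of the sawtooth.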
When Sequence Numbers Attack
bull Suppose attacker A can predict sequence
number a host B is going to use next
bull By using spoofed source IP C A can engage in
successful 3-way handshake with B
ndash B believes it is talking to C might grant permissions
based on Crsquos IP address
ndash Attacker on A must suppress the RST packets C is
likely to send ndash use a denial-of-service attack for that
bull A sends message to compromise B
CS 4284 Spring 2013
When SYNs Attack
• Servers receiving SYN must allocate resources
  - Opens up possibility of denial-of-service attack where server is flooded with bogus SYN packets with forged IP source addresses
• Solution
  - SYN cookies
    • Server computes its initial sequence number (the cookie) and sends SYN-ACK, but does not allocate buffers
  - If client continues with ACK, check whether the acknowledged sequence number is one the server could have sent, then allocate buffers if correct
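A minimal sketch of the "compute but do not store" idea behind SYN cookies. This is a simplification under stated assumptions, not the layout any real stack uses: real implementations also encode the MSS in the cookie. `SECRET`, `SLOT_SECONDS`, and both function names are hypothetical.

```python
import hashlib
import hmac
import struct

SECRET = b"hypothetical-server-secret"  # assumption: per-server random key
SLOT_SECONDS = 64                       # assumption: coarse cookie lifetime


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, now):
    """Derive a 32-bit server initial sequence number from the 4-tuple,
    the client's initial sequence number, and a coarse time slot."""
    slot = int(now) // SLOT_SECONDS
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_isn}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def ack_is_valid(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number, now):
    """Check whether the client's ACK acknowledges a cookie this server
    could have sent in the current or the previous time slot."""
    claimed = struct.pack("!I", (ack_number - 1) % 2**32)
    for age in (0, 1):
        t = now - age * SLOT_SECONDS
        expected = make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, t)
        if hmac.compare_digest(struct.pack("!I", expected), claimed):
            return True
    return False
```

The server keeps no per-connection state between sending the SYN-ACK and receiving the ACK. Everything it needs is recomputed from the ACK segment itself, whose sequence number is the client's initial sequence number plus one.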
Sequence Number Summary
• Goals Set 1
  - Guard against old duplicates in one connection -> don't reuse sequence numbers until after it's reasonably certain that old duplicates have disappeared
  - Even after a crash/restart -> hence tie to time -> but don't use global clock -> hence need 3-way handshake where each side verifies the sequence number chosen by the other side before successful connection occurs
• Goals Set 2
  - Don't allow hijacking of connections unless attacker can eavesdrop - use PRNG for initial seq number choice
  - Don't allow SYN attacks - compute but don't store initial sequence number
TCP Connection Management (cont.)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline with labels close, close, timed wait, closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK
  - Enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline with labels closing, closing, timed wait, closed, closed]
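The four closing steps can be written as a pair of transition tables. The sketch covers only the normal (non-simultaneous) close. The state names are the ones RFC 793 uses (the slides only say "timed wait" and "closed"), and `run` is a helper invented for this example.

```python
# (state, event) -> (action, next_state); state names follow RFC 793.
CLIENT = {
    ("ESTABLISHED", "close"): ("send FIN", "FIN_WAIT_1"),        # Step 1
    ("FIN_WAIT_1", "recv ACK"): (None, "FIN_WAIT_2"),
    ("FIN_WAIT_2", "recv FIN"): ("send ACK", "TIME_WAIT"),       # Step 3
    ("TIME_WAIT", "recv FIN"): ("send ACK", "TIME_WAIT"),        # re-ACK a retransmitted FIN
    ("TIME_WAIT", "timer expires"): (None, "CLOSED"),
}
SERVER = {
    ("ESTABLISHED", "recv FIN"): ("send ACK", "CLOSE_WAIT"),     # Step 2
    ("CLOSE_WAIT", "close"): ("send FIN", "LAST_ACK"),
    ("LAST_ACK", "recv ACK"): (None, "CLOSED"),                  # Step 4
}


def run(table, events, state="ESTABLISHED"):
    """Feed events through a transition table and return the final state
    together with a trace of (event, action, new_state) tuples."""
    trace = []
    for event in events:
        action, state = table[(state, event)]
        trace.append((event, action, state))
    return state, trace


if __name__ == "__main__":
    for step in run(CLIENT, ["close", "recv ACK", "recv FIN", "timer expires"])[1]:
        print(step)
```

The extra TIME_WAIT entry is what lets the client re-acknowledge a FIN that the server retransmits because the final ACK was lost.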
TCP Connection FSM
[Figure: TCP connection finite state machine] The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  - No. Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  - Delayed ACKs, Nagle's algorithm
  - Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  - SYN attack
  - Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK - selective acknowledgements
• WSCALE - scale factor for receive window to allow for LFN ("elephant") - Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around of sequence numbers
• …
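To see why WSCALE matters on a long fat network, the sketch below finds the smallest shift that lets the 16-bit receive window field cover a path's bandwidth-delay product. The cap of 14 comes from RFC 1323. The function name and example figures are illustrative assumptions.

```python
def window_scale_shift(bandwidth_bps, rtt_seconds):
    """Return (shift, bdp_bytes): the smallest window-scale shift such that
    a 16-bit window field scaled by 2**shift covers the bandwidth-delay
    product, capped at 14 as RFC 1323 requires."""
    bdp_bytes = bandwidth_bps * rtt_seconds / 8
    shift = 0
    while (65535 << shift) < bdp_bytes and shift < 14:
        shift += 1
    return shift, bdp_bytes


if __name__ == "__main__":
    shift, bdp = window_scale_shift(10e9, 0.1)  # assumed: 10 Gbps path, 100 ms RTT
    print(f"BDP = {bdp:.0f} bytes, window scale shift = {shift}")
```

Without scaling, the window tops out at 65,535 bytes, far below what such a path can hold in flight.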
Study of TCP: Outline
• segment structure
• reliable data transfer
  - delayed ACKs
  - Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  - long delays (queueing in router buffers)
  - lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B; λ_in: original data; unlimited shared output link buffers; λ_out]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B; finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 2
Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)
"Costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a), (b), (c) of λ_out against the input rate, axes marked at R/2; the levels R/3 and R/4 are marked on two of the plots]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B; finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require offered load to be greater than goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
  rate = CongWin / RTT Bytes/sec
• CongWin is dynamic, function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
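The sender-side limit and the rate estimate translate directly into code. This is a sketch with hypothetical function names; all quantities are in bytes and seconds.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps the amount of unacknowledged data
    within min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def approx_rate(cong_win, rtt_seconds):
    """Rough sending rate in bytes per second: one window per round trip."""
    return cong_win / rtt_seconds


if __name__ == "__main__":
    # Assumed numbers: 2000 bytes in flight, CongWin 4000, RcvWindow 8000.
    print(can_send(5000, 3000, cong_win=4000, rcv_window=8000, nbytes=1500))
    print(approx_rate(cong_win=8000, rtt_seconds=0.2))
```

Whichever window is smaller governs: congestion control (CongWin) protects the network, flow control (RcvWindow) protects the receiver.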
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window (8, 16, 24 Kbytes) against time for a long-lived TCP connection]
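A toy trace of the resulting window pattern. It assumes, purely for illustration, that a loss occurs whenever the window reaches a fixed size; real losses depend on the path. Units are MSS and the function name is made up.

```python
def aimd_trace(start_mss, loss_at_mss, rtts):
    """CongWin (in MSS) observed at each RTT: grows by 1 MSS per RTT and is
    cut in half in the RTT after it reaches loss_at_mss."""
    cwnd, trace = start_mss, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at_mss:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase
    return trace


if __name__ == "__main__":
    print(aimd_trace(start_mss=8, loss_at_mss=16, rtts=20))
```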
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A / Host B timeline with RTT marked on the time axis]
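The slow-start numbers can be reproduced with a short sketch. It assumes no loss and that every segment sent in one RTT is ACKed before the next; `slow_start_rates` is a hypothetical helper.

```python
def slow_start_rates(mss_bytes, rtt_seconds, rtts):
    """Sending rate in bits per second during each RTT of slow start."""
    cwnd = mss_bytes
    rates = []
    for _ in range(rtts):
        rates.append(cwnd * 8 / rtt_seconds)
        acks = cwnd // mss_bytes      # one ACK per segment in flight
        cwnd += acks * mss_bytes      # +1 MSS per ACK doubles CongWin each RTT
    return rates


if __name__ == "__main__":
    # MSS = 500 bytes, RTT = 200 msec, as in the slide's example.
    for i, rate in enumerate(slow_start_rates(500, 0.2, 5)):
        print(f"RTT {i}: {rate / 1000:.0f} kbps")
```

The first entry is the 20 kbps initial rate. Each later entry doubles because every ACK adds one MSS.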
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
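A sketch of the one place where the two variants differ. The Reno branch follows the rules listed above. The Tahoe branch relies on the commonly described behavior that Tahoe drops to 1 MSS on either kind of loss event, which the slide implies but does not spell out. The function and its argument names are illustrative.

```python
def on_loss(variant, event, cong_win, mss):
    """Return (threshold, new_cong_win) in bytes after a loss event.

    variant: "tahoe" or "reno"; event: "timeout" or "triple_dup_ack".
    """
    threshold = cong_win / 2
    if variant == "reno" and event == "triple_dup_ack":
        return threshold, threshold   # fast recovery: continue linearly
    return threshold, mss             # restart from 1 MSS with slow start


if __name__ == "__main__":
    for variant in ("tahoe", "reno"):
        for event in ("timeout", "triple_dup_ack"):
            print(variant, event, on_loss(variant, event, cong_win=16000, mss=1000))
```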
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[Figure: FSM with states slow start, congestion avoidance, and fast recovery. Transitions, written as event: actions]
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start
- new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
- cwnd > ssthresh (Λ): go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
Congestion avoidance
- new ACK: cwnd = cwnd + MSS (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment; go to fast recovery
Fast recovery
- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1, dupACKcount = 0, retransmit missing segment; go to slow start
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
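The event table maps almost line for line onto a small class. This is a sketch of the simplified two-state model in the table, not of a real TCP stack. The class and method names are invented here, the 64 KB initial threshold is borrowed from the FSM slide, and clamping Threshold at 1 MSS is an assumption made so that CongWin cannot drop below 1 MSS.

```python
class CongestionControl:
    """Sender-side congestion state following the event table (bytes)."""

    def __init__(self, mss, threshold=64 * 1024):
        self.mss = mss
        self.cong_win = mss          # start with 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_duplicate_ack(self):
        """Count the duplicate; the third one triggers fast recovery."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = max(self.cong_win / 2, self.mss)
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Driving it with a sequence of `on_new_ack()`, `on_duplicate_ack()`, and `on_timeout()` calls reproduces the exponential, linear, and reset phases the table describes.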
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection. Ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W]
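The 0.75 factor follows from the shape of the window over one loss cycle. A short derivation, assuming the window climbs linearly from W/2 to W and then drops back, so that its time average is the midpoint of the two:

```latex
\overline{W} \;=\; \frac{1}{2}\left(\frac{W}{2} + W\right) \;=\; \frac{3}{4}\,W,
\qquad
\text{average throughput} \;=\; \frac{\overline{W}}{RTT} \;=\; 0.75\,\frac{W}{RTT}
```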
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Require almost perfect link
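Both numbers in the example can be recomputed from the formula. A sketch with hypothetical function names; MSS is converted to bits so that throughput comes out in bits per second.

```python
import math


def required_window_segments(throughput_bps, rtt_seconds, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_seconds / (mss_bytes * 8)


def throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Throughput = 1.22 * MSS / (RTT * sqrt(L)), in bits per second."""
    return 1.22 * mss_bytes * 8 / (rtt_seconds * math.sqrt(loss_rate))


def required_loss_rate(target_bps, rtt_seconds, mss_bytes):
    """Solve the throughput formula for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_seconds * target_bps)) ** 2


if __name__ == "__main__":
    print(int(required_window_segments(10e9, 0.1, 1500)))  # window in segments
    print(f"{required_loss_rate(10e9, 0.1, 1500):.2e}")    # tolerable loss rate
```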
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
TCP Slow Start
• When connection begins, CongWin = 1 MSS
  – Example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
  – double CongWin every RTT
  – done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A and Host B exchanging segments over time; one RTT marked]
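The doubling follows directly from the per-ACK increment: a window of n segments produces n ACKs in one RTT, and each ACK adds 1 MSS. A minimal sketch, assuming no loss and one ACK per segment (no delayed ACKs); the function names are hypothetical:

```python
def slow_start_windows(mss_bytes, rounds):
    """CongWin (bytes) at the start of each RTT during slow start."""
    cwnd = mss_bytes
    windows = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss_bytes   # one ACK per segment sent this RTT
        cwnd += acks * mss_bytes   # +1 MSS per ACK doubles the window
        windows.append(cwnd)
    return windows


def rate_kbps(cwnd_bytes, rtt_ms):
    """Sending rate in kbit/s (bits per millisecond)."""
    return cwnd_bytes * 8 / rtt_ms


# The slide's example: MSS = 500 bytes, RTT = 200 msec.
for w in slow_start_windows(500, 4):
    print(w, "bytes ->", rate_kbps(w, 200), "kbps")
```

The first line printed corresponds to the slide's initial rate of 20 kbps; each later round doubles it.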
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
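The difference between the two variants comes down to how they react to a triple duplicate ACK. The sketch below follows the rules on this slide for Reno; for Tahoe it assumes the commonly described behavior of dropping to 1 MSS on every loss event, which the slide implies but does not spell out. The function name and string labels are made up for the example, and windows are in MSS units.

```python
def react_to_loss(cong_win, event, variant, mss=1.0):
    """Return (new_cong_win, threshold) after a loss event.

    event:   "timeout" or "triple_ack"
    variant: "tahoe" or "reno"
    """
    if event not in ("timeout", "triple_ack"):
        raise ValueError("unknown event: " + event)
    if variant not in ("tahoe", "reno"):
        raise ValueError("unknown variant: " + variant)

    threshold = cong_win / 2  # half of CongWin just before the loss
    if event == "triple_ack" and variant == "reno":
        # Fast recovery: resume at the threshold and grow linearly.
        return threshold, threshold
    # Timeout (both variants) or any loss in Tahoe (assumed): restart slow start.
    return mss, threshold


print(react_to_loss(16, "triple_ack", "reno"))
print(react_to_loss(16, "triple_ack", "tahoe"))
```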
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion
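Summary: TCP Congestion Control
[State diagram, listed as "state: event / actions → next state"; Λ marks an empty event or action]
start: Λ / cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0 → slow start
slow start: new ACK / cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed → slow start
slow start: duplicate ACK / dupACKcount++ → slow start
slow start: timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
slow start: cwnd > ssthresh / Λ → congestion avoidance
slow start: dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
congestion avoidance: new ACK / cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed → congestion avoidance
congestion avoidance: duplicate ACK / dupACKcount++ → congestion avoidance
congestion avoidance: timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start
congestion avoidance: dupACKcount == 3 / ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment → fast recovery
fast recovery: duplicate ACK / cwnd = cwnd + MSS, transmit new segment(s) as allowed → fast recovery
fast recovery: new ACK / cwnd = ssthresh, dupACKcount = 0 → congestion avoidance
fast recovery: timeout / ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment → slow start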
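The state diagram on this slide translates fairly directly into an event-driven class. The following is a simplified sketch, not a real TCP implementation: it tracks only `cwnd`, `ssthresh`, the duplicate ACK count, and the state, and it leaves out actual (re)transmission. The class name `RenoSender`, the method names, and the default MSS of 1460 bytes are assumptions made for the example.

```python
class RenoSender:
    """Toy model of the three-state congestion control diagram (bytes)."""

    def __init__(self, mss=1460):
        self.mss = mss
        self.cwnd = mss              # cwnd = 1 MSS
        self.ssthresh = 64 * 1024    # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            # About +1 MSS per RTT, spread over the ACKs of one window.
            self.cwnd += self.mss * self.mss / self.cwnd
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                # inflate per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"         # missing segment is resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"                # missing segment is resent here


sender = RenoSender(mss=1000)
for _ in range(4):
    sender.on_new_ack()
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cwnd, sender.ssthresh)
```

Feeding the object a sequence of ACK, duplicate ACK, and timeout events and logging `cwnd` after each one is a convenient way to check one's reading of the diagram.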
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of window size between W/2 and W]
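The factor 0.75 is just the mean of a window that climbs linearly from W/2 to W. A quick numerical check, assuming an idealized sawtooth sampled once per RTT (the function name is hypothetical):

```python
import math


def average_window(w_max, steps):
    """Mean of a window rising linearly from w_max/2 to w_max.

    The window is sampled at steps + 1 evenly spaced points, one per RTT,
    including both the post-loss value W/2 and the peak W.
    """
    samples = [w_max / 2 + (w_max / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


avg = average_window(100, 50)
print(avg, math.isclose(avg, 0.75 * 100))
```

Dividing the average window by RTT gives the average steady-state throughput stated on the slide.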
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: throughput = 1.22 · MSS / (RTT · √L)
• L = 2·10^-10: very low
• Requires an almost perfect link
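Both numbers in the example can be recomputed from the formula throughput = 1.22 · MSS / (RTT · √L). The sketch below uses hypothetical function names and works in bits and seconds:

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L solving throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


# Slide example: 1500-byte segments, 100 ms RTT, 10 Gbps target.
print(int(window_segments(10e9, 0.1, 1500)))
print(loss_rate_for(10e9, 0.1, 1500))
```

The second value comes out at roughly 2·10^-10, that is, about one loss in five billion segments.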
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
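As an illustration of what such an equation looks like, the sketch below codes the TCP throughput equation as it is commonly quoted from RFC 5348 (TFRC). It is written from memory and should be checked against the RFC before use. Assumptions: `s` is the segment size in bytes, `rtt` the round-trip time in seconds, `p` the loss event rate, `b = 1` packet per ACK, and the retransmission timeout is simplified to 4 · RTT. The function name is hypothetical.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate in bytes/sec for loss event rate p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt  # simplifying assumption for this sketch
    denominator = (
        rtt * math.sqrt(2 * b * p / 3)
        + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    )
    return s / denominator


# For very small p the second term vanishes and the rate approaches
# sqrt(3/2) * s / (rtt * sqrt(p)), i.e. about 1.22 * MSS / (RTT * sqrt(L)).
print(tfrc_rate(1500, 0.1, 1e-8))
```

The sender measures RTT and the loss event rate, plugs them in, and paces its packets at the resulting rate instead of reacting to each loss with a window cut.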
TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axis "Connection 1 throughput", limit R on each axis, and the equal bandwidth share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
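The convergence argument can be checked with a small simulation: additive increase leaves the gap between the two sessions unchanged, while every multiplicative decrease halves it. The sketch assumes, for simplicity, that both sessions have the same RTT and detect loss at the same instant whenever their combined rate exceeds the capacity; the function name is hypothetical.

```python
def share_bandwidth(x1, x2, capacity, rounds):
    """Throughputs of two AIMD sessions after the given number of RTTs."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both sessions cut their rate in half.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both increase by one unit (slope 1).
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = share_bandwidth(1, 80, 100, 1000)
print(abs(a - b))
```

Starting far apart (1 versus 80), the two rates end up practically equal, which is the movement toward the equal bandwidth share line in the figure.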
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
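The arithmetic behind the example is per-connection fairness: if the link is shared equally among connections, an application's share is its fraction of all connections. A minimal sketch under that assumption (the function name is hypothetical):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))  # one extra connection on a link with 9 others
print(app_share(9, 9))  # nine parallel connections on the same link
```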
TCP Connection Management (cont)
Closing a connection:
client closes socket: close(s)
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline with labels close, close, timed wait, closed]
TCP Connection Management (cont)
Step 3: client receives FIN, replies with ACK
  – Enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs
[Figure: client/server timeline with labels closing, closing, timed wait, closed, closed]
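The four closing steps can be written as a small transition table. The state names below (FIN_WAIT_1, CLOSE_WAIT, and so on) are the standard ones from the TCP specification rather than from these slides, and the event names and the `run` helper are made up for the example; only the normal, loss-free case is modeled.

```python
# (state, event) -> (next state, segment sent or None)
TRANSITIONS = {
    # Client side (active close)
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "send FIN"),   # step 1
    ("FIN_WAIT_1", "recv ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv FIN"): ("TIME_WAIT", "send ACK"),      # step 3
    ("TIME_WAIT", "recv FIN"): ("TIME_WAIT", "send ACK"),       # re-ACK repeated FINs
    ("TIME_WAIT", "timer_expired"): ("CLOSED", None),
    # Server side (passive close)
    ("ESTABLISHED", "recv FIN"): ("CLOSE_WAIT", "send ACK"),    # step 2
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "send FIN"),      # step 2
    ("LAST_ACK", "recv ACK"): ("CLOSED", None),                 # step 4
}


def run(state, events):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, action = TRANSITIONS[(state, event)]
        if action is not None:
            sent.append(action)
    return state, sent


print(run("ESTABLISHED", ["app_close", "recv ACK", "recv FIN", "timer_expired"]))
print(run("ESTABLISHED", ["recv FIN", "app_close", "recv ACK"]))
```

The extra ("TIME_WAIT", "recv FIN") entry is what the timed wait is for: if the client's final ACK is lost, the server resends its FIN and the client can still answer it.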
TCP Connection FSM
[Figure: TCP connection state machine]
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
TCP Connection Management (cont'd)
[Figures: TCP client lifecycle; TCP server lifecycle]
Closing a Connection
• Note: previous charts showed normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
  – No. Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
  – Delayed ACKs, Nagle's algorithm
  – Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
  – SYN attack
  – Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size Option
  – Client/server agree on larger than default (536 outside same subnet) MSS option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elefant"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement, PAWS for protection against wrap-around for sequence numbers
• …
Study of TCP: Outline
• segment structure
• reliable data transfer
  – delayed ACKs
  – Nagle's algorithm
• timeout management, fast retransmit
• flow control + silly window syndrome
• connection management
[ Network Address Translation ]
[ Principles of congestion control ]
• TCP congestion control
TCP Congestion Control
Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for network to handle"
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A (λin: original data), Host B, λout, unlimited shared output link buffers]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A (λin: original data; λ'in: original data plus retransmitted data), Host B, λout, finite shared output link buffers]
Causes/costs of congestion: scenario 2
Always: λin = λout (goodput)
• a) if no loss, λ'in = λin
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'in larger for same λout (every packet transmitted twice)
"costs" of congestion:
• more work (retrans) for given "goodput"
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: panels a, b, c plotting λout against the input rate, with axis marks at R/2, R/3, and R/4]
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Host A (λin: original data; λ'in: original data plus retransmitted data), Host B, λout, finite shared output link buffers]
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
CS 4284 Spring 2013
TCP Sender Congestion Control Event State TCP Sender Action Commentary
ACK receipt for
previously
unacked data
Slow Start
(SS)
CongWin = CongWin + MSS
If (CongWin gt Threshold)
set state to CA
Resulting in a doubling of
CongWin every RTT
ACK receipt for
previously
unacked data
Congestion
Avoidance
(CA)
CongWin = CongWin+MSS
(MSSCongWin)
Additive increase resulting in
increase of CongWin by 1
MSS every RTT
Triple duplicate
ACK
SS or CA Threshold = CongWin2
CongWin = Threshold
Set state to CA
Fast recovery implementing
multiplicative decrease
CongWin will not drop below
1 MSS
Timeout SS or CA Threshold = CongWin2
CongWin = 1 MSS
Set state to ldquoSlow Startrdquo
Enter slow start
Duplicate ACK SS or CA Increment duplicate ACK
count for segment being
acked
CongWin and Threshold not
changed
CS 4284 Spring 2013
TCP Throughput Idealized
bull Whatrsquos the average throughout of TCP as a function of window size and RTT ndash Long-lived connection Ignore slow start
bull When window is W throughput is WRTT
bull Just after loss window drops to W2 throughput to W2RTT
bull Average steady-state throughput 75 WRTT
W
W2
CS 4284 Spring 2013
TCP Throughput amp Loss
bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput
bull Requires window size W = 83333 in-flight segments
bull Throughput in terms of loss rate
bull L = 210-10 Very low
bull Require almost perfect link
LRTT
MSS221
Equation-Based Control
bull Note TCP congestion control forms
control loop
ndash Inputs round-trip time ldquoloss eventsrdquo (which
are samples of timeout events + 3-ack events)
bull Instead equation-based control uses an
equation to compute sending rate based
on this input
bull See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same
bottleneck link of bandwidth R each should
have average rate of Rk
TCP connection 1
bottleneck router
capacity R
TCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair
Consider two competing sessions
bull additive increase gives slope of 1 as throughput increases
bull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increase
loss decrease window by factor of 2
congestion avoidance additive increase
loss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
bull Multimedia apps often
do not use TCP
ndash do not want rate throttled
by congestion control
bull Instead use UDP
ndash pump audiovideo at
constant rate tolerate
packet loss
bull TCP friendliness
Fairness and parallel TCP
connections
bull nothing prevents app from
opening parallel connections
between 2 hosts
bull Example link of rate R
supporting 9 connections
ndash new app asks for 1 TCP gets
rate R10
ndash new app asks for 9 TCPs gets
R2
bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
TCP
Connection
FSM The heavy solid line is the
normal path for a client
The heavy dashed line is the
normal path for a server
The light lines are unusual
events
Each transition is labeled by
the event causing it and the
action resulting from it
separated by a slash
CS 4284 Spring 2013
TCP Connection Management (contrsquod)
TCP client lifecycle TCP server lifecycle
CS 4284 Spring 2013
Closing a Connection
bull Note previous charts showed normal case
bull Can we reliably close a connection if
packets (FIN ACK) can be lost
ndash No Famous two-army problem
CS 4284 Spring 2013
Summary
bull TCP segments acknowledgements amp retransmission ndash Delayed ACKs Naglersquos algorithm
ndash Fast retransmit
bull RTT estimation amp Karnrsquos algorithm
bull Flow Control amp Silly Window Syndrome
bull Connection Management in TCP
bull Attacks against TCPrsquos connection management scheme ndash SYN attack
ndash Sequence number prediction attacks
CS 4284 Spring 2013
TCP Miscellaneous
bull MSS Maximum Segment Size Option ndash Clientserver agree on larger than default (536
outside same subnet) MSS option on SYN
bull SACK ndash selective acknowledgements
bull WSCALE ndash scale factor for receive window to allow for LFN (ldquoelefantrdquo) ndash Large Fat Networks
bull RFC 1323 timestamps for accurate RTT measurement PAWS for protection against wrap-around for sequence numbers
bull hellip
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B on multihop paths through routers with finite shared output link buffers. λ_in: original data; λ'_in: original data plus retransmitted data; λ_out: delivered rate]
Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, and λ_out]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
- single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
- explicit rate sender should send at
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission: LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: rate = CongWin / RTT bytes/sec
• CongWin is a dynamic function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
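The sending limit and the rough rate on this slide can be sketched in a few lines. This is only an illustration: the function names `usable_window` and `approx_rate` are hypothetical, and the sketch assumes the limit is LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow) with all quantities in bytes.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight under the assumed limit."""
    in_flight = last_byte_sent - last_byte_acked   # unacknowledged bytes
    limit = min(cong_win, rcv_window)              # congestion AND flow control
    return max(0, limit - in_flight)


def approx_rate(cong_win, rtt_seconds):
    """Rough sending rate in bytes/sec: about one window per round trip."""
    return cong_win / rtt_seconds
```

For example, with 2000 bytes in flight, CongWin = 4000 and RcvWindow = 8000, the sender may send 2000 more bytes; if RcvWindow shrinks to 1500, it may send none.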
TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (ticks at 8, 16, 24 Kbytes) versus time for a long-lived TCP connection]
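A minimal sketch of AIMD, one step per RTT. The function name `aimd` and the idea of passing in the rounds at which a loss event happens are assumptions made for illustration; a real sender detects losses rather than being told about them.

```python
def aimd(initial_cwnd, mss, loss_rounds, rounds):
    """Return the congestion window (bytes) at the start of each RTT round.

    loss_rounds is a set of round indices at which a loss event is assumed.
    """
    cwnd = initial_cwnd
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease: halve
        else:
            cwnd += mss                  # additive increase: +1 MSS per RTT
    return trace
```

Calling `aimd(8000, 1000, {3}, 6)` gives a window that climbs by 1000 bytes per round and is halved after the loss in round 3.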
TCP Slow Start
• When connection begins, CongWin = 1 MSS
- Example: MSS = 500 bytes & RTT = 200 msec
- initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
- desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
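The 20 kbps figure in the example follows from sending one MSS per RTT. A small check, with the hypothetical helper name `initial_rate_bps`:

```python
def initial_rate_bps(mss_bytes, rtt_seconds):
    """Initial rate in bits/sec when CongWin = 1 MSS: one segment per RTT."""
    return mss_bytes * 8 / rtt_seconds


# 500 bytes * 8 bits / 0.2 s = 20,000 bits/sec = 20 kbps
rate = initial_rate_bps(500, 0.2)
```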
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
- double CongWin every RTT
- done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline between Host A and Host B, with one RTT marked on the time axis]
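Why a per-ACK increment doubles the window each RTT can be seen in a short sketch. It assumes an idealized sender: every segment in the window is acknowledged individually within one RTT, there are no delayed ACKs, and no loss occurs. The name `slow_start_rounds` is hypothetical.

```python
def slow_start_rounds(mss, rounds):
    """Return CongWin (bytes) after each RTT, starting from 1 MSS."""
    cwnd = mss
    trace = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss        # one ACK comes back per segment sent
        for _ in range(acks):
            cwnd += mss           # +1 MSS for every ACK received
        trace.append(cwnd)
    return trace
```

With MSS = 500 bytes, four RTTs take the window from 500 to 8000 bytes.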
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ACK event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)
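The loss reactions can be sketched as one function. The slide only states the Reno behavior; the sketch assumes that Tahoe, which lacks fast recovery, drops to 1 MSS on every loss event. The function name `on_loss` and the event strings are hypothetical.

```python
def on_loss(cwnd, mss, event, variant="reno"):
    """Return (new_cwnd, new_threshold) in bytes after a loss event.

    event is "timeout" or "triple_dup_ack"; variant is "reno" or "tahoe".
    """
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError("unknown loss event: " + event)
    threshold = cwnd // 2                 # half of CongWin before the loss
    if event == "timeout" or variant == "tahoe":
        return mss, threshold             # restart from 1 MSS (slow start)
    # Reno, triple duplicate ACK: continue from Threshold, grow linearly
    return max(threshold, mss), threshold
```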
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
- CongWin is cut in half
- window then grows linearly
• But after timeout event:
- CongWin instead set to 1 MSS
- window then grows exponentially
- to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, "more alarming" indicator of congestion
Summary: TCP Congestion Control
[State diagram with three states: slow start, congestion avoidance, fast recovery]
Initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start:
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
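The state diagram can be sketched as a small class. This is an illustration under stated assumptions, not a real TCP implementation: it tracks only cwnd, ssthresh and the duplicate-ACK count, leaves out actual (re)transmission, and reads "ssthresh + 3" as ssthresh plus 3 MSS. The class name `RenoSender` and its method names are hypothetical.

```python
SLOW_START = "slow start"
CONG_AVOID = "congestion avoidance"
FAST_RECOVERY = "fast recovery"


class RenoSender:
    def __init__(self, mss=1000, ssthresh=64 * 1024):
        self.mss = mss
        self.cwnd = float(mss)            # cwnd = 1 MSS
        self.ssthresh = float(ssthresh)   # 64 KB by default
        self.dup_acks = 0
        self.state = SLOW_START

    def on_new_ack(self):
        if self.state == FAST_RECOVERY:
            self.cwnd = self.ssthresh     # deflate the window
            self.state = CONG_AVOID
        elif self.state == SLOW_START:
            self.cwnd += self.mss         # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = CONG_AVOID
        else:
            self.cwnd += self.mss * (self.mss / self.cwnd)  # ~1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == FAST_RECOVERY:
            self.cwnd += self.mss         # each dup ACK means a segment left
            return
        self.dup_acks += 1
        if self.dup_acks == 3:            # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = FAST_RECOVERY    # missing segment would be resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = SLOW_START           # missing segment would be resent here
```

Feeding the object a sequence of `on_new_ack()`, `on_dup_ack()` and `on_timeout()` calls and printing `state` and `cwnd` after each one traces a path through the diagram.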
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
- Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W over time]
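The 0.75 factor can be recovered with one line of arithmetic, assuming the window rises linearly from W/2 back to W between loss events, so the time-average window is the midpoint of the two:

```latex
\overline{\text{throughput}}
  = \frac{1}{RTT}\cdot\frac{\tfrac{W}{2} + W}{2}
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
```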
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = 2 * 10^-10: very low
• Requires an almost perfect link
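The two numbers in this example can be checked directly. The sketch assumes the slide's loss formula is the standard approximation throughput = 1.22 * MSS / (RTT * sqrt(L)), solved for L; the function names are hypothetical.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


# 10 Gbps, 100 ms RTT, 1500-byte segments
w = required_window_segments(10e9, 0.1, 1500)   # about 83,333 segments
loss = required_loss_rate(10e9, 0.1, 1500)      # about 2 * 10^-10
```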
Equation-Based Control
• Note: TCP congestion control forms a control loop
- Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ACK events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with both axes running to R (one labeled Connection 1 throughput) and an equal bandwidth share line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2)]
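The convergence argument can be tried numerically. This is a simplified sketch under strong assumptions: both sessions have the same RTT, both see every loss at the same time, and a loss happens exactly when their combined rate exceeds the capacity R. The name `two_flow_aimd` is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds, step=1.0):
    """Run synchronized AIMD for two flows; return their final rates."""
    for _ in range(rounds):
        x1 += step            # additive increase: both move along slope 1
        x2 += step
        if x1 + x2 > capacity:
            x1 /= 2           # multiplicative decrease: both halve,
            x2 /= 2           # which also halves the gap between them
    return x1, x2


# Start far from the fair share: 80 vs 10 on a link of capacity 100.
a, b = two_flow_aimd(80.0, 10.0, 100.0, 200)
```

Additive increase leaves the gap between the two rates unchanged and each loss halves it, so the rates drift toward the equal-share line.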
Fairness (more)
Fairness and UDP:
• Multimedia apps often do not use TCP
- do not want rate throttled by congestion control
• Instead use UDP:
- pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections:
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
- new app asks for 1 TCP, gets rate R/10
- new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
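The R/10 and R/2 figures follow from per-connection fairness: each connection gets an equal slice, so an app's share is its number of connections over the total. A small check, with the hypothetical helper `app_share`:

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R won by an app opening new_connections."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


one = app_share(9, 1)    # 1 of 10 connections  -> R/10
nine = app_share(9, 9)   # 9 of 18 connections  -> R/2
```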
TCP Connection Management (cont'd)
[Figure: TCP client lifecycle and TCP server lifecycle]
Closing a Connection
• Note: previous charts showed the normal case
• Can we reliably close a connection if packets (FIN, ACK) can be lost?
- No. Famous two-army problem
Summary
• TCP segments, acknowledgements & retransmission
- Delayed ACKs, Nagle's algorithm
- Fast retransmit
• RTT estimation & Karn's algorithm
• Flow Control & Silly Window Syndrome
• Connection Management in TCP
• Attacks against TCP's connection management scheme
- SYN attack
- Sequence number prediction attacks
TCP Miscellaneous
• MSS: Maximum Segment Size option
- Client/server agree on larger than default (536 outside same subnet) MSS via option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFNs ("elephants"), Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• …
CS 4284 Spring 2013
Study of TCP Outline
bull segment structure
bull reliable data transfer ndash delayed ACKs
ndash Naglersquos algorithm
bull timeout management fast retransmit
bull flow control + silly window syndrome
bull connection management
[ Network Address Translation ]
[ Principles of congestion control ]
bull TCP congestion control
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B; axis label λ_out.]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control
• routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
  LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput: roughly, rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery
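As a rough illustration of the sender-side limit and the rate estimate above, the following sketch computes how many more bytes a sender may put in flight. The function names and the sample numbers are hypothetical, chosen only for this example.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still transmit under min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    limit = min(cong_win, rcv_window)
    return max(0, limit - in_flight)


def approx_rate(cong_win_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cong_win_bytes / rtt_seconds


# 4000 bytes are in flight and the receive window (10000) is the tighter limit.
print(usable_window(12_000, 8_000, 16_000, 10_000))  # 6000
print(approx_rate(16_000, 0.5))                       # 32000.0
```

Whichever of the two windows is smaller governs, so a large CongWin does not help if the receiver advertises a small RcvWindow.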

TCP AIMD
• multiplicative decrease: cut CongWin in half after loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window against time for a long-lived TCP connection; window axis marked at 8, 16, and 24 Kbytes.]
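A minimal sketch of the AIMD pattern, with the window counted in whole MSS. It assumes, purely for illustration, that a loss event happens every time the window reaches a fixed size (loss_at_mss); real losses depend on the network.

```python
def aimd_trace(start_mss, loss_at_mss, rounds):
    """Window size (in MSS) at each RTT under a simplified AIMD rule."""
    window = start_mss
    trace = []
    for _ in range(rounds):
        trace.append(window)
        if window >= loss_at_mss:
            window = max(1, window // 2)  # multiplicative decrease
        else:
            window += 1                   # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(4, 8, 12))  # [4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5]
```

The repeating climb-and-halve pattern is the sawtooth that a long-lived connection traces out.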

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
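The 20 kbps figure follows from sending one MSS per RTT. As a quick check of the arithmetic, with the units spelled out:

```latex
\[
\text{initial rate} = \frac{\text{MSS}}{\text{RTT}}
  = \frac{500\ \text{bytes} \times 8\ \text{bits/byte}}{0.2\ \text{s}}
  = 20{,}000\ \text{bits/s} = 20\ \text{kbps}
\]
```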

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timing diagram between Host A and Host B, with one RTT marked on the time axis.]
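To see why one increment per ACK doubles the window each round trip, here is a small sketch. It assumes every segment sent in a round is ACKed within that round and that no loss occurs; the function name is made up for this example.

```python
def slow_start_windows(rtts, mss=1):
    """CongWin (in units of mss) at the start of each RTT during slow start."""
    cong_win = mss
    windows = []
    for _ in range(rtts):
        windows.append(cong_win)
        acks = cong_win // mss   # one ACK per segment sent in this round
        cong_win += acks * mss   # +1 MSS per ACK, so the window doubles
    return windows


print(slow_start_windows(5))  # [1, 2, 4, 8, 16]
```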

TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in TCP Reno)

Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
• But after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion
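The two loss reactions can be compared side by side in a short sketch. Windows are counted in MSS, and the function name and the "variant" flag are hypothetical; the Tahoe branch reflects that fast recovery was only added in Reno, so Tahoe restarts from 1 MSS on any loss event.

```python
def on_loss(cong_win, event, variant="reno"):
    """Return (new_cong_win, threshold) in MSS after a loss event."""
    if event not in ("timeout", "triple_dup_ack"):
        raise ValueError("unknown loss event: " + event)
    threshold = max(cong_win // 2, 1)
    if event == "timeout" or variant == "tahoe":
        return 1, threshold          # restart from 1 MSS, then slow start
    return threshold, threshold      # Reno fast recovery: halve, grow linearly


print(on_loss(16, "timeout"))                  # (1, 8)
print(on_loss(16, "triple_dup_ack"))           # (8, 8)
print(on_loss(16, "triple_dup_ack", "tahoe"))  # (1, 8)
```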

Summary: TCP Congestion Control
[State diagram, listed per state as event: actions; next state]
Start: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Results in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
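The event/state/action rows above map fairly directly onto a small state machine. The following is a sketch under stated assumptions, not a real TCP implementation: the class name, the MSS value of 1460 bytes, and the integer arithmetic are all choices made for this example.

```python
MSS = 1460  # bytes; illustrative value, not taken from the slides


class Sender:
    """Simplified sender with two states: 'SS' (slow start) and 'CA'."""

    def __init__(self, threshold=64 * 1024):
        self.cong_win = MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                       # doubles per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * MSS // self.cong_win  # about +1 MSS per RTT

    def on_dup_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                         # triple duplicate ACK
            self.threshold = self.cong_win // 2
            self.cong_win = max(self.threshold, MSS)   # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win // 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


s = Sender(threshold=4 * MSS)
for _ in range(4):
    s.on_new_ack()
print(s.state, s.cong_win)  # CA 7300
```

Counting the window in bytes rather than segments keeps the congestion-avoidance increment MSS * (MSS/CongWin) in the same form as the table.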

TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
  - Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2 RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W.]
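The 0.75 factor can be recovered with one line of algebra, assuming the window climbs linearly from W/2 back to W between consecutive loss events, so that its time average is the midpoint of the two values:

```latex
\[
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W,
\qquad
\text{average throughput} = \frac{\overline{W}}{RTT} = 0.75\,\frac{W}{RTT}
\]
```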

TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$
• L = 2·10^-10: very low!
• Requires an almost perfect link
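Both numbers in this example can be reproduced with a few lines of arithmetic. The sketch below assumes the idealized relation throughput = 1.22 · MSS / (RTT · sqrt(L)) and solves it for L; the function names are hypothetical.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(w))          # 83333
print(f"{loss:.1e}")   # about 2.1e-10
```

A loss rate near 2·10^-10 means roughly one lost segment in five billion, which is why the slide calls for an almost perfect link.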
Equation-Based Control
• Note: TCP congestion control forms a control loop
  - Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info

TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput plotted against the other connection's throughput, both axes running to R, with the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
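The convergence argument can be tried numerically. This sketch assumes both sessions have the same RTT and see losses at the same moment (whenever their combined rate exceeds the link capacity), which is a strong simplification; the function name and numbers are illustrative only.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Rates of two AIMD flows sharing one bottleneck, with synchronized loss."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # both halve: the gap between them halves
        else:
            x1, x2 = x1 + 1, x2 + 1   # both add 1: the gap stays the same
    return x1, x2


a, b = two_flow_aimd(1.0, 40.0, capacity=50.0, rounds=200)
print(abs(a - b) < 0.1)  # True
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the rates drift toward the equal-share line.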

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
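The R/10 and R/2 figures follow from per-connection fairness: each connection gets an equal slice, so an application's share is its fraction of all connections on the link. A small sketch (the function name is made up for this example):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R that the new app's connections receive."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```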

TCP Miscellaneous
• MSS: Maximum Segment Size option
  - Client/server agree on larger than default (536 outside same subnet) MSS; option on SYN
• SACK: selective acknowledgements
• WSCALE: scale factor for receive window to allow for LFN ("elephant"): Long Fat Networks
• RFC 1323: timestamps for accurate RTT measurement; PAWS for protection against wrapped sequence numbers
• ...
TCP
Congestion Control
CS 4284 Spring 2013
Principles of Congestion Control
Congestion
bull informally ldquotoo many sources sending too much
data too fast for network to handlerdquo
bull different from flow control
bull manifestations
ndash long delays (queueing in router buffers)
ndash lost packets (buffer overflow at routers)
bull a top-10 problem
CS 4284 Spring 2013
Causescosts of congestion scenario 1
bull two senders two
receivers
bull one router infinite
buffers
bull no retransmission
bull large delays when congested
bull (but no reduction in throughput here)
unlimited shared
output link buffers
Host A lin original data
Host B
lout
CS 4284 Spring 2013
Causescosts of congestion scenario 2
bull one router finite buffers
bull sender retransmission of lost packet after
timeout
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causescosts of congestion scenario 3
bull four senders
bull multihop paths
bull timeoutretransmit
Q what happens as lin
and lrsquoin increase
finite shared output
link buffers
Host A lin original data
Host B
lout
lin original data plus
retransmitted data
CS 4284 Spring 2013
Causescosts of congestion scenario 3
Another ldquocostrdquo of congestion
bull when packet is dropped any upstream transmission
capacity used for that packet was wasted
bull ultimately leads to congestion collapse
H
o
s
t
A
H
o
s
t
B
lo
u
t
CS 4284 Spring 2013
Reasons for Congestion Control
bull Congested networks increase delay even
if no packet loss occurs
bull If packet loss occurs needed
retransmission require offered load to be
greater than goodput
bull Downstream losses waste upstream
transmission capacity leading to
congestion collapse in the worst case
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
CS 4284 Spring 2013
TCP Slow Start (more)
bull When connection begins
increase rate
exponentially until first
loss event
ndash double CongWin every
RTT
ndash done by incrementing CongWin for every ACK
received
bull Summary initial rate is
slow but ramps up
exponentially fast
Host A
RT
T
Host B
time
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
  – CongWin is cut in half
  – window then grows linearly
• But after timeout event:
  – CongWin instead set to 1 MSS
  – window then grows exponentially
  – to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate network capable of delivering some segments
• timeout before 3 dup ACKs received is a stronger, “more alarming” indicator of congestion
Summary TCP Congestion Control
[Figure: finite state machine with states slow start, congestion avoidance, and fast recovery]
Initial (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start
Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
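The state machine on this slide can be sketched as a small class. This is an illustrative model only: it tracks cwnd, ssthresh, and the state, and leaves out actual transmission and retransmission. The MSS value and all names are assumptions.

```python
MSS = 1460  # bytes; assumed segment size for illustration


class RenoSender:
    """Tracks congestion state only; sending segments is not modeled."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = MSS
        self.ssthresh = 64 * 1024
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                     # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * MSS / self.cwnd   # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                     # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"         # missing segment is resent here

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"                # missing segment is resent here


if __name__ == "__main__":
    s = RenoSender()
    s.on_new_ack()
    for _ in range(3):
        s.on_dup_ack()
    print(s.state, s.cwnd, s.ssthresh)
```

Keeping the three event handlers separate mirrors the three kinds of arcs in the diagram: new ACK, duplicate ACK, and timeout.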
Summary TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
TCP Throughput Idealized
• What’s the average throughput of TCP as a function of window size and RTT?
  – Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to (W/2)/RTT
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of window size oscillating between W/2 and W]
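The 0.75 factor is the mean of a window that ramps linearly from W/2 up to W between losses. A numeric check, assuming an idealized linear ramp sampled at evenly spaced points (names are illustrative):

```python
def average_window(w, steps=1000):
    """Mean of a window growing linearly from w/2 to w over one sawtooth."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


if __name__ == "__main__":
    w = 100
    print(average_window(w) / w)  # fraction of W, close to 0.75
```

Dividing the average window by RTT then gives the average steady-state throughput of 0.75 W/RTT.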
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \cdot \sqrt{L}}$
• L = 2·10^-10: very low
• Requires an almost perfect link
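Both numbers in this example can be reproduced from the relations W = throughput · RTT / MSS and throughput = 1.22 · MSS / (RTT · sqrt(L)), the latter solved for L. The function names below are illustrative assumptions.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


if __name__ == "__main__":
    # 1500-byte segments, 100 ms RTT, 10 Gbps target
    print(int(required_window_segments(10e9, 0.1, 1500)))
    print(required_loss_rate(10e9, 0.1, 1500))
```

The computed loss rate is on the order of 2·10^-10, that is, roughly one loss event per five billion segments.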
Equation-Based Control
• Note: TCP congestion control forms a control loop
  – Inputs: round-trip time, “loss events” (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes from 0 to R (Connection 1 throughput labeled) and the equal bandwidth share line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
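The convergence argument can be checked with a toy simulation. This is a sketch under strong assumptions: both sessions have the same RTT, both see a loss in the same round whenever their combined rate exceeds the capacity, and rates are in arbitrary units. The function name is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Return the two sessions' rates after the given number of rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: gap unchanged
    return x1, x2


if __name__ == "__main__":
    a, b = two_flow_aimd(80, 10, capacity=100, rounds=500)
    print(round(a, 3), round(b, 3))
```

Additive increase leaves the difference between the two rates unchanged, while each multiplicative decrease halves it, so the rates drift toward the equal-share line.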
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• Instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 9 TCPs, gets R/2
• That’s what “download accelerators” do
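The R/10 and R/2 figures follow if every TCP connection on the link gets an equal share. A small check under that assumption (the helper name is illustrative):

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new_connections,
    assuming all connections on the link share it equally."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(app_share(9, 1))  # share of R with one connection
    print(app_share(9, 9))  # share of R with nine parallel connections
```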
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control
• manifestations:
  – long delays (queueing in router buffers)
  – lost packets (buffer overflow at routers)
• a top-10 problem
Causes/costs of congestion: scenario 1
• two senders, two receivers
• one router, infinite buffers
• no retransmission
• large delays when congested
• (but no reduction in throughput here)
[Figure: Host A and Host B connected through a router with unlimited shared output link buffers; λ_in: original data; λ_out]
Causes/costs of congestion: scenario 2
• one router, finite buffers
• sender retransmission of lost packet after timeout
[Figure: Host A and Host B connected through a router with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 2
• Always: λ_in = λ_out (goodput)
• a) if no loss: λ'_in = λ_in
• b) assume clairvoyant sender: retransmission only when loss certain
• c) retransmission of both delayed and lost packets makes λ'_in larger for same λ_out (every packet transmitted twice)
“Costs” of congestion:
• more work (retrans) for given “goodput”
• unneeded retransmissions: link carries multiple copies of pkt
[Figure: three plots (a, b, c) of λ_out against λ_in, axes up to R/2; marked levels include R/3 and R/4]
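For case c), the R/4 level in the plot can be reproduced with one line of arithmetic: if every packet crosses the link twice, only half of the offered load is goodput. A sketch in units of R, with an illustrative function name:

```python
from fractions import Fraction


def goodput(offered_load, copies_per_packet):
    """Useful throughput when each packet is sent copies_per_packet times."""
    return offered_load / copies_per_packet


if __name__ == "__main__":
    # Offered load R/2 (in units of R), every packet transmitted twice
    print(goodput(Fraction(1, 2), 2))
```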
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B connected through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: labels Host A, Host B, λ_out]
Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case
Congestion Control Approaches
Two broad classes:
End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control
• routers provide feedback to end systems
  – single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  – explicit rate sender should send at
CS 4284 Spring 2013
Causescosts of congestion scenario 2 Always lin = lout (goodput)
bull a) if no loss lrsquoin = lin
bull b) assume clairvoyant sender retransmission only when loss certain
bull c) retransmission of both delayed and lost packets makes lrsquoin larger for
same lout (every packet transmitted twice)
ldquocostsrdquo of congestion
bull more work (retrans) for given ldquogoodputrdquo
bull unneeded retransmissions link carries multiple copies of pkt
b
R2
R2 lin
lout
a c
R2
R2 lin
lout
R4
R2
R2 lin
lout
R3
CS 4284 Spring 2013
Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
Q: what happens as λ_in and λ'_in increase?
[Figure: Host A and Host B send through routers with finite shared output link buffers; λ_in: original data; λ'_in: original data plus retransmitted data; λ_out]

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any upstream transmission capacity used for that packet was wasted
• ultimately leads to congestion collapse
[Figure: Host A, Host B, λ_out]

Reasons for Congestion Control
• Congested networks increase delay even if no packet loss occurs
• If packet loss occurs, the needed retransmissions require the offered load to be greater than the goodput
• Downstream losses waste upstream transmission capacity, leading to congestion collapse in the worst case

Congestion Control Approaches
Two broad classes:
End-end congestion control
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate sender should send at

TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
LastByteSent - LastByteAcked ≤ min(CongWin, RcvWindow)
• CongWin and RTT influence throughput:
rate = CongWin / RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does sender notice congestion?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces rate (CongWin) after loss event
• (assumes congestion is primary cause of loss)
Three mechanisms:
• AIMD
• slow start
• fast recovery

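The sender-side limit, that the unacknowledged bytes (LastByteSent - LastByteAcked) may not exceed min(CongWin, RcvWindow), can be written as a small check. This is a sketch only; the function names `usable_window` and `approx_rate` are invented for illustration, and all quantities are in bytes (RTT in seconds).

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put on the wire right now."""
    in_flight = last_byte_sent - last_byte_acked
    # The effective window is the smaller of the congestion window
    # and the receiver-advertised window.
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win, rtt_seconds):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cong_win / rtt_seconds


print(usable_window(5000, 3000, 4000, 8000))  # 2000 bytes may still be sent
print(usable_window(5000, 1000, 4000, 8000))  # 0: the window is already full
print(approx_rate(4000, 0.5))                 # 8000.0 bytes/sec
```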
TCP AIMD
multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events (probing)
[Figure: congestion window (8, 16, 24 Kbytes) against time for a long-lived TCP connection]

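The sawtooth produced by AIMD can be traced with a few lines of code. The sketch below assumes one window update per RTT and takes the rounds in which a loss occurs as an input; `aimd_trace` and its parameters are hypothetical names, with sizes in bytes.

```python
def aimd_trace(cong_win, mss, loss_rounds, rounds):
    """Return CongWin at the start of each RTT under AIMD."""
    trace = []
    for rtt in range(rounds):
        trace.append(cong_win)
        if rtt in loss_rounds:
            # Multiplicative decrease: halve, but never go below 1 MSS.
            cong_win = max(mss, cong_win // 2)
        else:
            # Additive increase: one MSS per RTT.
            cong_win += mss
    return trace


print(aimd_trace(8000, 1000, {4}, 7))
# [8000, 9000, 10000, 11000, 12000, 6000, 7000]
```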
TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a respectable rate
• When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: timeline of segments exchanged between Host A and Host B, with one RTT marked]

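Incrementing CongWin by one MSS per ACK is what produces the doubling per RTT. The sketch below makes that explicit, assuming no losses and exactly one ACK per full-sized segment in flight; the names `slow_start_windows` and `initial_rate_bps` are made up for this example. It also reproduces the 20 kbps figure from the MSS = 500 bytes, RTT = 200 msec example.

```python
def slow_start_windows(mss, rtts):
    """CongWin (bytes) at the start of each RTT during loss-free slow start."""
    cong_win = mss
    windows = [cong_win]
    for _ in range(rtts):
        acks = cong_win // mss  # assumption: one ACK per segment sent this RTT
        for _ in range(acks):
            cong_win += mss     # +1 MSS for every ACK received
        windows.append(cong_win)
    return windows


def initial_rate_bps(mss_bytes, rtt_ms):
    """Rate with a window of 1 MSS: MSS bits per round-trip time."""
    return mss_bytes * 8 * 1000 // rtt_ms


print(slow_start_windows(500, 4))  # [500, 1000, 2000, 4000, 8000]
print(initial_rate_bps(500, 200))  # 20000 bits/sec = 20 kbps
```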
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)

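The two reactions to a loss event can be put side by side in code. This sketch follows the rules on the slide for Reno; the Tahoe branch (CongWin back to 1 MSS on any loss event) is not spelled out on the slide and is included here as the commonly described behavior, so treat it as an assumption. The function name `on_loss` and the event strings are hypothetical.

```python
def on_loss(cong_win, mss, event, variant="reno"):
    """Return (new CongWin, new Threshold) after a loss event, in bytes.

    event is "timeout" or "triple_dup_ack"; variant is "reno" or "tahoe".
    """
    threshold = cong_win // 2  # half of CongWin just before the loss
    if event == "timeout" or variant == "tahoe":
        # Restart from one segment and slow-start up to Threshold.
        return mss, threshold
    # Reno, triple duplicate ACK: resume at Threshold and grow linearly.
    return max(threshold, mss), threshold


print(on_loss(16000, 1000, "timeout"))                  # (1000, 8000)
print(on_loss(16000, 1000, "triple_dup_ack"))           # (8000, 8000)
print(on_loss(16000, 1000, "triple_dup_ack", "tahoe"))  # (1000, 8000)
```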
Timeouts vs 3-dup ACKs
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS
– window then grows exponentially
– to a threshold, then grows linearly
Rationale:
• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout before 3 dup ACKs are received is a stronger, "more alarming" indicator of congestion

Summary TCP Congestion Control
[Figure: FSM of TCP congestion control with the states slow start, congestion avoidance, and fast recovery; Λ marks a transition with no event or no action. Transitions listed below as event: actions]
Initial transition
• Λ: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
• cwnd > ssthresh: Λ; go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Congestion avoidance
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

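The state machine on this slide can be sketched as a small class. This is a toy model under stated assumptions, not a real TCP stack: it tracks only cwnd, ssthresh, the duplicate-ACK count, and the current state, it does not model retransmission or actual sending, and it treats the "+ 3" after the third duplicate ACK as 3 MSS. The class name `RenoSender` and its method names are invented for this example.

```python
class RenoSender:
    """Toy model of the three-state FSM; cwnd and ssthresh are in bytes."""

    def __init__(self, mss=1000):
        self.mss = mss
        self.cwnd = mss            # cwnd = 1 MSS
        self.ssthresh = 64 * 1024  # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            # Leave fast recovery: deflate the window to ssthresh.
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            # Congestion avoidance: about 1 MSS per RTT in total.
            self.cwnd += self.mss * self.mss // self.cwnd
        self.dup_acks = 0

    def on_duplicate_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"


s = RenoSender(mss=1000)
for _ in range(7):
    s.on_new_ack()            # slow start: cwnd grows to 8000
for _ in range(3):
    s.on_duplicate_ack()      # third duplicate ACK triggers fast recovery
print(s.state, s.cwnd, s.ssthresh)  # fast_recovery 7000 4000
```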
Summary TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS·(MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed

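The congestion-avoidance rule CongWin = CongWin + MSS·(MSS/CongWin) is applied once per ACK, and its commentary says the result is about 1 MSS per RTT. A quick numerical check, assuming one ACK per segment in the window (the helper name `ca_growth_per_rtt` is hypothetical):

```python
def ca_growth_per_rtt(cong_win, mss):
    """Total CongWin growth (bytes) after one window's worth of ACKs."""
    acks = cong_win // mss  # assumption: one ACK per segment in the window
    w = float(cong_win)
    for _ in range(acks):
        w += mss * (mss / w)  # per-ACK increment from the table
    return w - cong_win


growth = ca_growth_per_rtt(10000, 1000)
print(growth)  # slightly under 1000, i.e. just below 1 MSS per RTT
```

The total comes out slightly below one MSS because the per-ACK increment shrinks as CongWin grows within the round.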
TCP Throughput Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window size oscillating between W/2 and W]

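The factor 0.75 follows from the sawtooth shape. As a sketch, assuming the window ramps linearly from W/2 back up to W between losses, the average window is the midpoint of the two extremes:

```latex
\[
\overline{W} = \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W
\qquad\Longrightarrow\qquad
\text{average throughput} = \frac{\overline{W}}{RTT} = 0.75\,\frac{W}{RTT}
\]
```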
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:
$\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires an almost perfect link

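The numbers in this example can be reproduced directly. The sketch below assumes the usual approximation throughput = 1.22·MSS / (RTT·sqrt(L)) and solves it for L; the function names are made up for illustration, with rates in bits/sec, RTT in seconds, and MSS in bytes.

```python
def required_window_segments(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain rate_bps: rate * RTT / MSS."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


print(int(required_window_segments(10e9, 0.1, 1500)))  # 83333
print(required_loss_rate(10e9, 0.1, 1500))             # roughly 2e-10
```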
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute the sending rate based on this input
• See RFC 5348 for more info
CS 4284 Spring 2013
TCP Fairness
CS 4284 Spring 2013
Fairness goal if k TCP sessions share same
bottleneck link of bandwidth R each should
have average rate of Rk
TCP connection 1
bottleneck router
capacity R
TCP connection 2
TCP Fairness
CS 4284 Spring 2013
Why is TCP fair
Consider two competing sessions
bull additive increase gives slope of 1 as throughput increases
bull multiplicative decrease decreases throughput proportionally
R
R
equal bandwidth share
Connection 1 throughput
congestion avoidance additive increase
loss decrease window by factor of 2
congestion avoidance additive increase
loss decrease window by factor of 2
CS 4284 Spring 2013
Fairness (more)
Fairness and UDP
bull Multimedia apps often
do not use TCP
ndash do not want rate throttled
by congestion control
bull Instead use UDP
ndash pump audiovideo at
constant rate tolerate
packet loss
bull TCP friendliness
Fairness and parallel TCP
connections
bull nothing prevents app from
opening parallel connections
between 2 hosts
bull Example link of rate R
supporting 9 connections
ndash new app asks for 1 TCP gets
rate R10
ndash new app asks for 9 TCPs gets
R2
bull Thatrsquos what ldquodownload
acceleratorsrdquo do
CS 4284 Spring 2013
Congestion Control Approaches
End-end congestion
control
bull no explicit feedback from
network
bull congestion inferred from
end-system observed
loss delay
bull approach taken by TCP
Network-assisted congestion control
bull routers provide feedback to end systems
ndash single bit indicating congestion (SNA DECbit TCPIP ECN ATM)
ndash explicit rate sender should send at
Two broad classes
CS 4284 Spring 2013
TCP Congestion Control
bull end-end control (no network
assistance)
bull sender limits transmission
LastByteSent-LastByteAcked
min(CongWin RcvWindow)
bull CongWin and RTT influence throughput
bull CongWin is dynamic function of
perceived network congestion
How does sender notice
congestion
bull loss event = timeout or 3
duplicate acks
bull TCP sender reduces rate (CongWin) after loss event
bull (assumes congestion is
primary cause of loss)
Three mechanisms
bull AIMD
bull slow start
bull fast recovery
rate = CongWin
RTT
Bytessec
CS 4284 Spring 2013
TCP AIMD
8 Kbytes
16 Kbytes
24 Kbytes
time
congestion
window
multiplicative decrease cut CongWin in half
after loss event
additive increase increase CongWin by 1 MSS
every RTT in the
absence of loss events
probing
Long-lived TCP connection
CS 4284 Spring 2013
TCP Slow Start
bull When connection begins CongWin = 1 MSS ndash Example MSS = 500
bytes amp RTT = 200 msec
ndash initial rate = 20 kbps
bull available bandwidth may be gtgt MSSRTT ndash desirable to quickly
ramp up to respectable rate
bull When connection
begins increase rate
exponentially fast until
first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: segment exchange between Host A and Host B over time, with one RTT marked]
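Why a per-ACK increment doubles the window: in one RTT the sender receives one ACK per segment in the window, so a window of n segments grows by n segments. A sketch under the assumptions that every segment is ACKed individually and nothing is lost (the helper name is hypothetical):

```python
def slow_start_round(cwnd_bytes, mss):
    """Simulate one RTT of slow start: each segment in the window is ACKed,
    and each ACK grows CongWin by one MSS."""
    acks = cwnd_bytes // mss
    for _ in range(acks):
        cwnd_bytes += mss
    return cwnd_bytes


mss = 500
cwnd = mss
history = [cwnd // mss]
for _ in range(4):
    cwnd = slow_start_round(cwnd, mss)
    history.append(cwnd // mss)
print(history)  # window in MSS after each RTT
```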
TCP Tahoe vs Reno
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout
Implementation
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
• If timeout event, set CongWin to 1 MSS
• If triple-ack event, set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
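The difference between the two versions can be sketched as a single loss handler. This is a simplified model, not a real stack: `on_loss` and its string arguments are hypothetical, windows are in bytes, and Tahoe is assumed to restart from 1 MSS on every loss event because it lacks fast recovery.

```python
def on_loss(cong_win, mss, event, variant):
    """Return (new_cong_win, new_threshold) after a loss event.

    event:   "timeout" or "triple_dup_ack"
    variant: "tahoe" or "reno"
    """
    threshold = max(cong_win // 2, mss)  # half the window, never below 1 MSS
    if event == "triple_dup_ack" and variant == "reno":
        return threshold, threshold      # fast recovery: continue from Threshold
    return mss, threshold                # otherwise restart from 1 MSS


print(on_loss(16000, 1000, "triple_dup_ack", "reno"))
print(on_loss(16000, 1000, "triple_dup_ack", "tahoe"))
```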
Timeouts vs 3-dup ACKs
• After 3 dup ACKs
– CongWin is cut in half
– window then grows linearly
• But after timeout event
– CongWin instead set to 1 MSS
– window then grows exponentially to a threshold, then grows linearly
Rationale
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs received is stronger, "more alarming" indicator of congestion
Summary TCP Congestion Control
[Figure: FSM with three states (slow start, congestion avoidance, fast recovery); Λ marks a transition with no event or no action]
Initialization
• Λ: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: Λ; go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start
Congestion avoidance
• new ACK: cwnd = cwnd + MSS · (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
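The state machine on this slide can be sketched as a small event-driven class. This is a teaching model under stated assumptions: the class and method names are hypothetical, windows are in bytes, and actual segment transmission and retransmission are left as comments.

```python
class RenoSender:
    """Toy model of the slow start / congestion avoidance / fast recovery FSM."""

    def __init__(self, mss=1000):
        self.mss = mss
        self.cwnd = float(mss)        # cwnd = 1 MSS
        self.ssthresh = 64 * 1024.0   # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh                 # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                     # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * (self.mss / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0
        # transmit new segment(s) as allowed

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                     # inflate per extra dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"
            # retransmit missing segment

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.mss)
        self.dup_acks = 0
        self.state = "slow_start"
        # retransmit missing segment
```

Feeding the object three new ACKs followed by three duplicate ACKs moves it from slow start into fast recovery with `ssthresh` at half the previous window; the next new ACK drops `cwnd` to `ssthresh` and enters congestion avoidance.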
Summary TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
• When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
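The congestion-avoidance row claims that adding MSS · (MSS/CongWin) per ACK yields roughly 1 MSS per RTT. A quick numerical check, assuming one ACK per full segment in the window (the helper name is hypothetical):

```python
def congestion_avoidance_round(cong_win, mss):
    """Apply the per-ACK congestion-avoidance increment for one RTT."""
    acks = int(cong_win // mss)          # one ACK per segment in the window
    for _ in range(acks):
        cong_win += mss * (mss / cong_win)
    return cong_win


mss = 1000.0
before = 10 * mss
after = congestion_avoidance_round(before, mss)
print(f"growth over one RTT: {(after - before) / mss:.3f} MSS")
```

The growth comes out slightly under 1 MSS because CongWin rises during the round, which shrinks each later increment.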
TCP Throughput Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: window oscillating between W/2 and W]
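The 0.75 factor is the mean of a window that ramps linearly from W/2 to W. A numerical sketch of that idealized cycle (function names are hypothetical, and the linear ramp is the slide's simplifying assumption):

```python
def average_window(w_max, steps=1000):
    """Mean of a window that ramps linearly from w_max/2 up to w_max."""
    lo = w_max / 2
    samples = [lo + (w_max - lo) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w_max, rtt_seconds):
    """Idealized steady-state throughput, in window units per second."""
    return average_window(w_max) / rtt_seconds


print(average_window(100))
```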
TCP Throughput & Loss
• Example: 1500 byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate:
$\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Require almost perfect link
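The two numbers in this example can be reproduced directly. This sketch assumes the simplified formula throughput = 1.22 · MSS / (RTT · sqrt(L)) with MSS expressed in bits; the function names are made up for illustration.

```python
def required_window_segments(throughput_bps, rtt_seconds, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_seconds / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_seconds, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_seconds * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.100, 1500)
loss = required_loss_rate(10e9, 0.100, 1500)
print(f"W = {w:.0f} segments, L = {loss:.1e}")
```

The window works out to about 83,333 segments and the tolerable loss rate to roughly 2 · 10^-10, in line with the slide.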
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
TCP Fairness
Fairness goal: if k TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Consider two competing sessions
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes running to R, one axis labeled "Connection 1 throughput", showing the equal bandwidth share line and alternating steps labeled "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
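The convergence argument can be sketched numerically. The model below is an idealization: both flows are assumed to have the same RTT and to see loss at the same time whenever their combined rate exceeds the link capacity, and the function name is hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one bottleneck, with synchronized loss."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve, so the gap halves too
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase: the gap is unchanged
    return x1, x2


x1, x2 = two_flow_aimd(80.0, 10.0, capacity=100.0, rounds=2000)
print(abs(x1 - x2))
```

Additive increase moves both flows along a slope-1 line, which preserves their difference; each multiplicative decrease halves that difference, so the rates drift toward the equal-share line.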
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
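The R/10 and R/2 figures follow from per-connection fairness. A one-line check, assuming every TCP connection on the link gets an equal share (the helper name is hypothetical):

```python
def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app that opens new_connections,
    if every TCP connection receives an equal share of the bottleneck."""
    total = existing_connections + new_connections
    return new_connections / total


print(app_share(9, 1))  # one new connection among ten
print(app_share(9, 9))  # nine new connections among eighteen
```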
CS 4284 Spring 2013
TCP Tahoe vs Reno
Q When should the exponential increase switch to linear
A When CongWin gets to 12 of its value before timeout
Implementation
bull Variable Threshold
bull At loss event Threshold is set to 12 of CongWin just before loss event
bull If timeout event set CongWin to 1
bull If triple-ack event set CongWin to Threshold and increase linearly (this is called fast recovery and was added in version TCP Reno)
CS 4284 Spring 2013
Timeouts vs 3-dup ACKs
bull After 3 dup ACKs
ndash CongWin is cut in half
ndash window then grows linearly
bull But after timeout event
ndash CongWin instead set to 1 MSS
ndash window then grows exponentially
ndash to a threshold then grows linearly
bull 3 dup ACKs indicates
network capable of
delivering some segments
bull timeout before 3 dup
ACKs received is stronger
ldquomore alarmingrdquo indicator
of congestion
Rationale
CS 4284 Spring 2013
Summary TCP Congestion Control
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0
retransmit missing segment
L
cwnd gt ssthresh
congestion
avoidance
cwnd = cwnd + MSS (MSScwnd) dupACKcount = 0 transmit new segment(s) as allowed
new ACK
dupACKcount++
duplicate ACK
fast
recovery
cwnd = cwnd + MSS transmit new segment(s) as allowed
duplicate ACK
ssthresh= cwnd2 cwnd = ssthresh + 3
retransmit missing segment
dupACKcount == 3
timeout
ssthresh = cwnd2 cwnd = 1 dupACKcount = 0
retransmit missing segment
ssthresh= cwnd2 cwnd = ssthresh + 3 retransmit missing segment
dupACKcount == 3 cwnd = ssthresh dupACKcount = 0
New ACK
slow
start
timeout
ssthresh = cwnd2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment
cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s) as allowed
new ACK dupACKcount++
duplicate ACK
L
cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0
New ACK
New ACK
New ACK
CS 4284 Spring 2013
Summary TCP Congestion Control
bull When CongWin is below Threshold sender in
slow-start phase window grows exponentially
bull When CongWin is above Threshold sender is in
congestion-avoidance phase window grows linearly
bull When a triple duplicate ACK occurs Threshold set
to CongWin2 and CongWin set to Threshold
bull When timeout occurs Threshold set to CongWin2
and CongWin is set to 1 MSS
TCP Sender Congestion Control
Event | State | TCP Sender Action | Commentary
ACK receipt for previously unacked data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to CA | Resulting in a doubling of CongWin every RTT
ACK receipt for previously unacked data | Congestion Avoidance (CA) | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in increase of CongWin by 1 MSS every RTT
Triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to CA | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being acked | CongWin and Threshold not changed
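The commentary column claims that the per-ACK update doubles CongWin every RTT in slow start but adds only about 1 MSS per RTT in congestion avoidance. A small sketch can check this, assuming one ACK arrives per full segment in the window (no delayed ACKs) and a round MSS of 1000 bytes; the function name `window_after_one_rtt` is made up for the example.

```python
MSS = 1000  # bytes; assumed round number for readability


def window_after_one_rtt(cwnd: float, slow_start: bool) -> float:
    """Apply the per-ACK update once for every segment in the current window."""
    acks = int(cwnd // MSS)  # one ACK per full segment in flight (assumption)
    for _ in range(acks):
        if slow_start:
            cwnd += MSS                  # slow start: +1 MSS per ACK
        else:
            cwnd += MSS * (MSS / cwnd)   # congestion avoidance: +MSS*(MSS/cwnd) per ACK
    return cwnd


ss = window_after_one_rtt(8 * MSS, slow_start=True)
ca = window_after_one_rtt(8 * MSS, slow_start=False)
print(ss / MSS)  # 16.0 segments: the window doubled
print(ca / MSS)  # slightly under 9 segments: roughly one MSS was added
```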
TCP Throughput: Idealized
• What's the average throughput of TCP as a function of window size and RTT?
– Long-lived connection; ignore slow start
• When window is W, throughput is W/RTT
• Just after loss, window drops to W/2, throughput to W/(2·RTT)
• Average steady-state throughput: 0.75 W/RTT
[Figure: sawtooth of window size over time, oscillating between W/2 and W]
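The 0.75 factor follows from averaging a window that climbs linearly from W/2 to W and then halves again. A minimal numeric check, assuming the window grows by one segment per RTT and W is even (the helper name `average_window` is hypothetical):

```python
def average_window(w_max: int) -> float:
    """Mean window over one sawtooth cycle: W/2, W/2 + 1, ..., W (in segments)."""
    samples = range(w_max // 2, w_max + 1)
    return sum(samples) / len(samples)


W = 100
print(average_window(W) / W)  # 0.75
```

Dividing the average window by RTT gives the average throughput of 0.75 W/RTT.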
TCP Throughput & Loss
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L: $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$
• $L = 2 \cdot 10^{-10}$: very low
• Requires almost perfect link
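The numbers in this example can be reproduced directly. The sketch below assumes the loss-throughput relation throughput = 1.22 · MSS / (RTT · √L) and simply inverts it for L; the function names are chosen for illustration.

```python
def required_window(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


window = required_window(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(round(window))  # 83333 segments
print(loss)           # roughly 2e-10
```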
Equation-Based Control
• Note: TCP congestion control forms a control loop
– Inputs: round-trip time, "loss events" (which are samples of timeout events + 3-ack events)
• Instead, equation-based control uses an equation to compute sending rate based on this input
• See RFC 5348 for more info
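As an illustration of what such an equation looks like, here is a sketch of the TCP throughput equation given in RFC 5348 (Section 3.1), using the RFC's suggested simplifications b = 1 and t_RTO = 4 · RTT. The function name `tfrc_rate` and the sample inputs are assumptions for this example; the slides themselves do not spell out the formula.

```python
import math


def tfrc_rate(s: float, rtt: float, p: float, b: int = 1) -> float:
    """Allowed sending rate in bytes/second for segment size s (bytes),
    round-trip time rtt (seconds) and loss event rate p (0 < p <= 1)."""
    t_rto = 4 * rtt  # simplification suggested in the RFC
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


rate = tfrc_rate(s=1500, rtt=0.1, p=1e-6)
approx = 1.22 * 1500 / (0.1 * math.sqrt(1e-6))
print(rate / approx)  # close to 1 at low loss rates
```

At low loss rates the timeout term becomes negligible and the result approaches 1.22 · MSS / (RTT · √L), the loss-throughput relation used for ordinary TCP.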
TCP Fairness
Fairness goal: if k TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/k
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
TCP Fairness
Why is TCP fair?
Consider two competing sessions:
• additive increase gives slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput plot with axes running to R (Connection 1 throughput labeled) and an "equal bandwidth share" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
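The convergence argument can be tried numerically. The sketch below is a deliberately simplified model, assuming both sessions see every loss at the same time and have equal RTTs; the function name `aimd` and the starting rates are made up for the example.

```python
def aimd(x1: float, x2: float, capacity: float, rounds: int) -> tuple:
    """Two flows add 1 unit per round; both halve when their sum exceeds capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: multiplicative decrease for both
        else:
            x1, x2 = x1 + 1, x2 + 1   # congestion avoidance: additive increase
    return x1, x2


x1, x2 = aimd(80.0, 5.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))  # the gap between the two flows shrinks toward 0
```

Additive increase leaves the difference between the two rates unchanged, while every halving cuts that difference in half, so the rates drift toward the equal-share line.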
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP
– pump audio/video at constant rate, tolerate packet loss
• TCP friendliness
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• Example: link of rate R supporting 9 connections
– new app asks for 1 TCP, gets rate R/10
– new app asks for 9 TCPs, gets R/2
• That's what "download accelerators" do
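The arithmetic behind the parallel-connection example, assuming the bottleneck is shared equally per connection (the helper name `app_share` is hypothetical):

```python
from fractions import Fraction


def app_share(existing: int, new_connections: int) -> Fraction:
    """Fraction of R the new app receives when every connection gets an equal share."""
    return Fraction(new_connections, existing + new_connections)


print(app_share(9, 1))  # 1/10
print(app_share(9, 9))  # 1/2
```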