Transport Layer – TCP: TCP Flow Control, Congestion Control, Connection Management, etc. (Part 2)

Upload: porter-massey

Post on 16-Dec-2015



Encapsulation in TCP/IP

[Figure: encapsulation in TCP/IP, showing the IP datagram]

TCP Overview

- point-to-point: one sender, one receiver
- reliable, in-order byte stream: no message boundaries
- pipelined: TCP congestion and flow control set the window size
- send and receive buffers
- full duplex data: bi-directional application data flow in the same connection; MSS = maximum segment size
- connection-oriented: handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- flow controlled: the sender will not flood the receiver with data
- error detection, retransmission, cumulative ACKs, timers, and header fields for sequence and ACK numbers

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the receiving application reads data through its socket door]

Recall: Reliable Data Transfer Mechanisms

- Checksum: verifies the integrity of a packet
- Timer: signals that a retransmission is required
- Sequence number: keeps track of which packets have been sent and received
- ACK, NAK: indicate receipt of a packet in good or bad form
- Window, pipelining: allow multiple yet-to-be-acknowledged packets to be sent

[Figure: packets travel from the TCP send buffer to the TCP receive buffer; the application writes data and reads data through the socket doors]

Internet Checksum Example

Note: when adding numbers, a carry-out from the most significant bit needs to be added to the result.

Example: add two 16-bit integers

data          1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
data          1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

To check: sum + checksum = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
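The wraparound addition above can be sketched in a few lines of Python. This is a minimal illustration, not a full UDP/TCP checksum routine: it assumes the data has already been split into 16-bit words, and the function names are chosen here for illustration.

```python
def ones_complement_sum(words):
    """Add 16-bit words, wrapping any carry-out back into the low bits."""
    total = 0
    for word in words:
        total += word
        # Fold the carry-out of the most significant bit back into the sum.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """The checksum is the bitwise complement of the wrapped sum."""
    return ~ones_complement_sum(words) & 0xFFFF


# The two 16-bit integers from the slide.
data = [0b1110011001100110, 0b1101010101010101]
checksum = internet_checksum(data)
```

A receiver can verify the data by summing the received words together with the checksum; an undamaged segment yields sixteen 1 bits.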

Connection-Oriented Transport: TCP

- TCP Segment Structure
- SEQ and ACK numbers
- Calculating the Timeout Interval
- The Simplified TCP Sender
- ACK Generation Recommendation (RFC 1122, RFC 2581)
- Interesting Transmission Scenarios
- Flow Control
- TCP Connection Management

TCP segment structure

Header (each row is 32 bits wide):
- source port, dest port
- sequence number
- acknowledgement number
- head len, not used, flags U A P R S F, rcvr window size
- checksum, URGent data ptr
- Options (variable length)

The header is followed by application data (variable length).

Notes on the fields:
- sequence and acknowledgement numbers: counting by bytes of data (not segments)
- URG: urgent data (generally not used)
- ACK: ACK valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: bytes the receiver is willing to accept
- checksum: Internet checksum (as in UDP)

In practice, PSH, URG and the Urgent Data Pointer are not used.

We can view these teeny-weeny details using Ethereal.

Example

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.

Assume:
- Data stream: file consisting of 500,000 bytes
- MSS: 1,000 bytes
- First byte of the data stream: numbered as 0

TCP constructs 500 segments out of the data stream:

500,000 bytes / 1,000 bytes = 500 segments

TCP sequence numbers and ACKs

Sequence numbers:
- byte-stream number of the first byte in the segment's data
- do not necessarily start from 0; use a random initial number R
  - Segment 1: 0 + R
  - Segment 2: 1000 + R, etc.

ACKs (acknowledgments):
- sequence number of the next byte expected from the other side (last byte + 1)
- cumulative ACK
- if segment 1 has been received, the receiver waits for segment 2, e.g. Ack = 1000 + R (received up to the 999th byte)

Byte stream: 0 1 2 3 4 ... 999 | 1000 1001 1002 ... 1999
             Segment 1         | Segment 2
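As a rough illustration of the numbering above, the following sketch lists the sequence number of every segment in the 500,000-byte example. The function names and the sample value chosen for R are assumptions made for this example only.

```python
def segment_sequence_numbers(file_size, mss, initial_seq):
    """Sequence number of the first byte of each segment."""
    return [initial_seq + offset for offset in range(0, file_size, mss)]


def cumulative_ack(seq, payload_len):
    """ACK value sent after receiving a segment in order: last byte + 1."""
    return seq + payload_len


R = 5000  # hypothetical random initial sequence number
seqs = segment_sequence_numbers(500_000, 1_000, R)
first_ack = cumulative_ack(seqs[0], 1_000)
```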

TCP sequence numbers and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. The user types 'C'. Host A sends Seq=42, ACK=79, data = 'C'. ("I'm sending data starting at seq num = 42.")
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C'. ("Send me the bytes from 43 onward." The ACK is being piggybacked on server-to-client data.)
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.

Yet another server echo example

1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
2. Host B ACKs receipt of 'Hello' and echoes back 'Hello': Seq=79, ACK=47, data = 'Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
4. Host B replies: Seq=84, ACK=50, data = '200'.

The ACK tells up to what byte has been received and what is the next starting byte the host is expecting to receive.
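The Seq and ACK values in this echo exchange can be reproduced with a small bookkeeping sketch. The `Endpoint` class below is purely illustrative (it is not a real TCP implementation) and assumes every segment arrives in order and without loss.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    next_seq: int       # sequence number of the next byte this side sends
    next_expected: int  # next byte expected from the peer (the ACK value)

    def send(self, data):
        segment = (self.next_seq, self.next_expected, data)
        self.next_seq += len(data)
        return segment

    def receive(self, segment):
        seq, _ack, data = segment
        if seq == self.next_expected:  # in-order data advances the ACK
            self.next_expected += len(data)


def echo_trace():
    host_a = Endpoint(next_seq=42, next_expected=79)
    host_b = Endpoint(next_seq=79, next_expected=42)
    trace = []
    for payload in (b"Hello", b"200"):
        sent = host_a.send(payload)
        host_b.receive(sent)
        echoed = host_b.send(payload)
        host_a.receive(echoed)
        trace += [sent[:2], echoed[:2]]
    return trace  # list of (Seq, ACK) pairs in the order they were sent
```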

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?
- longer than the RTT (RTT = round trip time); note that the RTT will vary
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss

Q: How to estimate the RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we would want the estimated RTT to be smoother
  - use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - x) * EstimatedRTT + x * SampleRTT

- exponential weighted moving average
- influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (RFC 2988)

Setting the timeout:
- EstimatedRTT plus a safety margin
- large variation in EstimatedRTT -> larger safety margin

Deviation = (1 - x) * Deviation + x * |SampleRTT - EstimatedRTT|   (recommended value of x: 0.25)

Timeout = EstimatedRTT + 4 * Deviation

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

- After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
- After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
- After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
- After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
- After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
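The sample calculations can be replayed with a short sketch of the estimator. Several details here are assumptions rather than something the slides state: the names `alpha` and `beta` (the slides call both weights x), the initial deviation of 0, and computing the deviation against the EstimatedRTT from before the current update.

```python
class RttEstimator:
    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha          # weight of a new sample in EstimatedRTT
        self.beta = beta            # weight of a new sample in Deviation
        self.estimated_rtt = None   # set by the first sample
        self.deviation = 0.0        # assumed starting value

    def update(self, sample_rtt):
        if self.estimated_rtt is None:
            self.estimated_rtt = sample_rtt
        else:
            # Deviation uses the estimate before it is updated (assumption).
            self.deviation = ((1 - self.beta) * self.deviation
                              + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.estimated_rtt

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation


estimator = RttEstimator()
for sample in (0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964):
    estimator.update(sample)
```

Because the sketch does not round intermediate values, its final estimate differs from the slide's 0.0725 only in the later decimal places.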

RTT Samples and RTT Estimates

[Figure: Sample RTT and Estimated RTT plotted against time; y-axis: RTT (msec), from 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT Estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), from 1 to 106; y-axis: RTT (milliseconds), from 100 to 350; series: SampleRTT and Estimated RTT]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

From the "wait for event" state:
- event: data received from application above -> create and send segment
- event: timer timeout for segment with seq number y -> retransmit segment
- event: ACK received, with ACK number y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum
                /* the timer is associated with the oldest unACKed segment */
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout
            retransmit not-yet-ACKed segment with smallest Seq
            start timer

        event: ACK received, with ACK field value of y
            if (y > sendbase) {   /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    }   /* end of loop forever */

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {   /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {   /* a duplicate ACK for already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    }   /* end of loop forever */
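The event handling in the pseudocode can be sketched as a small Python class. This is a simplified illustration under stated assumptions: timers are not modelled (a timeout is just a method call), actions are recorded in a log instead of being sent to IP, and the class and attribute names are invented for this example.

```python
class FastRetransmitSender:
    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}   # sequence number -> payload not yet ACKed
        self.dup_acks = 0   # duplicate ACKs seen for the current send_base
        self.log = []       # (action, sequence number) pairs

    def on_app_data(self, data):
        self.unacked[self.next_seq] = data
        self.log.append(("send", self.next_seq))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            # Retransmit the not-yet-ACKed segment with the smallest Seq.
            self.log.append(("retransmit", min(self.unacked)))

    def on_ack(self, y):
        if y > self.send_base:
            # Cumulative ACK: everything below y has been received.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.send_base = y
            self.dup_acks = 0
        else:
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.log.append(("fast_retransmit", y))


sender = FastRetransmitSender(92)
sender.on_app_data(b"a" * 8)    # Seq=92, 8 bytes
sender.on_app_data(b"b" * 20)   # Seq=100, 20 bytes
sender.on_ack(100)              # first segment ACKed
for _ in range(3):
    sender.on_ack(100)          # three duplicate ACKs for byte 100
```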

TCP ACK generation [RFC 1122, RFC 2581]

Event, followed by the TCP receiver action:

1. In-order segment arrival, no gaps, everything else already ACKed
   -> Delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.
2. In-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   -> Immediately send a single cumulative ACK.
3. Out-of-order segment arrival, with higher than expected seq; a gap is detected
   -> Send a duplicate ACK, indicating the seq of the next expected byte.
4. Arrival of a segment that partially or completely fills the gap
   -> Immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.
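A minimal sketch of the receiver side follows, assuming a receiver that buffers out-of-order segments as the slide states. The 500 ms delayed-ACK timer is deliberately left out, so every arrival returns an ACK value immediately; the class name and structure are illustrative only.

```python
class CumulativeAckReceiver:
    def __init__(self, initial_seq):
        self.next_expected = initial_seq
        self.out_of_order = {}        # seq -> payload held until the gap fills
        self.delivered = bytearray()  # in-order data handed to the application

    def on_segment(self, seq, data):
        """Process one segment and return the ACK number to send back."""
        if seq == self.next_expected:
            self.delivered += data
            self.next_expected += len(data)
            # A filled gap may release segments that were buffered earlier.
            while self.next_expected in self.out_of_order:
                buffered = self.out_of_order.pop(self.next_expected)
                self.delivered += buffered
                self.next_expected += len(buffered)
        elif seq > self.next_expected:
            self.out_of_order[seq] = data  # gap detected: keep the segment
        return self.next_expected          # duplicate ACK if nothing advanced


receiver = CumulativeAckReceiver(0)
acks = [
    receiver.on_segment(0, b"x" * 1000),     # in order
    receiver.on_segment(2000, b"z" * 1000),  # out of order: duplicate ACK
    receiver.on_segment(1000, b"y" * 1000),  # fills the gap
]
```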

TCP: Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
1. Host A sends Seq=92, 8 bytes data.
2. Host B replies with ACK=100, but the ACK is lost (X).
3. Host A's timer times out and it retransmits Seq=92, 8 bytes data (retransmission due to lost ACK).
4. Host B replies with ACK=100 again.

Premature timeout, cumulative ACKs:
1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B replies with ACK=100 and ACK=120.
3. The timer for Seq=92 times out before the ACKs arrive, so Host A retransmits Seq=92, 8 bytes data. The timer is restarted here for Seq=92. The segment with Seq=100 is not retransmitted.
4. Host B replies with ACK=120.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B replies with ACK=100, which is lost (X), and then with ACK=120.
3. ACK=120 reaches Host A before the timeout for Seq=92.

Cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

- Provides a limited form of congestion control.
- Timer expiration is more likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.
- TimeoutInterval = 2 * TimeoutIntervalPrevious
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer

- Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.
- Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead.

[Figure: data from IP enters RcvBuffer and is read by the application process; LastByteRead and LastByteRcvd mark positions in the buffer]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

- Initially, RcvWindow = RcvBuffer.
- The application reads from the buffer.
- HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER (HOST A) uses the RcvWindow of Host B, LastByteSent and LastByteACKed.

[Figure: the data sent by HOST A, with LastByteACKed and LastByteSent marked; ACKs arrive from Host B]

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

LastByteSent - LastByteACKed <= RcvWindow
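Both sides of the flow-control bookkeeping can be expressed as two small functions. This is only a sketch of the arithmetic on the slides; the function names and the byte counts in the example are made up for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(nbytes, last_byte_sent, last_byte_acked, window):
    """Sender side: would sending nbytes keep unACKed data within the window?"""
    return (last_byte_sent + nbytes) - last_byte_acked <= window


# Hypothetical numbers: 4096-byte buffer, 3000 bytes received, 1000 bytes read.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
```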

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

- TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection 'alive'.
- TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
- Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator
- In Java: Socket clientSocket = new Socket("hostname", port number);
- In C (Winsock), connect:

    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client
- In Java: Socket accept()
- In C (Winsock), accept:

    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
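For comparison, the same connect/accept pairing can be tried end to end with Python's standard `socket` module. This sketch assumes the loopback interface is available and uses port 0 so that the operating system picks a free port; the function name is chosen for this example.

```python
import socket
import threading


def loopback_exchange(message):
    """Send message from a client to a server on 127.0.0.1; return what arrived."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []

    def serve():
        conn, _addr = server.accept()   # returns once a client has connected
        with conn:
            chunks = []
            while True:
                chunk = conn.recv(1024)
                if not chunk:           # empty read: the client closed
                    break
                chunks.append(chunk)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=serve)
    worker.start()
    # create_connection performs the connect() step on the client side.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
    worker.join()
    server.close()
    return received[0]
```

The three-way handshake itself is carried out by the operating system's TCP implementation inside the connect and accept calls; the application never sees the SYN segments.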

TCP Connection Management: Establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
- specifies the initial seq number (isn)
- Connect (SYN=1, seq=client_isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment
- ACKs the received SYN
- allocates buffers
- specifies the server's initial seq number
- Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: the client ACKs the connection with ACK = server_isn + 1
- allocates buffers
- sends SYN=0
- ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established.

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
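The field values of the three handshake segments follow directly from the two initial sequence numbers. The sketch below only computes those fields; the dictionary layout and the sample ISN values are illustrative choices, not part of any real API.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as simple dictionaries."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn, "ack": None}
    synack = {"from": "server", "SYN": 1, "seq": server_isn,
              "ack": syn["seq"] + 1}
    ack = {"from": "client", "SYN": 0, "seq": client_isn + 1,
           "ack": synack["seq"] + 1}
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```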

TCP Connection Management (cont.)

How a TCP connection is established and torn down.

Closing a connection: the client closes the socket
- C (Winsock): closesocket(s);
- Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client/server timeline of the close: client FIN, server ACK, server FIN, client ACK, then a timed wait at the client before the connection is closed]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
- Enters "timed wait": will respond with an ACK to received FINs.

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client/server timeline: FIN, ACK, FIN, ACK; both sides closing, timed wait at the client, then closed on both sides]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with steps numbered 1 to 12]

- Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- Closed: the connection formally closes; all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
- service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
- informally: too many sources sending too much data too fast for the network to handle
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
   - no explicit feedback from the network
   - congestion inferred by end systems from observed packet loss and delay
   - approach taken by TCP

2. Network-assisted congestion control
   - routers provide feedback to end systems in the form of:
     - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
     - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection.

New variable: the congestion window, CongWin.

(Amount of unACKed data at the SENDER) = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is limited solely by CongWin.
- Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate: the congestion window will be increased at a relatively slow rate.
2. ACKs arrive at a high rate: the congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
   - no ACK is received after segment loss
2. Receipt of three duplicate ACKs
   - segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly: rate = cwnd / RTT bytes/sec
- cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

- Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) against time, with ticks at 8, 16 and 24 Kbytes; sawtooth behavior: probing for bandwidth]

TCP Slow Start

- When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]

Refinement: inferring loss

- After 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
- But after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly

Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: SLOW START (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
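The table translates almost line for line into a small state machine. The sketch below is an illustration of the table only, with assumptions of its own: an MSS of 1000 bytes, window sizes kept as floating-point byte counts, and invented class and method names. It does not model the separate fast-recovery state.

```python
MSS = 1000  # bytes; illustrative value


class CongestionControl:
    def __init__(self, threshold):
        self.cong_win = MSS     # start with one segment
        self.threshold = threshold
        self.state = "SS"       # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:  # loss detected by triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

For example, after three new ACKs in slow start the window is 4 MSS; a timeout then sets Threshold to 2 MSS and the window back to 1 MSS.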

Summary: TCP Congestion Control

[Figure: state diagram with three states: slow start, congestion avoidance, fast recovery]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Congestion control: TCP's Congestion Control Service

[Figure: CLIENT and SERVER]

Problem: gridlock sets in when there is packet loss due to router congestion.

- The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.
- The congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.

Macroscopic Description of TCP Throughput

- What's the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2 and throughput to (W/2)/RTT.
- Throughput increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP.)

TCP Futures: TCP over "long, fat pipes"

- Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
- Requires a window size of W = 83,333 in-flight segments
- Throughput in terms of loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

- L = 2 * 10^-10: a very small loss rate (1 loss event every 5 billion segments)
- New versions of TCP are needed for high-speed environments
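The two numbers quoted for the long-fat-pipe example can be checked with a few lines of arithmetic. This sketch assumes throughput in bits per second, RTT in seconds and MSS in bytes; the function names are made up for this example.

```python
def window_in_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the throughput: rate * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = window_in_segments(10e9, 0.100, 1500)
L = required_loss_rate(10e9, 0.100, 1500)
```

With these inputs W comes out just above 83,333 segments and L just above 2 * 10^-10, in line with the values on the slide.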

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Assumptions:
- Link with a transmission rate of R
- Each connection has the same MSS and RTT
- No other TCP connections or UDP datagrams traverse the shared link
- Ignore the slow start phase of TCP
- Operating in congestion-avoidance mode (linear increase phase)

Why is TCP fair?

Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running to R, with the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2". A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
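The convergence argument can be illustrated with a toy simulation of two AIMD flows. The model is deliberately crude and rests on assumptions not in the slides: both flows see every loss at the same time, the increase is 1 unit per round, and a loss occurs whenever the combined rate exceeds the link capacity.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows cut their rate in half (multiplicative decrease).
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit (additive increase).
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


rate1, rate2 = aimd_two_flows(10.0, 80.0, capacity=100.0, rounds=1000)
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the rates drift toward the equal-share line regardless of where they start.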

The End

The succeeding slides are just for additional reading.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps: the link's transmission rate):

- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections share the link]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?
(Latency: the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety.)

Components:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There's data transfer delay.

TCP Latency Modeling: Assumptions

[Figure: the SERVER sends a FILE to the CLIENT over a link of R bps]

- The network is uncongested, with one link between the end systems of rate R.
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- File to send = integer number of segments of size MSS.
- Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
- The initial Threshold of the TCP congestion mechanism is very big.

Notation:
- O: size of the object in bits
- S: number of bits of MSS (max segment size)
- R: the link's transmission rate in bps

TCP latency modeling. Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

[Figure: timing diagram, assuming W = 4 segments; e.g. O = 256 bits, S = 32 bits, W = 4]

Case 2: $WS/R < \mathrm{RTT} + S/R$

The sender has to wait for an ACK after a window's worth of data is sent.

$$\text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + (K-1)\left[\frac{S}{R} + \mathrm{RTT} - \frac{WS}{R}\right]$$

K = number of windows of data that cover the object, $K = \lceil O/(WS) \rceil$

[Figure: timeline with each STALLED PERIOD marked]

If there are K windows, the sender will be stalled (K-1) times.
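The two static-window cases can be folded into one small calculation. The sketch below is only an illustration of the formulas above: the function name `static_window_latency` is hypothetical, and the values of R and RTT are assumed (the slides give only O, S and W).

```python
import math

def static_window_latency(O, S, R, RTT, W):
    """Latency in seconds to fetch an object of O bits with a fixed window.

    O: object size in bits, S: segment size (MSS) in bits,
    R: link rate in bps, RTT: round-trip time in seconds,
    W: fixed congestion window, in segments.
    """
    K = math.ceil(O / (W * S))        # windows needed to cover the object
    stall = S / R + RTT - W * S / R   # idle time after each window
    # Case 1 (stall <= 0): the ACK returns before the window is exhausted.
    # Case 2 (stall > 0): the sender idles after each of the first K-1 windows.
    return 2 * RTT + O / R + (K - 1) * max(stall, 0.0)

# O=256, S=32, W=4 come from the slide; R and RTT are assumed for illustration.
case1 = static_window_latency(O=256, S=32, R=32, RTT=1, W=4)  # WS/R=4 > RTT+S/R=2
case2 = static_window_latency(O=256, S=32, R=32, RTT=5, W=4)  # WS/R=4 < RTT+S/R=6
print(case1, case2)
```

Clamping the stall term at zero lets one expression serve both cases: when the ACK arrives early the extra term vanishes and only 2RTT + O/R remains.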

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timeline for an object with O/S = 15, covered by 4 windows, with each STALLED PERIOD marked]

• Let K be the number of windows that cover the object.

• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\!\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$
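As a quick check of the expression for K, take the O/S = 15 example from the dynamic-window figure. Assuming the window doubles each round starting from one segment, the windows carry 1, 2, 4 and 8 segments:

$$
K = \left\lceil \log_2(15 + 1) \right\rceil = \lceil 4 \rceil = 4,
\qquad 2^0 + 2^1 + 2^2 + 2^3 = 15 \ge \frac{O}{S}
$$

which agrees with the 4 windows marked in that figure.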

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + \mathrm{RTT}$$

• Transmission time of the kth window:

$$2^{k-1}\,\frac{S}{R}$$

• Stall time after the kth window, where $[x]^+ = \max(x, 0)$:

$$\left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\,\frac{S}{R}\right]^+$$

• Latency:

$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + \mathrm{RTT} - 2^{k-1}\,\frac{S}{R}\right]^+$$

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\!\left(1 + \frac{\mathrm{RTT}}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q,\ K-1\}$$

• Closed-form expression for the latency:

$$\text{Latency} = 2\,\mathrm{RTT} + \frac{O}{R} + P\left[\mathrm{RTT} + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$

• Comparing against the minimum latency, $2\,\mathrm{RTT} + O/R$ (the latency with no stalls):

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/\mathrm{RTT} + 2}$$

Slow start will not significantly increase latency if $\mathrm{RTT} \ll O/R$.
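The dynamic-window quantities K, Q and P and the closed-form latency can be computed directly. This is a minimal sketch under the model's assumptions (window doubling from one segment, no loss); the function names are hypothetical, and the values of R and RTT are assumed for illustration. The second function adds up the stall periods one by one as a cross-check of the closed form.

```python
import math

def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, latency) for the dynamic-window model.

    O: object size in bits, S: segment size in bits,
    R: link rate in bps, RTT: round-trip time in seconds.
    """
    segments = math.ceil(O / S)
    # K: smallest k with 2^0 + ... + 2^(k-1) = 2^k - 1 >= O/S
    K = 1
    while 2 ** K - 1 < segments:
        K += 1
    # Q: stalls for an infinitely large object; P: stalls that actually occur
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency

def latency_by_summation(O, S, R, RTT):
    """Same latency, adding up each stall period explicitly."""
    K = slow_start_latency(O, S, R, RTT)[0]
    stalls = sum(max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0)
                 for k in range(1, K))
    return 2 * RTT + O / R + stalls

# O/S = 15 as in the figure; R and RTT are assumed values for illustration.
K, Q, P, latency = slow_start_latency(O=15000, S=1000, R=1000, RTT=2.5)
print(K, Q, P, latency)
```

With these assumed values the object needs K = 4 windows, the server stalls P = 2 times, and both functions give a latency of 24 seconds, against a minimum latency of 2RTT + O/R = 20 seconds.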

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 2: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

2

BB

Encapsulation in TCPIP

IP datagram

Transport Layer ndash TCP

3

BB

TCP Overview

full duplex data bi-directional app data

flow in same connection MSSMSS maximum segment

size

connection-oriented handshaking (exchange

of control msgs) inits sender receiver state before data exchange

flow controlled sender will not flood

receiver with data

point-to-point one sender one receiver

reliable in-order byte stream no message boundaries

pipelined TCP congestion and flow

control set window size

send amp receive buffers

Error detection retransmission cumulative ACKs timers header fields for sequence and ACK numbers

socketdoor

TCPsend buffer

TCPreceive buffer

socketdoor

segment

applicationwrites data

applicationreads data

Transport Layer ndash TCP

4

BB

Recall

Reliable Data Transfer Mechanisms

Checksum

Timer

Sequence number

ACK NAK

Window pipelining

socketdoor

TCPsend buffer

TCPreceive buffer

socketdoor

Packet -gt

applicationwrites data

applicationreads data

- Verification of integrity of packet

- Signals necessary re-transmission is required

- Keeps track of which packet has been sent and received

- Indicates receipt of packet in good or bad form

- Allows for the sending of multiple yet-to-be-acknowledged packets

Transport Layer ndash TCP

5

BB

Internet Checksum Example Note

When adding numbers a carryout from the most significant bit needs to be added to the result

Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sumchecksum

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1To check

data

1

Transport Layer ndash TCP

6

BB

Connection Oriented Transport Connection Oriented Transport TCPTCP

TCP Segment Structure SEQ and ACK numbers Calculating the Timeout Interval The Simplified TCP Sender ACK Generation Recommendation (RFC 1122 RFC 2581)

Interesting Transmission Scenarios Flow Control TCP Connection Management

Transport Layer ndash TCP

7

BB

TCP segment structureTCP segment structure

source port dest port

32 bits

applicationdata

(variable length)

sequence number

acknowledgement numberrcvr window size

URGent data ptrchecksum

FSRPAUheadlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection established

(setup tear downcommands)

bytes thercvr is willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

In practice PSH URG and the Urgent Data Pointer are not used

HeaderHeader

We can view these teeny-weenydetails using Ethereal

Transport Layer ndash TCP

8

BB

ExampleExample

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection

AssumeData stream file consisting of 500000 bytes500000 bytesMSS 1000 bytes1000 bytesFirst byte of data stream numbered as 00

TCP constructs 500 segments out of the data stream

500000 bytes1000 bytes = 500 segments

Transport Layer ndash TCP

9

BB

TCP sequence s and ACKs

Sequence Numbers (s) byte stream number of first byte in segments data Do not necessarily start from 0 use random initial number

RRbull Segment 1 0 + RRbull Segment 2 1000 + RR etc

ACKs (acknowledgment) Seq of next byte expected from other side (last byte +1) Cumulative ACK If received segment 1 waits for segment 2 Eg Ack=1000 + R (received up to 999th byte)

00 1 2 3 4 999 10001000 1001 10021999

Segment 1 Segment 2

Transport Layer ndash TCP

10

BB

TCP sequence s and ACKs

Q how receiver handles out-of-order segments A TCP specs does does

notnot say - decide when implementing

Host A Host BSeq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

Usertypes

C

host ACKsreceipt

of echoedC

host ACKsreceipt of

C echoesback C

time

simple telnet scenario (with echo on)

Assuming that the starting sequence numbers for Host A

and Host B are 4242 and 7979 respectively

Send me the bytes from 43 onwardACK is being piggy-backed onserver-to-client data

Irsquom sending data starting at seq num=42

serverclient

Transport Layer ndash TCP

11

BB

Yet another server echo Yet another server echo exampleexample

Host A Host B

Seq=42 ACK=79 data = lsquoHellorsquo

Seq=79 ACK=47 data = lsquoHellorsquo

Seq=47 ACK=84 data = lsquo200rsquo

UsertypesHello

host ACKsreceipt

of echoedHello

send something else

host ACKsreceipt ofHello

echoes back Hello

time

Host Aseq=42ack=79

seq=47ack=84

Host B

seq=79

ack=47

seq=84

ack=50

Seq=84 ACK=50 data = lsquo200rsquo

ACK tells about up to what byte ACK tells about up to what byte has been receivedhas been received and what is the and what is the

next startingnext starting byte the host is byte the host is expecting to receiveexpecting to receive

Transport Layer ndash TCP

12

BB

TCP Round Trip Time and Timeout

Q how to set TCP how to set TCP timeout valuetimeout value

longer than RTT note RTT will vary

too short premature timeout unnecessary

retransmissions too long slow

reaction to segment loss

RTT = round trip time

Q how to estimate RTThow to estimate RTT SampleRTT measured time

from segment transmission until ACK receipt ignore retransmissions

cumulatively ACKed segments

SampleRTT will vary we would want estimated RTT to be smoother use several recent use several recent

measurementsmeasurements not just current SampleRTT

Main IssueMain Issue How long is the sender willing to wait before re-transmitting the packet

Transport Layer ndash TCP

13

BB

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) EstimatedRTT + x SampleRTTEstimatedRTT = (1-x) EstimatedRTT + x SampleRTT

Exponential weighted moving average influence of given sample decreases exponentially

fast typical value of x 0125 (RFC 2988)

Setting the timeoutSetting the timeout EstimatedRTT plus safety marginsafety margin large variation in EstimatedRTT -gt larger safety margin recommended value of x 025x 025

Timeout = EstimatedRTT + (4 Deviation)Timeout = EstimatedRTT + (4 Deviation)

DeviationDeviation = (1-x) Deviation + x |SampleRTT-EstimatedRTT|

Transport Layer ndash TCP

14

BB

EstimatedRTT = 0875 EstimatedRTT + 0125 SampleRTT

EstimatedRTT after the receipt of the ACK of segment 1EstimatedRTT = RTT for Segment 1 = 002746 second

EstimatedRTT after the receipt of the ACK of segment 2EstimatedRTT = 0875 002746 + 0125 0035557 = 00285

EstimatedRTT after the receipt of the ACK of segment 3EstimatedRTT = 0875 00285 + 0125 0070059 = 00337

EstimatedRTT after the receipt of the ACK of segment 4EstimatedRTT = 0875 00337+ 0125 011443 = 00438

EstimatedRTT after the receipt of the ACK of segment 5EstimatedRTT = 0875 00438 + 0125 013989 = 00558

EstimatedRTT after the receipt of the ACK of segment 6EstimatedRTT = 0875 00558 + 0125 018964 = 00725

Sample CalculationsSample Calculations

Transport Layer ndash TCP

15

BB

RTT Samples and RTT estimatesRTT Samples and RTT estimates

300

250

200

150

100 time

Estimated RTT

Sample RTT

RT

T (

mse

c)

The variations in the The variations in the SampleRTT SampleRTT are are smoothed out in the computation of thesmoothed out in the computation of the

EstimatedRTTEstimatedRTT

Transport Layer ndash TCP

16

BB

An Actual RTT estimationAn Actual RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (m

illis

eco

nds)

SampleRTT Estimated RTT

Transport Layer ndash TCP

17

BB

Simplified TCP sender assuming

waitfor

event

waitfor

event

event data received from application above

event timer timeout for segment with seq number y

event ACK receivedwith ACK number y

create send segment

retransmit segment

process ACK

- one way data transfer- no flow congestion control

FSM of TCP for Reliable Data Transfer FSM of TCP for Reliable Data Transfer

Transport Layer ndash TCP

18

BB

00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 If (timer is currently not running) start timer for segment

nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout 11 retransmit not-yet-ACKed segment with smallest Seq 12 Start timer13 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 sendbase = y 17 If (there are currently any not-yet-ACKed segments)18 start timer19 20 end of loop forever

Associated with the oldest unACKed

segment

SIMPLIFIED TCPSIMPLIFIED TCP SENDER

AssumptionsAssumptions bull sender is not constrained by TCP flow or congestion controlbull that data from above is less than MSS in sizebull that data transfer is in one direction only

Transport Layer ndash TCP

20

BB

00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 start timer for segment nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout for segment with sequence number y 11 retransmit segment with sequence number y 12 compute new timeout interval for segment y 13 restart timer for sequence number y 14 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 cancel all timers for segments with sequence numbers lt y 17 sendbase = y 18 19 else a duplicate ACK for already ACKed segment 20 increment number of duplicate ACKs received for y 21 if (number of duplicate ACKS received for y is 3) 22 perform TCP fast retransmission 23 resend segment with sequence number y 24 restart timer for segment y 25 26 end of loop forever

TCPTCP with MODIFICATIONSwith MODIFICATIONS SENDER

Why wait for the timeout to expire when consecutive

ACKs can be used to indicate a lost segment

With Fast Retransmit

Transport Layer ndash TCP

21

BB

TCP ACK generation [RFC 1122 RFC 2581]

Event

in-order segment arrival no gapseverything else already ACKed

in-order segment arrival no gaps one delayed ACK pending (due to action 1)

out-of-order segment arrivalwith higher than expect seq - a gap is detected

arrival of segment that partially or completely fills gap

TCP Receiver action

Delay sending the ACK Wait up to 500msfor next segment If next segment does not arrive in this interval send ACK

immediately send a singlecumulative ACK

send duplicate ACK indicating seq of next expected byte

Immediately send an ACK if segment startsat lower end of gap

1

2

3

4

Receiver does not discard out-of-order segmentsReceiver does not discard out-of-order segments

Transport Layer ndash TCP

22

BB

TCP Interesting Scenarios

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

time lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

Host A

Seq=100 20 bytes data

ACK=100

Seq=

92

tim

eout

time premature timeoutcumulative ACKs

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

ACK=120

Retransmission due to lost ACKSegment with Seq=100 not retransmitted

Timer is restarted here for Seq=92

Simplified TCP versionSimplified TCP version

Transport Layer ndash TCP

23

BB

Host A

Seq=100 20 bytes data

ACK=100Seq=

92

tim

eout

time

Host B

ACK=120

Seq=92 8 bytes data

Xloss

Cumulative ACK avoids retransmission of the first segment

TCP Retransmission ScenarioTCP Retransmission Scenario

Transport Layer ndash TCP

24

BB

TCP Modifications Doubling the Timeout IntervalDoubling the Timeout Interval

Provides a limited form of congestion control

Timer expiration is more likely caused by congestion in the network

TimeoutInterval = 2 TimeoutIntervalPrevious

TCP acts more politely by increasing the TimeoutInterval causing the sender to retransmit after longer and longer intervals

Congestion may get worse if sources

continue to retransmit packets

persistently

After ACK is received After ACK is received TimeoutInterval is derived TimeoutInterval is derived from most recent EstimatedRTT from most recent EstimatedRTT and DevRTTand DevRTT

Others check RFC 2018 ndash selective ACK

Transport Layer ndash TCP

25

BB

receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field

in TCP segment

sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow

sender wont overrun

receivers buffer bytransmitting too

much too fast

flow controlflow control

receiver buffering

RcvBufferRcvBuffer = size of TCP Receive Buffer

RcvWindowRcvWindow = amount of spare room in Buffer

TCP Flow ControlTCP Flow Control

Transport Layer ndash TCP

26

BB

FLOW CONTROL FLOW CONTROL ReceiverReceiver

04060 50100

LastByteRcvd

EXAMPLE HOST A sends a large file to HOST B

LastByteRead

Application Process

Data from IP

RcvBuffer

RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]

RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead

Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A

Transport Layer ndash TCP

27

BB

FLOW CONTROL SenderSender

04060 50100

LastByteSent

EXAMPLE HOST A sends a large file to HOST B

LastByteACKed

Data

To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow

SENDER HOST A

ACKs ACKs from from Host Host BB

SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed

Transport Layer ndash TCP

28

BB

FLOW CONTROLFLOW CONTROLSome issue to consider

RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service

TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0

What happens when the receive

buffer of HOST B is full full (that is when (that is when

RcvWindow=0)RcvWindow=0)

TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo

Transport Layer ndash TCP

29

BB

TCP Connection ManagementTCP Connection Management

Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments

Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)

ClientClient is the connection initiator

In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client

In JavaSocket accept()

if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)

ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)

Transport Layer ndash TCP

30

BB

TCP Connection Management

Three way handshakeStep 1 clientclient end system sends TCP

SYNSYN control segment to server (executed by TCP itself)

specifies initial seq number (isn)

Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment

ACKs received SYN allocates buffers specifies serverrsquos initial seq

number

Step 3 clientclient ACKsACKs the connection with

ACK=server_isn +1ACK=server_isn +1

allocates buffers sends SYN=0SYN=0

Connection established

Client

AcceptAccept (SYN=1

seq=server_isnack=client_isn+1)

time

Server

ConnectConnect (SYN=1 seq=client_isn)

ACK (SYN=0

seq=client_isn+1ack=server_isn+1)

Establishing a connection

This is what happens when we create a socket for

connection to a server

After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Used in case ACK gets lost It is implementation-dependent (eg 30

seconds 1 minute 2 minutes

Connection formally closes ndash all resources (eg port numbers) are

released

1

2

3

4

5

6

7

8

9

10

11

12

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

Transport Layer ndash TCP

55

BB

TCP Latency ModelingTCP Latency Modeling

In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs

Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth

Multiple End Systems sharing a link

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

Loop holes in TCPLoop holes in TCP

1 TCP connection

1 TCP connection

1 TCP connection

3 TCP connections

Multithreading implementationMultithreading implementation

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of the TCP connection establishment time, the data transfer delay, and the actual data transmission time.

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.

Notation
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)

[Figure: CLIENT and SERVER, with the FILE being transferred]

TCP Latency Modeling

Case analysis: STATIC congestion window

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer, where W is the window size in segments

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4, which gives K = 256/(4 x 32) = 2.

TCP Latency Modeling

Case analysis: STATIC congestion window

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object = O/(WS)

STALLED PERIOD: if there are K windows, the sender will be stalled (K-1) times.
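The two static-window cases can be folded into one small function. This is a sketch under the modelling assumptions listed earlier; the function name and the values chosen for R and RTT are hypothetical, while O, S and W are the slide's example values.

```python
import math


def static_window_latency(O, S, W, R, RTT):
    """Latency in seconds to fetch an O-bit object with a fixed window of W segments.

    O and S are in bits, R in bits per second, RTT in seconds.
    """
    K = math.ceil(O / (W * S))        # number of windows that cover the object
    stall = S / R + RTT - W * S / R   # idle time after each window; > 0 only in Case 2
    latency = 2 * RTT + O / R         # Case 1: connection setup plus transmission time
    if stall > 0:
        latency += (K - 1) * stall    # Case 2: the sender stalls K-1 times
    return latency


# O = 256 bits, S = 32 bits, W = 4 as in the slide; R = 32 bps is a made-up rate.
case1 = static_window_latency(O=256, S=32, W=4, R=32, RTT=1)  # WS/R = 4 > RTT + S/R = 2
case2 = static_window_latency(O=256, S=32, W=4, R=32, RTT=5)  # WS/R = 4 < RTT + S/R = 6
print(case1, case2)
```

With these numbers the second call has K = 2 windows, so it pays for exactly one stalled period on top of 2RTT + O/R.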

Case analysis: DYNAMIC congestion window

[Figure: timing diagram for an object of O/S = 15 segments, sent in 4 windows (1 + 2 + 4 + 8 = 15 segments), with the stalled periods marked]

Case analysis: DYNAMIC congestion window

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Case analysis: DYNAMIC congestion window

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

$$ \frac{S}{R} + RTT $$

• Transmission time of the kth window:

$$ 2^{k-1}\,\frac{S}{R} $$

• Stall time after the kth window:

$$ \left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+} $$

• Latency:

$$ \text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+} $$

Case analysis: DYNAMIC congestion window

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$ Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1 $$

• The actual number of times that the server stalls is P = min{Q, K-1}.

• Closed-form expression for the latency:

$$ \text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^{P} - 1)\,\frac{S}{R} $$

• Compared with the minimum latency:

$$ \frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\left[(O/R)/RTT\right] + 2} $$

Slow start will not significantly increase latency if RTT << O/R.
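The dynamic-window formulas can be cross-checked numerically. The sketch below evaluates the latency both as the explicit sum of stall times and through the closed form with P = min{Q, K-1}. The function names and the values of S, R and RTT are hypothetical, chosen so that O/S = 15 as in the slow-start timing example.

```python
import math


def slow_start_windows(O, S):
    """K: number of windows (1, 2, 4, ... segments) needed to cover the object."""
    return math.ceil(math.log2(O / S + 1))


def latency_by_sum(O, S, R, RTT):
    """Latency as 2RTT + O/R plus the stall time after each of the first K-1 windows."""
    K = slow_start_windows(O, S)
    stalls = sum(max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K))
    return 2 * RTT + O / R + stalls


def latency_closed_form(O, S, R, RTT):
    """Latency from the closed-form expression with P = min(Q, K - 1)."""
    K = slow_start_windows(O, S)
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


# Made-up link: S = 1000 bits, R = 1000 bps (so S/R = 1 s), RTT = 2 s, O/S = 15.
args = dict(O=15_000, S=1_000, R=1_000, RTT=2)
print(slow_start_windows(15_000, 1_000), latency_by_sum(**args), latency_closed_form(**args))
```

For this link the server stalls after the first two windows only (P = 2), and the two ways of computing the latency agree.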

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

TCP Overview

• point-to-point: one sender, one receiver
• reliable, in-order byte stream: no "message boundaries"
• pipelined: TCP congestion and flow control set the window size
• send & receive buffers
• full duplex data: bi-directional app data flow in the same connection; MSS: maximum segment size
• connection-oriented: handshaking (exchange of control msgs) inits sender and receiver state before data exchange
• flow controlled: sender will not flood receiver with data
• error detection, retransmission, cumulative ACKs, timers, header fields for sequence and ACK numbers

[Figure: the application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]

Recall: Reliable Data Transfer Mechanisms

• Checksum: verifies the integrity of a packet
• Timer: signals that a retransmission is required
• Sequence number: keeps track of which packets have been sent and received
• ACK, NAK: indicate receipt of a packet in good or bad form
• Window, pipelining: allow multiple yet-to-be-acknowledged packets to be sent

[Figure: the application writes data through the socket door into the TCP send buffer; packets travel to the TCP receive buffer, from which the application reads data through its socket door]

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.

Example: add two 16-bit integers

    data          1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                  1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

    wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

    sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
    checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

To check: adding the data and the checksum gives 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1.
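The wraparound rule is easy to get wrong by hand, so here is a small sketch that reproduces the example. It works on a list of 16-bit words; the function names are illustrative and not taken from any library.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wraparound of the carry
    return total


def internet_checksum(words):
    """The checksum is the bitwise complement of the wrapped sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


data = [0b1110011001100110, 0b1101010101010101]
wrapped_sum = ones_complement_sum16(data)
checksum = internet_checksum(data)
print(f"sum      {wrapped_sum:016b}")
print(f"checksum {checksum:016b}")
print(f"check    {ones_complement_sum16(data + [checksum]):016b}")
```

The last line adds the data and the checksum together, which is the check a receiver performs: any result other than sixteen 1s signals an error.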

Connection-Oriented Transport: TCP

• TCP Segment Structure
• SEQ and ACK numbers
• Calculating the Timeout Interval
• The Simplified TCP Sender
• ACK Generation Recommendation (RFC 1122, RFC 2581)
• Interesting Transmission Scenarios
• Flow Control
• TCP Connection Management

TCP segment structure

Header (each row is 32 bits wide):
• source port # | dest port #
• sequence number
• acknowledgement number
• head len | not used | flags U A P R S F | rcvr window size
• checksum | URGent data ptr
• Options (variable length)

The header is followed by the application data (variable length).

Notes on the fields:
• sequence and acknowledgement numbers: counting by bytes of data (not segments)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• rcvr window size: # bytes the receiver is willing to accept
• checksum: Internet checksum (as in UDP)

In practice, PSH, URG and the Urgent Data Pointer are not used.

We can view these teeny-weeny details using Ethereal.

Example

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.

Assume:
• Data stream: file consisting of 500,000 bytes
• MSS: 1,000 bytes
• First byte of the data stream: numbered as 0

TCP constructs 500 segments out of the data stream:

500,000 bytes / 1,000 bytes = 500 segments

TCP sequence #s and ACKs

Sequence numbers (#s):
• byte-stream number of the first byte in the segment's data
• do not necessarily start from 0; use a random initial number R
  • Segment 1: 0 + R
  • Segment 2: 1000 + R, etc.

ACKs (acknowledgment):
• Seq # of the next byte expected from the other side (last byte + 1)
• Cumulative ACK
• If segment 1 has been received, the receiver waits for segment 2, e.g. Ack = 1000 + R (received up to byte 999)

[Figure: bytes 0, 1, 2, 3, 4, ..., 999 form Segment 1; bytes 1000, 1001, 1002, ..., 1999 form Segment 2]
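The numbering rule can be sketched in a few lines. The helper below is hypothetical and ignores the fact that real sequence numbers wrap around modulo 2^32; the initial number 5000 stands in for the random value R.

```python
def segment_sequence_numbers(file_size, mss, initial_seq=0):
    """Return (sequence number, length) for each segment of a byte stream."""
    return [
        (initial_seq + offset, min(mss, file_size - offset))
        for offset in range(0, file_size, mss)
    ]


segments = segment_sequence_numbers(file_size=500_000, mss=1_000)
first_seq, first_len = segments[0]
ack_after_first = first_seq + first_len  # cumulative ACK: next byte expected

# Same stream, with a made-up initial sequence number R = 5000.
shifted = segment_sequence_numbers(500_000, 1_000, initial_seq=5_000)
print(len(segments), segments[:2], ack_after_first, shifted[1])
```

The ACK for a segment is simply its sequence number plus its length, which is why the ACK for Segment 1 equals the sequence number of Segment 2.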

TCP sequence #s and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. Host A -> Host B: Seq=42, ACK=79, data = 'C'. The user types 'C'. ("I'm sending data starting at seq num=42.")
2. Host B -> Host A: Seq=79, ACK=43, data = 'C'. The host ACKs receipt of 'C' and echoes back 'C'. ("Send me the bytes from 43 onward." The ACK is being piggy-backed on server-to-client data.)
3. Host A -> Host B: Seq=43, ACK=80. The host ACKs receipt of the echoed 'C'.

Yet another server echo example

1. Host A -> Host B: Seq=42, ACK=79, data = 'Hello'. The user types 'Hello'.
2. Host B -> Host A: Seq=79, ACK=47, data = 'Hello'. The host ACKs receipt of 'Hello' and echoes it back.
3. Host A -> Host B: Seq=47, ACK=84, data = '200'. The host ACKs receipt of the echoed 'Hello' and sends something else.
4. Host B -> Host A: Seq=84, ACK=50, data = '200'.

The ACK tells up to what byte has been received, and what the next starting byte is that the host is expecting to receive.
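The Seq/ACK bookkeeping in the two echo examples follows one rule: each side's ACK field is the other side's sequence number plus the number of bytes received so far. A minimal sketch (the function name and the tuple layout are assumptions made for illustration):

```python
def echo_exchange(client_isn, server_isn, messages):
    """Trace (direction, Seq, ACK, data) when a server echoes each client message."""
    trace = []
    client_seq, server_seq = client_isn, server_isn
    for data in messages:
        # Client sends data; its ACK field names the next byte expected from the server.
        trace.append(("A->B", client_seq, server_seq, data))
        client_seq += len(data)
        # Server echoes the data back; its ACK is piggy-backed on the echo.
        trace.append(("B->A", server_seq, client_seq, data))
        server_seq += len(data)
    # Final pure ACK from the client for the last echo (carries no data).
    trace.append(("A->B", client_seq, server_seq, ""))
    return trace


telnet = echo_exchange(42, 79, ["C"])
hello = echo_exchange(42, 79, ["Hello", "200"])
for segment in hello:
    print(segment)
```

Running it with the starting numbers 42 and 79 reproduces the Seq and ACK values of both examples.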

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?
• longer than RTT (note: RTT will vary)
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss

RTT = round trip time

Q: How to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
• SampleRTT will vary; we would want the estimated RTT to be "smoother": use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT

• Exponential weighted moving average
• influence of a given sample decreases exponentially fast
• typical value of x: 0.125 (RFC 2988)

Setting the timeout
• EstimatedRTT plus "safety margin"
• large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + (4 * Deviation)

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|

• recommended value of x for the Deviation: 0.25

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

• After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
• After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
• After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
• After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
• After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
• After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
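The sample calculations can be reproduced with a short sketch of the moving average. The timeout helper makes two assumptions that the slides do not state: Deviation starts at 0, and it is updated before EstimatedRTT (the order is a choice of this sketch). Function names are illustrative.

```python
def smoothed_rtt(samples, x=0.125):
    """EstimatedRTT after each SampleRTT; the first sample seeds the estimate."""
    estimates = [samples[0]]
    for sample in samples[1:]:
        estimates.append((1 - x) * estimates[-1] + x * sample)
    return estimates


def timeout_interval(samples, x_rtt=0.125, x_dev=0.25):
    """Timeout = EstimatedRTT + 4 * Deviation after processing all samples."""
    estimated, deviation = samples[0], 0.0  # assumption: Deviation starts at 0
    for sample in samples[1:]:
        # Assumption: update Deviation first, using the previous EstimatedRTT.
        deviation = (1 - x_dev) * deviation + x_dev * abs(sample - estimated)
        estimated = (1 - x_rtt) * estimated + x_rtt * sample
    return estimated + 4 * deviation


sample_rtts = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
estimates = smoothed_rtt(sample_rtts)
print([round(e, 4) for e in estimates])
print(round(timeout_interval(sample_rtts), 4))
```

Because the samples keep rising, the Deviation term grows as well, so the timeout sits well above the smoothed estimate.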

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT plotted against time; the RTT axis (msec) runs from 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and Estimated RTT in milliseconds (axis from 100 to 350) plotted against time in seconds (axis from 1 to 106)]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

The sender waits for an event and then acts on it:
• event: data received from application above -> create and send segment
• event: timer timeout for segment with seq number y -> retransmit segment
• event: ACK received, with ACK number y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence number
    nextseqnum = initial_sequence number

    loop (forever) {
        switch(event)
            event: data received from application above
                create TCP segment with sequence number nextseqnum
                if (timer is currently not running)
                    start timer for segment nextseqnum
                pass segment to IP
                nextseqnum = nextseqnum + length(data)
            event: timer timeout
                retransmit not-yet-ACKed segment with smallest Seq #
                start timer
            event: ACK received, with ACK field value of y
                if (y > sendbase) {   /* cumulative ACK of all data up to y */
                    sendbase = y
                    if (there are currently any not-yet-ACKed segments)
                        start timer
                }
    }   /* end of loop forever */

The timer is associated with the oldest unACKed segment.
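The pseudocode above can be exercised with a small event-driven sketch. It is not a real TCP implementation: the class name, the `sent` log standing in for "pass segment to IP", and the boolean standing in for the timer are all assumptions made for illustration.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender; one timer, no flow/congestion control."""

    def __init__(self, initial_seq=0):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}           # seq -> data for not-yet-ACKed segments
        self.timer_running = False  # stands in for the single retransmission timer
        self.sent = []              # log of (seq, data) handed to IP

    def on_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)  # not-yet-ACKed segment with smallest Seq #
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.sendbase:        # cumulative ACK of all data up to y
            self.sendbase = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Cumulative ACK: ACK=100 is lost, ACK=120 arrives and covers both segments.
sender = SimplifiedTcpSender(initial_seq=92)
sender.on_data(b"a" * 8)
sender.on_data(b"b" * 20)
sender.on_ack(120)

# Lost ACK: the timer expires and the oldest unACKed segment is retransmitted.
retry = SimplifiedTcpSender(initial_seq=92)
retry.on_data(b"a" * 8)
retry.on_timeout()
retry.on_ack(100)
```

In the first run nothing is retransmitted, because the single ACK=120 acknowledges both outstanding segments.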

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence number
    nextseqnum = initial_sequence number

    loop (forever) {
        switch(event)
            event: data received from application above
                create TCP segment with sequence number nextseqnum
                start timer for segment nextseqnum
                pass segment to IP
                nextseqnum = nextseqnum + length(data)
            event: timer timeout for segment with sequence number y
                retransmit segment with sequence number y
                compute new timeout interval for segment y
                restart timer for sequence number y
            event: ACK received, with ACK field value of y
                if (y > sendbase) {   /* cumulative ACK of all data up to y */
                    cancel all timers for segments with sequence numbers < y
                    sendbase = y
                }
                else {   /* a duplicate ACK for already ACKed segment */
                    increment number of duplicate ACKs received for y
                    if (number of duplicate ACKs received for y is 3) {
                        /* perform TCP fast retransmission */
                        resend segment with sequence number y
                        restart timer for segment y
                    }
                }
    }   /* end of loop forever */
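The duplicate-ACK branch is the only new logic in this version of the sender, so it can be isolated in a small sketch. The class and method names are hypothetical; the handler returns the sequence number to resend instead of touching a network.

```python
class FastRetransmitTracker:
    """Counts duplicate ACKs and signals a fast retransmit on the third one."""

    def __init__(self, sendbase):
        self.sendbase = sendbase
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the sequence number to fast-retransmit, or None."""
        if y > self.sendbase:   # new cumulative ACK: advance and reset the count
            self.sendbase = y
            self.dup_acks = 0
            return None
        self.dup_acks += 1      # duplicate ACK for already ACKed data
        if self.dup_acks == 3:
            return y            # resend the segment starting at byte y
        return None


tracker = FastRetransmitTracker(sendbase=92)
# One new ACK for 100, followed by three duplicates of it.
results = [tracker.on_ack(y) for y in (100, 100, 100, 100)]
after_recovery = tracker.on_ack(120)
print(results, after_recovery)
```

The segment starting at byte 100 is resent on the third duplicate, without waiting for its timer to expire.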

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with higher than expected seq #; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the seq # of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
1. Host A sends Seq=92, 8 bytes data.
2. Host B replies with ACK=100, but the ACK is lost (X).
3. Host A's timer expires (timeout) and it resends Seq=92, 8 bytes data: a retransmission due to the lost ACK.
4. Host B replies with ACK=100 again.

Premature timeout, cumulative ACKs:
1. Host A sends Seq=92, 8 bytes data, followed by Seq=100, 20 bytes data.
2. Host B replies with ACK=100 and ACK=120, but the Seq=92 timeout expires before they reach Host A.
3. Host A resends Seq=92, 8 bytes data. The timer is restarted here for Seq=92.
4. Host B replies with ACK=120. The segment with Seq=100 is not retransmitted.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes data, followed by Seq=100, 20 bytes data.
2. Host B replies with ACK=100, which is lost (X), and then with ACK=120.
3. ACK=120 reaches Host A before the Seq=92 timeout expires.

The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control.
• Timer expiration is more likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.
• TimeoutInterval = 2 * TimeoutInterval_Previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
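As a quick illustration of the doubling rule, the sketch below lists the timeout used for each successive retransmission of the same segment. The 1-second starting interval is a made-up value, and the helper name is hypothetical.

```python
def backoff_intervals(initial_timeout, retransmissions):
    """Timeout before the first transmission times out, then after each retransmission."""
    intervals = [initial_timeout]
    for _ in range(retransmissions):
        intervals.append(2 * intervals[-1])  # TimeoutInterval = 2 * previous value
    return intervals


intervals = backoff_intervals(initial_timeout=1.0, retransmissions=3)
print(intervals)
```

Once an ACK arrives, the doubled value is discarded and the interval is again computed from EstimatedRTT and DevRTT.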

TCP Flow Control

flow control: the sender won't overrun the receiver's buffers by transmitting too much, too fast

receiver buffering:
• RcvBuffer = size of the TCP Receive Buffer
• RcvWindow = amount of spare room in the Buffer

• receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment
• sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, LastByteRead.

[Figure: data from IP fills RcvBuffer up to LastByteRcvd; the application process has read up to LastByteRead]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

• Initially, RcvWindow = RcvBuffer.
• The application reads from the buffer.
• HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B

SENDER (HOST A) uses the RcvWindow of Host B, LastByteSent, LastByteACKed.

[Figure: HOST A has sent data up to LastByteSent; ACKs from Host B advance LastByteACKed]

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
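Both sides of the flow-control rule fit in two small functions. This is a sketch with made-up byte counts; the function names are not from any library.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: would sending n_bytes keep unACKed data within RcvWindow?"""
    return (last_byte_sent + n_bytes) - last_byte_acked <= advertised_window


# Hypothetical state at HOST B: 4096-byte buffer, 2000 bytes still unread.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)

# Hypothetical state at HOST A: 1000 bytes already in flight.
small_ok = can_send(1000, last_byte_sent=3000, last_byte_acked=2000, advertised_window=window)
large_ok = can_send(1500, last_byte_sent=3000, last_byte_acked=2000, advertised_window=window)
print(window, small_ok, large_ok)
```

The 1500-byte send is refused because it would push the unACKed data past the 2096 bytes of spare room that HOST B advertised.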

FLOW CONTROL: some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
• Initialize TCP variables: sequence numbers (seq #s), buffers, flow control info (e.g. RcvWindow)

Client is the connection initiator.

    In Java:  Socket clientSocket = new Socket("hostname", "port number");
    In C:     if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
                  printf("connect failed\n");
                  WSACleanup();
                  exit(1);
              }

Server is contacted by the client.

    In Java:  Socket accept()
    In C:     ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• specifies the initial seq number (isn)
• Connect (SYN=1, seq=client_isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
• ACKs the received SYN
• allocates buffers
• specifies the server's initial seq. number
• Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: the client ACKs the connection with ACK=server_isn+1.
• allocates buffers
• sends SYN=0
• ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
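The three segments differ only in their SYN flag and in how the sequence and acknowledgement fields are derived from the two initial sequence numbers. A sketch that builds them as plain dictionaries (the field names and the example initial numbers 42 and 79 are assumptions for illustration):

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments exchanged in the handshake."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn}
    synack = {
        "from": "server",
        "SYN": 1,
        "seq": server_isn,
        "ack": syn["seq"] + 1,     # ACKs the received SYN
    }
    ack = {
        "from": "client",
        "SYN": 0,
        "seq": client_isn + 1,
        "ack": synack["seq"] + 1,  # ACK = server_isn + 1
    }
    return [syn, synack, ack]


syn, synack, ack = three_way_handshake(client_isn=42, server_isn=79)
for segment in (syn, synack, ack):
    print(segment)
```

Each SYN consumes one sequence number, which is why both acknowledgements are the other side's initial number plus one.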

TCP Connection Management (cont.)

How a TCP connection is established and torn down

Closing a connection: the client closes the socket

    closesocket(s);
    Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: the client closes and sends FIN; the server replies with ACK, closes, and sends its own FIN; the client replies with ACK and enters timed wait before the connection is closed]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
• Enters "timed wait": will respond with an ACK to received FINs

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams, with the transitions numbered 1 to 12]

• The timed wait is used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• The connection then formally closes: all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• a top-10 problem!

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss & delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  • a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  • an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection:

New variable: Congestion Window (CongWin)

At the SENDER: (amount of unACKed data) < min(CongWin, RcvWindow), where the amount of unACKed data is LastByteSent - LastByteACKed. This indirectly limits the sender's send rate.

Assumptions:
• The TCP receive buffer is very large: no RcvWindow constraint, so the amount of unACKed data at the sender is solely limited by CongWin.
• Packet loss delay & packet transmission delay are negligible.

Sending rate (approx.): CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• The congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate
• The congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path:

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• roughly, rate = cwnd/RTT bytes/sec
• cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

• approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  • additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  • multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) plotted against time, with axis ticks at 8, 16 and 24 Kbytes, showing the saw-tooth behavior of probing for bandwidth]

TCP Slow Start

• when the connection begins, increase the rate exponentially until the first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd by 1 MSS for every ACK received
• summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment to Host B; one RTT later it sends two segments, then four segments]

Refinement: inferring loss

• after 3 dup ACKs:
  • cwnd is cut in half
  • window then grows linearly
• but after a timeout event:
  • cwnd is set to 1 MSS
  • window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

STATE: Slow Start (SS)
EVENT: ACK receipt for previously unACKed data
TCP SENDER Congestion-Control Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

STATE: Congestion Avoidance (CA)
EVENT: ACK receipt for previously unACKed data
TCP SENDER Congestion-Control Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

STATE: SS or CA
EVENT: loss event detected by triple duplicate ACK
TCP SENDER Congestion-Control Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

STATE: SS or CA
EVENT: Timeout
TCP SENDER Congestion-Control Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

STATE: SS or CA
EVENT: Duplicate ACK
TCP SENDER Congestion-Control Action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
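The rows of this table translate almost line by line into event handlers. The sketch below is a simplified illustration, not a real TCP stack: the class name, the state labels "SS"/"CA", and the example MSS and threshold values are assumptions, and retransmission itself is left out.

```python
class CongestionControl:
    """Tracks CongWin, Threshold and the SS/CA state as ACKs and losses arrive."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss        # start with a window of 1 MSS
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_duplicate_ack(self):
        """Duplicate ACK; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Multiplicative decrease; CongWin will not drop below 1 MSS.
            self.threshold = max(self.cong_win / 2, self.mss)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()                 # slow start: 2000, 3000, 4000, 5000
window_after_slow_start = (cc.cong_win, cc.state)
cc.on_new_ack()                     # congestion avoidance: + MSS * MSS / CongWin
window_after_ca_ack = cc.cong_win
for _ in range(3):
    cc.on_duplicate_ack()           # triple duplicate ACK
window_after_dup_acks = (cc.cong_win, cc.threshold, cc.state)
cc.on_timeout()
window_after_timeout = (cc.cong_win, cc.threshold, cc.state)
```

Growth per ACK is a full MSS in slow start but only MSS * MSS / CongWin in congestion avoidance, which adds up to about one MSS per RTT.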

Summary: TCP Congestion Control

Start: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.

Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance:
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Congestion control: TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

• The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs of the packets sent.
• TCP forces the end systems to decrease the rate at which packets are sent during periods of congestion.

[Figure: CLIENT and SERVER]

Macroscopic Description of TCP throughput

• What's the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2, and throughput to W/(2 RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application; 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of the loss rate L:

$$ \text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}} $$

• -> L = 2 * 10^-10: a very small loss rate (1 loss event every 5 billion segments)!
• New versions of TCP are needed for high-speed environments.
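The two numbers in this example can be recomputed directly: the window follows from throughput * RTT, and the loss rate from solving the throughput formula for L. The helper names are hypothetical.

```python
def window_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the throughput: rate * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


# 1500-byte segments, 100 ms RTT, 10 Gbps target, as in the example.
window = window_for_throughput(10e9, 0.100, 1500)
loss_rate = loss_rate_for_throughput(10e9, 0.100, 1500)
print(f"W = {int(window)} segments, L = {loss_rate:.2e}")
```

The loss rate comes out just above 2 * 10^-10, which corresponds to roughly one loss event in every 5 billion segments.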

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions:
• Link with a transmission rate of R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow start phase of TCP
• Operating in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Transport Layer ndash TCP

53

BB

Why is TCP fair?

Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases
- Multiplicative decrease reduces throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
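The fairness argument can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not on the slide: both flows have the same RTT, both detect loss in the same round whenever their combined rate exceeds the link capacity, and rates are counted in arbitrary units. The function name and starting rates are invented for the example.

```python
def aimd_two_flows(rate1, rate2, capacity, rounds):
    """Simulate two AIMD flows sharing one link; return the rate history."""
    history = [(rate1, rate2)]
    for _ in range(rounds):
        if rate1 + rate2 > capacity:
            # Loss: both flows halve their rate (multiplicative decrease).
            rate1, rate2 = rate1 / 2, rate2 / 2
        else:
            # No loss: both flows add one unit per RTT (additive increase).
            rate1, rate2 = rate1 + 1, rate2 + 1
        history.append((rate1, rate2))
    return history


history = aimd_two_flows(rate1=10.0, rate2=70.0, capacity=100.0, rounds=300)
start_gap = abs(history[0][0] - history[0][1])
final_gap = abs(history[-1][0] - history[-1][1])
print(f"gap between the two flows: {start_gap:.1f} -> {final_gap:.3f}")
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the pair drifts toward the equal bandwidth share line.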

The End

The remaining slides are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link
R bps: link's transmission rate

In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore, they achieve higher throughput.

Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: end systems with 1 TCP connection each alongside an end system with 3 TCP connections, all sharing the link]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions
- Network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- No packet loss, no packet corruption, no retransmissions required
- Header overheads are negligible
- File to send = an integer number of segments of size MSS
- Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
- Initial Threshold of the TCP congestion mechanism is very big

Notation
- R: link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in the MSS (max segment size)

[Figure: CLIENT and SERVER, with the FILE sent across the link]

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1: latency = 2RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

Assume W = 4 segments.
E.g. O = 256 bits, S = 32 bits, W = 4

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).

Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object = O/(WS)

If there are K windows, the sender will be stalled (K-1) times.
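Both static-window cases can be folded into one small function. The sketch below is an illustration only: the function name is made up, and the link rate and RTT values in the usage lines are invented so that the O = 256 bits, S = 32 bits, W = 4 example lands once in each case.

```python
import math


def static_window_latency(obj_bits, seg_bits, window_segs, rate_bps, rtt_s):
    """Latency for a fixed congestion window of window_segs segments."""
    # K: number of windows that cover the object, rounded up.
    k = math.ceil(obj_bits / (window_segs * seg_bits))
    base = 2 * rtt_s + obj_bits / rate_bps
    # Idle time after each window: S/R + RTT - W*S/R.
    stall = seg_bits / rate_bps + rtt_s - window_segs * seg_bits / rate_bps
    if stall <= 0:
        return base              # Case 1: ACKs return in time, no stalls
    return base + (k - 1) * stall  # Case 2: the sender stalls K-1 times


# Assumed link: 64 bps. RTT = 1 s falls in Case 1, RTT = 3 s in Case 2.
case1 = static_window_latency(256, 32, 4, rate_bps=64, rtt_s=1.0)
case2 = static_window_latency(256, 32, 4, rate_bps=64, rtt_s=3.0)
print(case1, case2)
```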

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: slow-start timeline for an object with O/S = 15 segments, sent in 4 windows, with a STALLED PERIOD between windows]

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:

$$
\begin{aligned}
K &= \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

• Transmission time of the k-th window:

$$2^{k-1}\frac{S}{R}$$

• Stall time after the k-th window, where $[x]^+ = \max(x, 0)$:

$$\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

• Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q,\ K-1\}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Closed-form expression for the latency, where $P = \min\{Q,\ K-1\}$ is the actual number of stalls:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
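The dynamic-window (slow start) latency can be computed two ways, by summing the per-window stall times or by using the closed form with P stalls, and the two should agree. The sketch below checks that; the function name and the example values (1000-byte segments, 1 Mbps link, 100 ms RTT, 100-segment object) are assumptions chosen for illustration.

```python
import math


def slow_start_latency(obj_bits, seg_bits, rate_bps, rtt_s):
    """Return (K, Q, P, closed_form, summed) for the slow-start model."""
    s_over_r = seg_bits / rate_bps
    # K: windows needed to cover the object when windows double each round.
    k = math.ceil(math.log2(obj_bits / seg_bits + 1))
    # Q: stalls for an infinitely large object.
    q = math.floor(math.log2(1 + rtt_s / s_over_r)) + 1
    # P: stalls that actually occur.
    p = min(q, k - 1)
    base = 2 * rtt_s + obj_bits / rate_bps
    closed_form = base + p * (rtt_s + s_over_r) - (2 ** p - 1) * s_over_r
    summed = base + sum(
        max(0.0, s_over_r + rtt_s - 2 ** (j - 1) * s_over_r)
        for j in range(1, k)
    )
    return k, q, p, closed_form, summed


k, q, p, closed_form, summed = slow_start_latency(
    obj_bits=800_000, seg_bits=8_000, rate_bps=1_000_000, rtt_s=0.1
)
print(k, q, p, round(closed_form, 3), round(summed, 3))
```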

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


Recall

Reliable Data Transfer Mechanisms

- Checksum: verifies the integrity of a packet
- Timer: signals that a retransmission is required
- Sequence number: keeps track of which packets have been sent and received
- ACK/NAK: indicates receipt of a packet in good or bad form
- Window, pipelining: allows multiple yet-to-be-acknowledged packets to be sent

[Figure: the application writes data through the socket door into the TCP send buffer; packets travel to the TCP receive buffer, from which the receiving application reads data]

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.

Example: add two 16-bit integers

data          1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
              1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

To check: sum + checksum = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
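The wraparound rule in the example can be written as a short routine. This is a minimal sketch over a list of 16-bit words (the function names are illustrative); real implementations work on byte buffers and also cover a pseudo-header, which is left out here.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, folding any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wraparound of the carry
    return total


def internet_checksum16(words):
    """The checksum is the ones' complement of the ones' complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


# The two 16-bit integers from the slide.
data = [0b1110011001100110, 0b1101010101010101]
checksum = internet_checksum16(data)
print(f"{checksum:016b}")
```

A receiver runs the same sum over the data plus the checksum and expects sixteen 1 bits.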

Connection-Oriented Transport: TCP

- TCP Segment Structure
- SEQ and ACK numbers
- Calculating the Timeout Interval
- The Simplified TCP Sender
- ACK Generation Recommendation (RFC 1122, RFC 2581)
- Interesting Transmission Scenarios
- Flow Control
- TCP Connection Management

TCP segment structure

Header (each row is 32 bits wide):
- source port, dest port
- sequence number: counting by bytes of data (not segments)
- acknowledgement number
- head len, not used, flag bits U A P R S F, rcvr window size
- checksum, URGent data ptr
- Options (variable length)

The header is followed by the application data (variable length).

Field notes:
- URG: urgent data (generally not used)
- ACK: ACK number valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: number of bytes the rcvr is willing to accept
- checksum: Internet checksum (as in UDP)

In practice, PSH, URG and the Urgent Data Pointer are not used.

We can view these teeny-weeny details using Ethereal.

Example

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.

Assume:
- Data stream: a file consisting of 500,000 bytes
- MSS: 1,000 bytes
- First byte of the data stream: numbered 0

TCP constructs 500 segments out of the data stream:

500,000 bytes / 1,000 bytes = 500 segments

TCP sequence numbers and ACKs

Sequence numbers:
- byte-stream number of the first byte in the segment's data
- do not necessarily start from 0; a random initial number R is used
  • Segment 1: 0 + R
  • Segment 2: 1000 + R, etc.

ACKs (acknowledgments):
- sequence number of the next byte expected from the other side (last byte + 1)
- cumulative ACK
- if segment 1 has been received, the receiver waits for segment 2, e.g. ACK = 1000 + R (bytes up to number 999 received)

Byte stream: Segment 1 carries bytes 0, 1, 2, 3, 4, ..., 999; Segment 2 carries bytes 1000, 1001, 1002, ..., 1999
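The numbering rule can be made concrete with a helper that lists, for each segment, its sequence number and the cumulative ACK the receiver would return for it. The function name is made up for this sketch; the modulo reflects the 32-bit width of the sequence number field.

```python
def segment_sequence_numbers(stream_bytes, mss, initial_seq=0):
    """Return (seq, expected_ack) for each segment of the byte stream."""
    result = []
    for offset in range(0, stream_bytes, mss):
        length = min(mss, stream_bytes - offset)
        seq = (initial_seq + offset) % 2**32   # first byte carried
        ack = (seq + length) % 2**32           # next byte expected
        result.append((seq, ack))
    return result


# 500,000-byte file, MSS of 1,000 bytes, first byte numbered 0.
segments = segment_sequence_numbers(500_000, 1_000)
print(len(segments), segments[0], segments[1])
```

Passing a non-zero `initial_seq` shifts every number by R, as in the "0 + R", "1000 + R" lines on the slide.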

TCP sequence numbers and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. User types 'C'. Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num = 42").
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"). The ACK is piggy-backed on the server-to-client data.
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.

Yet another server echo example

1. User types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
2. Host B ACKs receipt of 'Hello' and echoes it back: Seq=79, ACK=47, data = 'Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
4. Host B replies: Seq=84, ACK=50, data = '200'.

Sequence and ACK numbers over time:
- Host A: seq=42, ack=79, then seq=47, ack=84
- Host B: seq=79, ack=47, then seq=84, ack=50

The ACK tells up to what byte has been received, and what is the next starting byte the host is expecting to receive.

TCP Round Trip Time and Timeout

RTT = round trip time

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?
- longer than RTT (note: RTT will vary)
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss

Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we want the estimated RTT to be smoother, so use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT

- Exponential weighted moving average
- influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (RFC 2988)

Setting the timeout
- EstimatedRTT plus a safety margin
- large variation in EstimatedRTT -> larger safety margin

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|
(recommended value of x in the Deviation formula: 0.25)

Timeout = EstimatedRTT + (4 * Deviation)

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

EstimatedRTT after the receipt of the ACK of segment 1:
EstimatedRTT = RTT for segment 1 = 0.02746 second

EstimatedRTT after the receipt of the ACK of segment 2:
EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285

EstimatedRTT after the receipt of the ACK of segment 3:
EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337

EstimatedRTT after the receipt of the ACK of segment 4:
EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438

EstimatedRTT after the receipt of the ACK of segment 5:
EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558

EstimatedRTT after the receipt of the ACK of segment 6:
EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
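The sample calculations can be reproduced with a small estimator that also tracks the deviation and the resulting timeout. This is a sketch, not a reference implementation: the function name is invented, the initial deviation of 0 is an assumption (the slides do not give one), and the deviation is updated before the estimate so that it uses the previous EstimatedRTT.

```python
def estimate_rtt(samples, alpha=0.125, beta=0.25):
    """Return a list of (estimated_rtt, deviation, timeout) per sample."""
    estimated = samples[0]   # first sample seeds the estimate
    deviation = 0.0          # assumption: no deviation known at the start
    history = [(estimated, deviation, estimated + 4 * deviation)]
    for sample in samples[1:]:
        deviation = (1 - beta) * deviation + beta * abs(sample - estimated)
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append((estimated, deviation, estimated + 4 * deviation))
    return history


# SampleRTT values (in seconds) taken from the sample calculations.
samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
rtt_history = estimate_rtt(samples)
for estimated, deviation, timeout in rtt_history:
    print(f"EstimatedRTT={estimated:.4f}  Deviation={deviation:.4f}  Timeout={timeout:.4f}")
```

Because the sketch does not round intermediate values, its estimates can differ from the slide figures in the last printed digit.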

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT plotted against time; vertical axis RTT (msec), 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and Estimated RTT plotted against time (seconds, 1 to 106); vertical axis RTT (milliseconds), 100 to 350]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

The sender waits for an event and then acts on it:
- event: data received from application above -> create and send segment
- event: timer timeout for segment with seq number y -> retransmit segment
- event: ACK received, with ACK number y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence number
    nextseqnum = initial_sequence number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout
            retransmit not-yet-ACKed segment with smallest Seq number
            start timer

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    } /* end of loop forever */

Note: the timer is associated with the oldest unACKed segment.

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence number
    nextseqnum = initial_sequence number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {    /* a duplicate ACK for already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    } /* end of loop forever */
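The sender logic with fast retransmit can be sketched as an event-driven class. This is a simplified illustration under stated assumptions: a single timer flag stands in for real timers, nothing is actually transmitted (actions are appended to a log), and the class and attribute names are invented for the example.

```python
class SimplifiedTcpSender:
    """Event handlers for a one-way sender with fast retransmit."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq   # oldest unACKed byte
        self.next_seq = initial_seq    # next byte to be sent
        self.unacked = {}              # seq -> payload still awaiting an ACK
        self.dup_acks = 0
        self.timer_running = False
        self.log = []                  # (action, seq) pairs instead of real I/O

    def on_data(self, payload):
        self.unacked[self.next_seq] = payload
        self.log.append(("send", self.next_seq))
        self.timer_running = True
        self.next_seq += len(payload)

    def on_timeout(self):
        if self.unacked:
            self.log.append(("timeout-retransmit", min(self.unacked)))
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:         # cumulative ACK of all data up to y
            self.send_base = y
            self.dup_acks = 0
            acked = [s for s, p in self.unacked.items() if s + len(p) <= y]
            for seq in acked:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
        elif y in self.unacked:        # duplicate ACK for segment y
            self.dup_acks += 1
            if self.dup_acks == 3:     # fast retransmit
                self.log.append(("fast-retransmit", y))
                self.timer_running = True


sender = SimplifiedTcpSender(initial_seq=92)
for size in (8, 20, 10):
    sender.on_data(b"x" * size)
for _ in range(3):
    sender.on_ack(92)                  # three duplicate ACKs for byte 92
sender.on_ack(130)                     # cumulative ACK for everything
print(sender.log)
```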

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with a higher-than-expected sequence number; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the sequence number of the next expected byte.
4. Event: arrival of a segment that partially or completely fills a gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
1. Host A sends Seq=92, 8 bytes data.
2. Host B replies ACK=100, but the ACK is lost (X).
3. Host A's timeout expires and it retransmits Seq=92, 8 bytes data (retransmission due to lost ACK).
4. Host B replies ACK=100.

Premature timeout, cumulative ACKs:
1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B replies ACK=100 and ACK=120, but the Seq=92 timeout expires before they arrive.
3. Host A retransmits Seq=92, 8 bytes data; the timer is restarted here for Seq=92. The segment with Seq=100 is not retransmitted.
4. Host B replies ACK=120.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B replies ACK=100, which is lost (X), and then ACK=120.
3. ACK=120 arrives at Host A before the Seq=92 timeout expires.

The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

- Provides a limited form of congestion control
- Timer expiration is more likely caused by congestion in the network; congestion may get worse if sources continue to retransmit packets persistently
- After each timeout: TimeoutInterval = 2 * TimeoutInterval_Previous
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT

Others: check RFC 2018 (selective ACK)

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer

Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.

Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B

RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead.

[Figure: data from IP enters RcvBuffer up to LastByteRcvd; the application process reads from the buffer up to LastByteRead]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially, RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B

SENDER HOST A uses the RcvWindow of Host B, LastByteSent and LastByteACKed.

[Figure: SENDER HOST A sends data up to LastByteSent; ACKs from Host B advance LastByteACKed]

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
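The two flow-control relations, one on each side of the connection, translate directly into code. The sketch below uses invented function names and made-up byte counts purely to show how the receiver's advertised window limits what the sender may still transmit.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: bytes that may still be sent without overflowing HOST B."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)


# Hypothetical snapshot: 100-byte buffer, 60 bytes received, 40 already read.
window = rcv_window(rcv_buffer=100, last_byte_rcvd=60, last_byte_read=40)
allowed = sendable_bytes(last_byte_sent=90, last_byte_acked=60,
                         advertised_window=window)
print(window, allowed)
```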

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection 'alive'.

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables:
- sequence numbers
- buffers, flow control info (e.g. RcvWindow)

Client is the connection initiator.
In Java: Socket clientSocket = new Socket("hostname", port number);
In C (connect):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }

Server is contacted by the client.
In Java: Socket accept()
In C (accept):
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management

Establishing a connection: three-way handshake. This is what happens when we create a socket for a connection to a server.

Step 1: client end system sends a TCP SYN control segment to the server (executed by TCP itself)
- specifies the initial sequence number (client_isn)
- Connect (SYN=1, seq=client_isn)

Step 2: server end system receives the SYN and replies with a SYNACK control segment
- ACKs the received SYN
- allocates buffers
- specifies the server's initial sequence number (server_isn)
- Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: client ACKs the connection with ACK = server_isn + 1
- allocates buffers
- sends SYN=0
- ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
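The sequence and ACK numbers exchanged in the three steps can be traced with a toy model. The `ControlSegment` type and the function below are hypothetical names for this sketch; it only builds the three control segments and does not open a real socket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlSegment:
    sender: str
    syn: int
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments of the handshake."""
    syn = ControlSegment("client", syn=1, seq=client_isn)
    synack = ControlSegment("server", syn=1, seq=server_isn, ack=syn.seq + 1)
    ack = ControlSegment("client", syn=0, seq=client_isn + 1, ack=synack.seq + 1)
    return [syn, synack, ack]


# Initial sequence numbers reused from the earlier telnet example.
handshake = three_way_handshake(client_isn=42, server_isn=79)
for segment in handshake:
    print(segment)
```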

TCP Connection Management (cont)

How a TCP connection is established and torn down

Closing a connection: the client closes the socket.
In C: closesocket(s);
In Java: clientSocket.close();

Step 1: client end system sends a TCP FIN control segment to the server.

Step 2: server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client sends FIN (close); server replies ACK, then sends FIN (close); client replies ACK, enters timed wait, then closed]

TCP Connection Management (cont)

Step 3: client receives the FIN and replies with an ACK.
- Enters "timed wait": it will respond with an ACK to any received FINs.

Step 4: server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK and enters timed wait before closed; server is closed on receiving the ACK]

TCP Connection Management (cont)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams, with steps numbered 1 to 12]

Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).

The connection then formally closes and all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
- Informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control
- Manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP

2. Network-assisted congestion control
- routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection:

New variable: Congestion Window (CongWin)

(Amount of unACKed data) at SENDER < min(CongWin, RcvWindow)

where the amount of unACKed data is LastByteSent - LastByteACKed. This indirectly limits the sender's send rate.

Assumptions:
• TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is solely limited by CongWin
• Packet loss delay and packet transmission delay are negligible

Sending rate (approx.): CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• Congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• Congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path:

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

The sender limits transmission:

LastByteSent - LastByteAcked <= cwnd

Roughly,

rate = cwnd / RTT  bytes/sec

cwnd is dynamic, a function of perceived network congestion.

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss

[Figure: cwnd (congestion window size; 8, 16 and 24 Kbytes marked) against time, showing saw-tooth behavior: probing for bandwidth]

TCP Slow Start

When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]

Refinement: inferring loss

- After 3 dup ACKs:
  - cwnd is cut in half
  - window then grows linearly
- But after a timeout event:
  - cwnd is instead set to 1 MSS
  - window then grows exponentially up to a threshold, then grows linearly

Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.

Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

1. State: SLOW START (SS)
   Event: ACK receipt for previously unACKed data
   TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
   Commentary: results in a doubling of CongWin every RTT

2. State: Congestion Avoidance (CA)
   Event: ACK receipt for previously unACKed data
   TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS/CongWin)
   Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

3. State: SS or CA
   Event: loss event detected by triple duplicate ACK
   TCP sender congestion-control action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
   Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

4. State: SS or CA
   Event: timeout
   TCP sender congestion-control action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
   Commentary: enter Slow Start

5. State: SS or CA
   Event: duplicate ACK
   TCP sender congestion-control action: increment duplicate ACK count for segment being ACKed
   Commentary: CongWin and Threshold not changed
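The state/event table maps naturally onto a small class with one method per event. The sketch below is an illustration of the table only: windows are measured in units of MSS, the class and method names are invented, and duplicate-ACK counting is left to the caller, which invokes `on_triple_duplicate_ack` once the third duplicate arrives.

```python
MSS = 1.0  # windows are counted in units of one MSS in this sketch


class CongestionControl:
    def __init__(self, threshold=64.0):
        self.cong_win = MSS
        self.threshold = threshold
        self.state = "SS"              # "SS" = slow start, "CA" = congestion avoidance

    def on_new_ack(self):
        if self.state == "SS":
            self.cong_win += MSS       # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # about +1 MSS per RTT

    def on_triple_duplicate_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
        self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"


cc = CongestionControl(threshold=8.0)
for _ in range(8):
    cc.on_new_ack()
after_slow_start = (cc.state, cc.cong_win)
cc.on_triple_duplicate_ack()
after_triple_dup = (cc.state, cc.cong_win, cc.threshold)
cc.on_timeout()
after_timeout = (cc.state, cc.cong_win, cc.threshold)
print(after_slow_start, after_triple_dup, after_timeout)
```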

Summary: TCP Congestion Control

Initial state: slow start, with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0

Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: move to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start

Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
- Throughput then increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)

TCP Futures: TCP over "long, fat pipes"

- Example: GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps
- Requires a window size of W = 83,333 in-flight segments
- Throughput in terms of loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

- This gives $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
- New versions of TCP are needed for high-speed environments

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions
- Link with transmission rate of R
- Each connection has the same MSS and RTT
- No other TCP connections or UDP datagrams traverse the shared link
- Ignore the slow start phase of TCP
- Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing

Why is TCP fair?

Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases
- Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, with R marked on both axes. The plot shows the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
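The fairness argument above can be checked numerically. The following is a minimal sketch under an idealised model that is an assumption of this example, not part of the slides: both flows see every loss at the same time, each adds the same fixed step per round, and both halve whenever their combined rate exceeds R. The function name and parameters are hypothetical.

```python
def aimd_two_flows(r, x1, x2, rounds, step=1.0):
    """Idealised AIMD for two flows sharing a link of capacity r.

    Each round both flows add `step` (additive increase). If their
    combined rate exceeds r, both halve (multiplicative decrease).
    """
    history = [(x1, x2)]
    for _ in range(rounds):
        x1 += step
        x2 += step
        if x1 + x2 > r:
            x1 /= 2
            x2 /= 2
        history.append((x1, x2))
    return history


trace = aimd_two_flows(r=100.0, x1=80.0, x2=10.0, rounds=500)
gap_start = abs(trace[0][0] - trace[0][1])
gap_end = abs(trace[-1][0] - trace[-1][1])
print(f"gap between the flows: {gap_start:.3f} -> {gap_end:.6f}")
```

Additive increase leaves the gap between the two rates unchanged, while each halving cuts it in half, so the operating point drifts towards the equal bandwidth share line.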

The End

The slides that follow are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps: the link's transmission rate)

- In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore, they get higher throughput.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections]

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server?

Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions
- The network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- No packet loss, no packet corruption, no retransmissions required
- Header overheads are negligible
- File to send = an integer number of segments of size MSS
- Connection establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
- The initial Threshold of the TCP congestion mechanism is very big

Notation
- O: size of the object in bits
- S: number of bits in an MSS (maximum segment size)
- R: the link's transmission rate in bps

[Figure: CLIENT, SERVER, FILE]

TCP Latency Modeling

Case analysis: STATIC congestion window

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4, so K = 256/(4 x 32) = 2.

TCP Latency Modeling

Case analysis: STATIC congestion window

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

latency = 2 RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object = O/(WS)

If there are K windows, the sender will be stalled (K-1) times.

[Figure: timing diagram with each STALLED PERIOD marked]
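The two static-window cases can be folded into one small function. This is a sketch only: the function name and the sample link rate and RTT values are assumptions made for illustration, while the formulas are the ones stated on the slides.

```python
import math


def static_window_latency(o_bits, s_bits, w_segments, r_bps, rtt):
    """Latency with a fixed congestion window of w_segments segments."""
    # K: number of windows that cover the object, rounded up.
    k = math.ceil(o_bits / (w_segments * s_bits))
    base = 2 * rtt + o_bits / r_bps
    # Idle time after each window: S/R + RTT - WS/R.
    stall = s_bits / r_bps + rtt - w_segments * s_bits / r_bps
    if stall <= 0:
        # Case 1: the first ACK returns before the window is used up.
        return base
    # Case 2: the sender stalls after each of the first K-1 windows.
    return base + (k - 1) * stall


# O = 256 bits, S = 32 bits, W = 4 as on the slide; R = 32 bps is assumed.
case1 = static_window_latency(256, 32, 4, r_bps=32, rtt=1)
case2 = static_window_latency(256, 32, 4, r_bps=32, rtt=5)
print(case1, case2)
```

With these assumed numbers S/R = 1 s and WS/R = 4 s, so an RTT of 1 s falls under case 1 and an RTT of 5 s falls under case 2.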

Case analysis: DYNAMIC congestion window

[Figure: slow-start timing diagram for an object with O/S = 15 segments, sent in 4 windows, with each STALLED PERIOD marked]

Case analysis: DYNAMIC congestion window

- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Case analysis: DYNAMIC congestion window

- Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

- Transmission time of the k-th window:

$$2^{k-1} \frac{S}{R}$$

- Stall time after the k-th window, where $[x]^+ = \max(x, 0)$:

$$\left[\frac{S}{R} + RTT - 2^{k-1} \frac{S}{R}\right]^+$$

- Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + RTT - 2^{k-1} \frac{S}{R}\right]^+$$

Case analysis: DYNAMIC congestion window

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

- The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

Case analysis: DYNAMIC congestion window

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$
- Closed-form expression for the latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

Case analysis: DYNAMIC congestion window

- Let Q be the number of times the server would stall if the object contained an infinite number of segments, and let P be the actual number of stalls.
- Latency compared with the minimum latency:

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
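The dynamic-window quantities K, Q and P and the two latency expressions can be cross-checked with a short script. This is a sketch under assumed sample values (S = 1000 bits, R = 1000 bps, RTT = 2 s); the function names are hypothetical, and the formulas are the ones from the dynamic congestion window slides.

```python
import math


def slow_start_counts(o_bits, s_bits, r_bps, rtt):
    """Return (K, Q, P) for the dynamic congestion window model."""
    k = math.ceil(math.log2(o_bits / s_bits + 1))
    q = math.floor(math.log2(1 + rtt / (s_bits / r_bps))) + 1
    return k, q, min(q, k - 1)


def latency_closed_form(o_bits, s_bits, r_bps, rtt):
    _, _, p = slow_start_counts(o_bits, s_bits, r_bps, rtt)
    s_over_r = s_bits / r_bps
    return (2 * rtt + o_bits / r_bps
            + p * (rtt + s_over_r) - (2 ** p - 1) * s_over_r)


def latency_by_summation(o_bits, s_bits, r_bps, rtt):
    k, _, _ = slow_start_counts(o_bits, s_bits, r_bps, rtt)
    s_over_r = s_bits / r_bps
    # Sum the stall time after each of the first K-1 windows.
    stalls = sum(max(s_over_r + rtt - 2 ** (i - 1) * s_over_r, 0.0)
                 for i in range(1, k))
    return 2 * rtt + o_bits / r_bps + stalls


# An object of O/S = 15 segments, as in the slide's figure.
K, Q, P = slow_start_counts(15_000, 1_000, 1_000, rtt=2)
closed = latency_closed_form(15_000, 1_000, 1_000, rtt=2)
summed = latency_by_summation(15_000, 1_000, 1_000, rtt=2)
print(K, Q, P, closed, summed)
```

For O/S = 15 the script gives K = 4, which matches the 4 windows shown for the dynamic window case, and both latency expressions agree.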

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

Internet Checksum Example

Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.

Example: add two 16-bit integers

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

wraparound    1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

sum             1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum        0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

To check: sum of the data plus the checksum = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
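The wraparound addition in this example can be reproduced in a few lines. This is a minimal sketch that works on a list of 16-bit words; the function names are hypothetical, and a real implementation would also handle splitting a byte string into words and padding odd lengths.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back in."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # wraparound
    return total


def internet_checksum(words):
    """The checksum is the one's complement of the wrapped sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


a = 0b1110011001100110
b = 0b1101010101010101
wrapped_sum = ones_complement_sum16([a, b])
checksum = internet_checksum([a, b])
print(f"{wrapped_sum:016b} {checksum:016b}")
```

Adding the checksum to the data words gives sixteen 1 bits, which is the receiver's check.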

Connection-Oriented Transport: TCP

- TCP segment structure
- SEQ and ACK numbers
- Calculating the timeout interval
- The simplified TCP sender
- ACK generation recommendation (RFC 1122, RFC 2581)
- Interesting transmission scenarios
- Flow control
- TCP connection management

TCP segment structure

Header (each row is 32 bits wide):
- source port #, dest port #
- sequence number
- acknowledgement number
- head len, not used, flag bits U A P R S F, rcvr window size
- checksum, URGent data ptr
- options (variable length)

The header is followed by the application data (variable length).

Notes on the fields:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: # of bytes the receiver is willing to accept
- sequence and acknowledgement numbers: counting by bytes of data (not segments)
- checksum: Internet checksum (as in UDP)

In practice, PSH, URG, and the Urgent Data Pointer are not used.

We can view these teeny-weeny details using Ethereal.

Example

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.

Assume:
- Data stream: file consisting of 500,000 bytes
- MSS: 1,000 bytes
- First byte of data stream: numbered as 0

TCP constructs 500 segments out of the data stream:

500,000 bytes / 1,000 bytes = 500 segments

TCP sequence #s and ACKs

Sequence numbers (#s):
- byte-stream number of the first byte in the segment's data
- do not necessarily start from 0; TCP uses a random initial number R
  - Segment 1: 0 + R
  - Segment 2: 1000 + R, etc.

ACKs (acknowledgments):
- seq # of the next byte expected from the other side (last byte + 1)
- cumulative ACK
- if the receiver has received segment 1, it waits for segment 2
- e.g. ACK = 1000 + R (received up to the 999th byte)

[Figure: byte stream 0, 1, 2, 3, 4, ..., 999 forms Segment 1; bytes 1000, 1001, 1002, ..., 1999 form Segment 2]
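The numbering rule can be made concrete with the 500,000-byte file from the earlier example. This is an illustrative sketch: `segment_stream` is a hypothetical helper, the initial sequence number is passed in rather than chosen at random, and the 32-bit wraparound of real sequence numbers is ignored.

```python
def segment_stream(total_bytes, mss, isn):
    """Return (seq, length, expected_ack) for each segment of a stream.

    seq is the number of the segment's first byte; expected_ack is the
    number of the next byte the receiver expects after this segment.
    """
    segments = []
    offset = 0
    while offset < total_bytes:
        length = min(mss, total_bytes - offset)
        seq = isn + offset
        segments.append((seq, length, seq + length))
        offset += length
    return segments


segments = segment_stream(500_000, 1_000, isn=0)
print(len(segments), segments[0], segments[1])
```

With isn = 0 the first segment carries bytes 0 to 999 and is acknowledged with ACK = 1000, which is also the sequence number of the second segment.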

TCP sequence #s and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. The user types 'C'. Host A to Host B: Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num = 42")
2. Host B ACKs receipt of 'C' and echoes back 'C'. Host B to Host A: Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"; the ACK is piggy-backed on server-to-client data)
3. Host A ACKs receipt of the echoed 'C'. Host A to Host B: Seq=43, ACK=80

Yet another server echo example

1. The user types 'Hello'. Host A to Host B: Seq=42, ACK=79, data = 'Hello'
2. Host B ACKs receipt of 'Hello' and echoes it back. Host B to Host A: Seq=79, ACK=47, data = 'Hello'
3. Host A ACKs receipt of the echoed 'Hello' and sends something else. Host A to Host B: Seq=47, ACK=84, data = '200'
4. Host B to Host A: Seq=84, ACK=50, data = '200'

The ACK tells up to what byte has been received and what is the next starting byte the host is expecting to receive.

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

RTT = round trip time

Q: How to set the TCP timeout value?
- longer than RTT (note: RTT will vary)
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss

Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we want the estimated RTT to be smoother
- use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - x) * EstimatedRTT + x * SampleRTT

- exponential weighted moving average
- influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (RFC 2988)

Setting the timeout
- EstimatedRTT plus a safety margin
- large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + (4 * Deviation)

Deviation = (1 - x) * Deviation + x * |SampleRTT - EstimatedRTT|

- recommended value of x for the Deviation: 0.25

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

- After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
- After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
- After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
- After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
- After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
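The sample calculations can be reproduced with a short estimator. This is a sketch with two assumptions that the slides do not state: the Deviation starts at half of the first sample (the initialisation used in RFC 2988), and the Deviation is updated before EstimatedRTT so that it compares the new sample with the previous estimate. The function name is hypothetical.

```python
def rtt_estimator(samples, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, Deviation, Timeout) after each sample."""
    estimated = samples[0]
    deviation = samples[0] / 2  # assumed initial value
    history = [(estimated, deviation, estimated + 4 * deviation)]
    for sample in samples[1:]:
        # Deviation uses the estimate from before this sample.
        deviation = (1 - beta) * deviation + beta * abs(sample - estimated)
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append((estimated, deviation, estimated + 4 * deviation))
    return history


samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
history = rtt_estimator(samples)
for estimated, deviation, timeout in history:
    print(f"EstimatedRTT={estimated:.4f}  Timeout={timeout:.4f}")
```

The EstimatedRTT column follows the values worked out by hand above; the Timeout column depends on the assumed initial Deviation.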

RTT Samples and RTT Estimates

[Figure: Sample RTT and Estimated RTT plotted against time; y-axis: RTT (msec), from 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT Estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and Estimated RTT plotted against time (seconds, 1 to 106); y-axis: RTT (milliseconds), from 100 to 350]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

The sender waits for an event and reacts to it:
- event: data received from application above -> create and send segment
- event: timer timeout for segment with seq # y -> retransmit segment
- event: ACK received, with ACK # y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum
            /* the timer is associated with the oldest unACKed segment */
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y

    event: ACK received, with ACK field value of y
        if (y > sendbase) {    /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {    /* a duplicate ACK for an already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y is 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
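The event handling of the sender with fast retransmit can be sketched as a small class that records what the sender would do. Several simplifications here are assumptions of this example rather than part of the slides: timers are left out (a timeout is delivered by calling `on_timeout`), segments are never actually transmitted, and ACKs are assumed to fall on segment boundaries. The class and method names are hypothetical.

```python
from collections import Counter


class SimplifiedSender:
    """Records the actions of a simplified sender with fast retransmit."""

    def __init__(self, isn):
        self.send_base = isn
        self.next_seq = isn
        self.unacked = {}          # seq -> payload awaiting an ACK
        self.dup_acks = Counter()  # ACK value -> duplicate count
        self.log = []              # (action, seq) in the order taken

    def on_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        self.log.append(("send", seq))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            # Retransmit the not-yet-ACKed segment with the smallest seq.
            self.log.append(("retransmit", min(self.unacked)))

    def on_ack(self, y):
        if y > self.send_base:
            # Cumulative ACK: everything below y has been received.
            self.send_base = y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
        else:
            # Duplicate ACK for already ACKed data.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.log.append(("fast_retransmit", y))


sender = SimplifiedSender(isn=92)
sender.on_data(b"a" * 8)     # Seq=92, 8 bytes
sender.on_data(b"b" * 20)    # Seq=100, 20 bytes
sender.on_data(b"c" * 10)    # Seq=120, 10 bytes
sender.on_ack(100)           # new ACK for the first segment
for _ in range(3):
    sender.on_ack(100)       # three duplicate ACKs: Seq=100 looks lost
print(sender.log)
```

The third duplicate ACK for 100 triggers the retransmission of the segment starting at byte 100 without waiting for a timeout.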

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with a higher than expected seq #; a gap is detected
   TCP receiver action: send a duplicate ACK indicating the seq # of the next expected byte.

4. Event: arrival of a segment that partially or completely fills the gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

TCP: Interesting Scenarios (simplified TCP version)

Lost ACK scenario
1. Host A sends Seq=92, 8 bytes of data.
2. Host B replies with ACK=100, but the ACK is lost (X).
3. Host A's timer expires, and it retransmits Seq=92, 8 bytes of data (retransmission due to the lost ACK).
4. Host B replies with ACK=100 again.

Premature timeout, cumulative ACKs
1. Host A sends Seq=92, 8 bytes of data, followed by Seq=100, 20 bytes of data.
2. Host B replies with ACK=100 and ACK=120, but the timer for Seq=92 expires before they reach Host A.
3. Host A retransmits Seq=92, 8 bytes of data, and the timer is restarted for Seq=92.
4. Host B replies with ACK=120. The segment with Seq=100 is not retransmitted.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes of data, followed by Seq=100, 20 bytes of data.
2. Host B replies with ACK=100, which is lost (X), and then with ACK=120.
3. ACK=120 arrives at Host A before the timeout for Seq=92.

The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

- Provides a limited form of congestion control
- Timer expiration is more likely caused by congestion in the network
- Congestion may get worse if sources continue to retransmit packets persistently

TimeoutInterval = 2 * TimeoutInterval_previous

- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT

Others: check RFC 2018 (selective ACK)

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

- receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment
- sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

Receiver buffering
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, LastByteRead

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

- Initially, RcvWindow = RcvBuffer
- The application reads from the buffer
- HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A

[Figure: RcvBuffer holding data from IP between LastByteRead and LastByteRcvd, with the application process reading from it]

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B

SENDER (HOST A) uses the RcvWindow of Host B, LastByteSent, LastByteACKed

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

LastByteSent - LastByteACKed <= RcvWindow

[Figure: HOST A sends data and receives ACKs from Host B; the unACKed data lies between LastByteACKed and LastByteSent]
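Both sides of the flow control bookkeeping fit in two one-line functions. This is a minimal sketch: the function names and the byte counts in the example are made up for illustration, while the two formulas are the receiver and sender rules stated on the slides.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: would sending n_bytes keep unACKed data within the window?"""
    return (last_byte_sent + n_bytes) - last_byte_acked <= advertised_window


# Host B has a 4096-byte buffer, has received up to byte 3000,
# and its application has read up to byte 1000.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)

# Host A has 1000 bytes in flight and wants to send 1000 more.
ok = can_send(1000, last_byte_sent=5000, last_byte_acked=4000,
              advertised_window=window)
too_much = can_send(1000, last_byte_sent=5100, last_byte_acked=4000,
                    advertised_window=window)
print(window, ok, too_much)
```

When the application at Host B reads more data, LastByteRead grows, the advertised window opens, and the blocked send becomes allowed.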

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection 'alive'.

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

- Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator

    In Java:
        Socket clientSocket = new Socket("hostname", port number);

    In C, connect():
        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
            printf("connect failed\n");
            WSACleanup();
            exit(1);
        }

Server: contacted by the client

    In Java:
        Socket accept();

    In C, accept():
        ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management

Establishing a connection: the three-way handshake. This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
- specifies the initial seq # (isn)
- Connect: SYN=1, seq=client_isn

Step 2: the server end system receives the SYN and replies with a SYNACK control segment
- ACKs the received SYN
- allocates buffers
- specifies the server's initial seq #
- Accept: SYN=1, seq=server_isn, ack=client_isn+1

Step 3: the client ACKs the connection with ACK = server_isn + 1
- allocates buffers
- sends SYN=0
- ACK: SYN=0, seq=client_isn+1, ack=server_isn+1

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).

TCP Connection Management (cont.)

How a TCP connection is established and torn down

Closing a connection: the client closes the socket

    C: closesocket(s);
    Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: the client sends FIN (close); the server replies with ACK and then sends its own FIN (close); the client replies with ACK and enters timed wait before the connection is closed]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
- It enters "timed wait" and will respond with an ACK to received FINs.

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: the client (closing) sends FIN; the server replies with ACK, then (closing) sends FIN; the client replies with ACK and enters timed wait; both sides end in closed]

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle diagrams, with callouts numbered 1 to 12]

- Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- Closed: the connection formally closes; all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow control
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receive buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion control
- service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing

Same course of action: throttling of the sender

Principles of Congestion Control

Congestion
- informally: too many sources sending too much data too fast for the network to handle
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP

2. Network-assisted congestion control
- routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the congestion window (CongWin)

(Amount of unACKed data at the SENDER) = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
- The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin
- Packet loss delay and packet transmission delay are negligible

Sending rate (approx.): CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
- The congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
- The congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
- no ACK is received after segment loss

2. Receipt of three duplicate ACKs
- segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly, rate = cwnd / RTT bytes/sec
- cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) against time, with marks at 8, 16 and 24 Kbytes; sawtooth behaviour: probing for bandwidth]

TCP Slow Start

- When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]

Refinement: inferring loss

- After 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
- But after a timeout event:
  - cwnd is instead set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly

Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.

Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

Each entry lists the state, the event, the TCP sender's congestion-control action, and a commentary.

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
Action: increment duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed

Summary: TCP Congestion Control

[Figure: finite state machine with the states slow start, congestion avoidance, and fast recovery]

Initial state: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0. Enter slow start.

Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start
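The transitions in this summary can be sketched as a small state machine. This is only an illustrative sketch: the class name, the method names, and the MSS value of 1460 bytes are assumptions made for the example, and retransmission itself is left out.

```python
from dataclasses import dataclass

MSS = 1460  # assumed segment size in bytes, chosen only for illustration


@dataclass
class CongestionState:
    """Tracks cwnd, ssthresh and the duplicate-ACK count of one sender."""
    cwnd: float = MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            # Leave fast recovery: deflate the window back to ssthresh.
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS  # one MSS per ACK doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about one MSS per RTT
        self.dup_acks = 0

    def on_duplicate_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding the object one new ACK and then three duplicate ACKs moves it from slow start into fast recovery with cwnd = ssthresh + 3 MSS, as in the summary.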

Congestion control

TCP's Congestion Control Service

[Figure: client and server]

Problem: gridlock sets in when there is packet loss due to router congestion.

TCP forces the end systems to decrease the rate at which packets are sent during periods of congestion.

When a packet is lost due to congestion, the sending system is alerted because it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT.

(Based on an idealised model for the steady-state dynamics of TCP.)
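The 0.75 W/RTT figure is simply the mean of a window that ramps linearly from W/2 to W. A short numerical check, assuming an even W measured in segments and illustrative values for W and RTT (the function name is made up for this sketch):

```python
def average_sawtooth_throughput(w_segments: int, rtt: float) -> float:
    """Mean rate (segments/s) while the window climbs from W/2 to W,
    growing by one segment per RTT. Assumes w_segments is even."""
    windows = range(w_segments // 2, w_segments + 1)
    rates = [w / rtt for w in windows]
    return sum(rates) / len(rates)


W, RTT = 100, 0.1  # illustrative values, not taken from the slides
avg = average_sawtooth_throughput(W, RTT)
print(avg, 0.75 * W / RTT)
```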

TCP Futures: TCP over "long, fat pipes"

Example: a GRID computing application with 1500-byte segments, a 100 ms RTT, and a desired throughput of 10 Gbps.
• Requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of the loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

• This gives L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
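Both numbers in this example can be reproduced from the throughput formula, Throughput = 1.22 · MSS / (RTT · sqrt(L)). The sketch below uses the slide's figures; the function names are invented for the illustration.

```python
def window_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed: throughput * RTT / segment size in bits."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for_throughput(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = window_for_throughput(10e9, 0.100, 1500)
loss = loss_rate_for_throughput(10e9, 0.100, 1500)
print(round(w), loss)
```

The computed loss rate comes out slightly above 2·10^-10, which the slide rounds down.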

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions
• The link has a transmission rate of R.
• Each connection has the same MSS and RTT.
• No other TCP connections or UDP datagrams traverse the shared link.
• Ignore the slow start phase of TCP.
• Both connections operate in congestion-avoidance mode (linear increase phase).

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease decreases throughput proportionally.

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.

We can view a simulation of this.
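The convergence argument can be checked with a toy simulation. It is a sketch under strong assumptions that go beyond the slide: both flows see every loss at the same moment, rates are in arbitrary units, and the starting rates and capacity are made up.

```python
def simulate_two_flows(x1, x2, capacity, rounds):
    """Each round both flows add 1 unit (additive increase); when the link
    is over capacity, both halve their rate (multiplicative decrease)."""
    for _ in range(rounds):
        x1 += 1.0
        x2 += 1.0
        if x1 + x2 > capacity:
            x1 /= 2.0
            x2 /= 2.0
    return x1, x2


a, b = simulate_two_flows(x1=80.0, x2=10.0, capacity=100.0, rounds=2000)
print(abs(a - b))  # the gap between the two flows after many rounds
```

Additive increase leaves the gap between the two flows unchanged, while each halving cuts it in half, so the rates drift toward the equal-share line.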

The End

The slides that follow are for additional reading only.

TCP Latency Modeling: Loopholes in TCP

Multiple end systems share a link with transmission rate R bps.

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections share the link]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very large.

Notation
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)

[Figure: the SERVER sends a FILE to the CLIENT]

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1: latency = 2RTT + O/R

K = number of windows of data that cover the object

K = O/(WS), rounded up to the nearest integer

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).

Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = O/(WS) is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K-1) times.
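The two static-window cases can be folded into one function. The sketch below follows the formulas on these slides; the function name is invented, and the link rate R and the RTT in the usage lines are assumed values (only O = 256 bits, S = 32 bits, and W = 4 come from the slides).

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """Latency to fetch an O-bit object over a link of R bps with a fixed
    window of W segments, each S bits long."""
    K = math.ceil(O / (W * S))       # windows needed to cover the object
    base = 2 * RTT + O / R           # connection setup plus transmission
    if W * S / R > RTT + S / R:      # case 1: ACK returns before the window runs out
        return base
    stall = S / R + RTT - W * S / R  # case 2: idle time after each window
    return base + (K - 1) * stall


# R = 32 bps is assumed; RTT = 1 s falls in case 1, RTT = 5 s in case 2.
print(static_window_latency(O=256, S=32, R=32, RTT=1, W=4))
print(static_window_latency(O=256, S=32, R=32, RTT=5, W=4))
```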

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object with O/S = 15 segments, sent in 4 windows, with a STALLED PERIOD between windows]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:

$$
\begin{aligned}
K &= \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

• Transmission time of the k-th window:

$$2^{k-1}\frac{S}{R}$$

• Stall time after the k-th window, where $[x]^+ = \max(x, 0)$:

$$\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

• Latency:

$$\text{Latency} = 2RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

Case analysis: DYNAMIC CONGESTION WINDOW

• P = min{Q, K-1} is the actual number of times the server stalls, where Q is the number of times it would stall if the object contained an infinite number of segments.
• Closed-form expression for the latency:

$$\text{Latency} = 2RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
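As a sanity check, the closed form can be compared numerically with the sum over per-window stall times. The sketch assumes the standard textbook definitions K = ceil(log2(O/S + 1)) and Q = floor(log2(1 + RTT/(S/R))) + 1; the function name and the values of S, R, and RTT are illustrative, with only O/S = 15 taken from the slides.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Return K, Q, P and the latency computed two ways: with the closed
    form and by summing the stall time after each of the first K-1 windows."""
    K = math.ceil(math.log2(O / S + 1))
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)
    closed_form = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    summed = 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return K, Q, P, closed_form, summed


# O/S = 15 segments as in the timing diagram; S = 1 bit, R = 1 bps and
# RTT = 2 s are assumed purely to keep the arithmetic easy to follow.
K, Q, P, closed_form, summed = slow_start_latency(O=15, S=1, R=1, RTT=2)
print(K, Q, P, closed_form, summed)
```

With these inputs the object needs K = 4 windows, matching the "4 windows" of the timing diagram, and the two latency values agree.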

Case analysis: DYNAMIC CONGESTION WINDOW

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


Connection-Oriented Transport: TCP

• TCP Segment Structure
• SEQ and ACK numbers
• Calculating the Timeout Interval
• The Simplified TCP Sender
• ACK Generation Recommendation (RFC 1122, RFC 2581)
• Interesting Transmission Scenarios
• Flow Control
• TCP Connection Management

TCP segment structure

Header (each row is 32 bits wide):
• source port #, dest port #
• sequence number
• acknowledgement number
• head len, not used, flag bits U A P R S F, rcvr window size
• checksum, URGent data ptr
• Options (variable length)
followed by application data (variable length).

Notes on the fields:
• Sequence and acknowledgement numbers count bytes of data (not segments).
• URG: urgent data (generally not used).
• ACK: ACK # valid.
• PSH: push data now (generally not used).
• RST, SYN, FIN: connection establishment (setup and teardown commands).
• rcvr window size: # bytes the receiver is willing to accept.
• checksum: Internet checksum (as in UDP).

In practice, PSH, URG, and the Urgent Data Pointer are not used.

We can view these small details using Ethereal.
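To make the field layout concrete, here is a minimal sketch that unpacks the fixed 20-byte TCP header with Python's struct module. The function name and the sample values (ports, sequence numbers, window) are made up for illustration, and options are ignored.

```python
import struct


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    src, dst, seq, ack, off_flags, window, checksum, urg_ptr = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    flags = off_flags & 0x3F  # low six bits: U A P R S F
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len_bytes": (off_flags >> 12) * 4,  # head len is in 32-bit words
        "URG": bool(flags & 0x20), "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08), "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02), "FIN": bool(flags & 0x01),
        "rcv_window": window, "checksum": checksum, "urgent_ptr": urg_ptr,
    }


# Hypothetical segment: seq=42, ack=79, only the ACK flag set, no options.
sample = struct.pack("!HHIIHHHH", 12345, 80, 42, 79, (5 << 12) | 0x10, 65535, 0, 0)
hdr = parse_tcp_header(sample)
print(hdr["seq"], hdr["ack"], hdr["ACK"], hdr["header_len_bytes"])
```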

Example

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.

Assume:
• Data stream: a file consisting of 500,000 bytes
• MSS: 1,000 bytes
• First byte of the data stream: numbered 0

TCP constructs 500 segments out of the data stream:

500,000 bytes / 1,000 bytes = 500 segments

TCP sequence #s and ACKs

Sequence numbers (#s):
• The byte-stream number of the first byte in the segment's data.
• Do not necessarily start from 0; TCP uses a random initial number R.
  • Segment 1: 0 + R
  • Segment 2: 1000 + R, etc.

ACKs (acknowledgments):
• Seq # of the next byte expected from the other side (last byte + 1).
• Cumulative ACK.
• If the receiver has received segment 1, it waits for segment 2. E.g. Ack = 1000 + R (received up to byte 999).

[Figure: byte stream 0, 1, 2, 3, 4, ..., 999 (Segment 1) and 1000, 1001, 1002, ..., 1999 (Segment 2)]
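The numbering rule is easy to reproduce for the 500,000-byte example. In this sketch the function name is invented, the initial sequence number is passed in explicitly (the slides call it R), and wrap-around at 2^32 is handled with a modulo.

```python
def segment_sequence_numbers(file_size, mss, isn):
    """Sequence number of the first byte of every segment of the stream."""
    count = -(-file_size // mss)  # ceiling division
    return [(isn + i * mss) % 2**32 for i in range(count)]


seqs = segment_sequence_numbers(500_000, 1_000, isn=0)
print(len(seqs), seqs[:3], seqs[-1])
# The cumulative ACK for the first segment is the next byte expected.
first_ack = seqs[0] + 1_000
```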

TCP sequence #s and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. The user types 'C'. Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num 42").
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"). The ACK is piggy-backed on server-to-client data.
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.

Yet another server echo example

1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
2. Host B ACKs receipt of 'Hello' and echoes it back: Seq=79, ACK=47, data = 'Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
4. Host B replies: Seq=84, ACK=50, data = '200'.

The ACK tells up to what byte has been received and what is the next starting byte the host is expecting to receive.

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?
• Longer than RTT (round trip time). Note: RTT will vary.
• Too short: premature timeout and unnecessary retransmissions.
• Too long: slow reaction to segment loss.

Q: How to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt. Ignore retransmissions and cumulatively ACKed segments.
• SampleRTT will vary, and we want the estimated RTT to be smoother: use several recent measurements, not just the current SampleRTT.

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT

• Exponential weighted moving average.
• The influence of a given sample decreases exponentially fast.
• Typical value of x: 0.125 (RFC 2988).

Setting the timeout
• EstimatedRTT plus a safety margin.
• Large variation in EstimatedRTT -> larger safety margin.

Timeout = EstimatedRTT + (4 * Deviation)

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|

• Recommended value of x for Deviation: 0.25.

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

EstimatedRTT after the receipt of the ACK of segment 1:
EstimatedRTT = RTT for segment 1 = 0.02746 second

EstimatedRTT after the receipt of the ACK of segment 2:
EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285

EstimatedRTT after the receipt of the ACK of segment 3:
EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337

EstimatedRTT after the receipt of the ACK of segment 4:
EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438

EstimatedRTT after the receipt of the ACK of segment 5:
EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558

EstimatedRTT after the receipt of the ACK of segment 6:
EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
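The sample calculations can be replayed in a few lines, together with the deviation and timeout formulas. Two details are assumptions of this sketch rather than statements from the slides: the deviation starts at 0, and it is updated using the freshly updated EstimatedRTT. The function name is also invented.

```python
def rtt_estimates(samples, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, Deviation, Timeout) after each SampleRTT.
    The first sample initialises the estimate; the deviation starts at 0
    (an assumption made for this sketch)."""
    est = samples[0]
    dev = 0.0
    history = [(est, dev, est + 4 * dev)]
    for sample in samples[1:]:
        est = (1 - alpha) * est + alpha * sample
        dev = (1 - beta) * dev + beta * abs(sample - est)
        history.append((est, dev, est + 4 * dev))
    return history


# SampleRTT values (seconds) for segments 1 to 6, as in the calculations above.
samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
history = rtt_estimates(samples)
for est, dev, timeout in history:
    print(round(est, 4), round(dev, 4), round(timeout, 4))
```

Rounded to four decimal places, the EstimatedRTT column matches the sample calculations.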

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT plotted against time; RTT axis in msec, from 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and Estimated RTT in milliseconds (100 to 350) against time in seconds (1 to 106)]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
• one-way data transfer
• no flow or congestion control

The sender waits for an event:
• event: data received from application above -> create and send segment
• event: timer timeout for segment with seq # y -> retransmit segment
• event: ACK received, with ACK # y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout
        retransmit not-yet-ACKed segment with smallest seq #
        start timer
    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */

The timer is associated with the oldest unACKed segment.

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch (event)
    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y is 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
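The ACK-handling part of the fast-retransmit sender can be sketched in Python as follows. Timers and the network are deliberately left out, the class and attribute names are invented for the example, and "retransmitting" only records the sequence number so the behaviour can be inspected.

```python
class FastRetransmitSender:
    """Bookkeeping of a sender with cumulative ACKs and fast retransmit.
    Timers and real transmission are omitted in this sketch."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}        # seq number -> payload still awaiting an ACK
        self.dup_acks = 0
        self.retransmitted = []  # seq numbers that were resent

    def send(self, data: bytes) -> int:
        seq = self.next_seq
        self.unacked[seq] = data
        self.next_seq += len(data)
        return seq

    def on_ack(self, y: int):
        if y > self.send_base:
            # Cumulative ACK: everything below y has been received.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.retransmitted.append(y)  # fast retransmit

    def on_timeout(self):
        if self.unacked:
            self.retransmitted.append(min(self.unacked))
```

For example, after sending segments at Seq=92, 100, and 120, one ACK=100 followed by three duplicate ACK=100 causes exactly one retransmission of the segment starting at 100.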

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with higher-than-expected seq #; a gap is detected.
   TCP receiver action: send a duplicate ACK indicating the seq # of the next expected byte.

4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
1. Host A sends Seq=92, 8 bytes data.
2. Host B replies with ACK=100, which is lost (X).
3. Host A's timer times out; it retransmits Seq=92, 8 bytes data.
4. Host B replies with ACK=100.
The retransmission is due to the lost ACK.

Premature timeout, cumulative ACKs:
1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B replies with ACK=100 and ACK=120, but the timer for Seq=92 expires before ACK=100 arrives.
3. Host A retransmits Seq=92, 8 bytes data, and the timer is restarted for Seq=92.
4. Host B replies with ACK=120.
The segment with Seq=100 is not retransmitted.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B's ACK=100 is lost (X), but ACK=120 arrives at Host A before the timeout for Seq=92.
The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control.
• Timer expiration is more likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.
• On each timeout: TimeoutInterval = 2 * TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
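A tiny sketch of the doubling rule, assuming an initial TimeoutInterval of 0.75 s chosen only for illustration (the function name is also made up):

```python
def backoff_intervals(initial_timeout, retransmissions):
    """Timeout used for each successive retransmission of the same segment:
    the interval doubles after every expiry."""
    interval = initial_timeout
    intervals = []
    for _ in range(retransmissions):
        intervals.append(interval)
        interval *= 2
    return intervals


print(backoff_intervals(0.75, 4))
```

Once an ACK arrives, the interval would be recomputed from EstimatedRTT and DevRTT rather than continuing from the doubled value.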

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer

• Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in the TCP segment.
• Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER HOST B uses RcvWindow, LastByteRcvd, and LastByteRead.

[Figure: data from IP enters RcvBuffer up to LastByteRcvd; the application process has read up to LastByteRead]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

• Initially, RcvWindow = RcvBuffer.
• The application reads from the buffer.
• HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER HOST A uses the RcvWindow of Host B, LastByteSent, and LastByteACKed.

[Figure: HOST A sends data up to LastByteSent; ACKs from Host B advance LastByteACKed]

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
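Both sides of the flow-control rule fit in two small functions. The function names and the byte counts in the usage lines are invented for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, n_bytes, window):
    """Sender side: sending n_bytes more must keep the amount of unACKed
    data within the most recently advertised RcvWindow."""
    return (last_byte_sent + n_bytes) - last_byte_acked <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(sender_may_send(5000, 4000, 1000, window))
print(sender_may_send(5000, 4000, 1200, window))
```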

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection 'alive'.

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.
• Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).

Client: the connection initiator.

In Java: Socket clientSocket = new Socket("hostname", port number);

In C (connect):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client.

In Java: Socket accept()

In C (accept):
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: Establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• Specifies the initial seq number (isn).
• Connect: SYN=1, seq=client_isn

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
• ACKs the received SYN.
• Allocates buffers.
• Specifies the server's initial seq number.
• Accept: SYN=1, seq=server_isn, ack=client_isn+1

Step 3: the client ACKs the connection with ACK=server_isn+1.
• Allocates buffers.
• Sends SYN=0.
• ACK: SYN=0, seq=client_isn+1, ack=server_isn+1
• Connection established.

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
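The sequence and acknowledgement numbers of the three handshake segments can be traced with a short sketch. The function name is invented, segments are modelled as plain dictionaries, and the initial sequence numbers 42 and 79 are arbitrary examples.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as dictionaries."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=42, server_isn=79):
    print(segment)
```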

TCP Connection Management (cont.): Closing a connection

How a TCP connection is established and torn down.

The client closes the socket:
• C: closesocket(s);
• Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

Step 3: the client receives the FIN and replies with an ACK.
• Enters "timed wait": it will respond with an ACK to received FINs.

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: the client sends FIN (closing); the server replies with ACK, then sends FIN (closing); the client replies with ACK and enters timed wait; both sides end in closed]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with steps numbered 1 to 12]

• Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• Connection formally closes: all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• a service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
• Informally: too many sources sending too much data too fast for the network to handle.
• Different from flow control.
• Manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• A top-10 problem.

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  • a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  • an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection.

New variable: Congestion Window (CongWin).

At the SENDER:

(Amount of unACKed data) = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• The congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate
• The congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• No ACK is received after the segment loss.

2. Receipt of three duplicate ACKs
• The segment loss is followed by three duplicate ACKs received at the sender.

TCP Congestion Control: details

• The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• Roughly: rate = cwnd/RTT bytes/sec
• cwnd is dynamic, a function of perceived network congestion.

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
• Multiplicative decrease: cut cwnd in half after loss.

[Figure: congestion window size (cwnd; 8, 16, 24 Kbytes) against time, showing saw-tooth behavior: probing for bandwidth]

TCP Slow Start

When a connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
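Why "1 MSS per ACK" doubles the window every RTT can be seen in a few lines. The sketch assumes that every segment sent in a round is ACKed within that round and that no loss occurs; the function name is made up, and cwnd is counted in units of MSS.

```python
def slow_start_windows(rtts, mss=1):
    """cwnd at the start of each RTT when every ACK adds one MSS."""
    cwnd = mss
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        acks = cwnd // mss   # one ACK comes back per segment sent this round
        cwnd += acks * mss   # each ACK grows the window by one MSS
    return history


print(slow_start_windows(4))
```

The result matches the one, two, four segments progression in the slide's figure.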

Refinement: inferring loss

• After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
• But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.

Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Summary: TCP Congestion Control

Initial state: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

Congestion avoidance
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
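The three states and their transitions can be sketched as a small event-driven class. This is only an illustrative sketch: the class name, the method names and the MSS value of 1460 bytes are assumptions, retransmission itself is left as a comment, and the switch to congestion avoidance uses the "cwnd > ssthresh" test from the summary.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; an assumed segment size, not taken from the slides


@dataclass
class RenoSender:
    cwnd: float = MSS            # congestion window in bytes
    ssthresh: float = 64 * 1024  # initial threshold of 64 KB
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            # Loss repaired: deflate the window, resume linear growth.
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS  # one MSS per ACK doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about one MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS  # each further duplicate ACK inflates the window
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"  # missing segment is retransmitted here

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_acks = 0
        self.state = "slow_start"  # missing segment is retransmitted here
```

Feeding the object a sequence of on_new_ack, on_dup_ack and on_timeout calls traces out the saw-tooth behaviour of cwnd described earlier.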

Congestion control

TCP's Congestion Control Service (client, server)

Problem: gridlock sets in when there is packet loss due to router congestion.

TCP forces the End Systems to decrease the rate at which packets are sent during periods of congestion.

When a packet is lost due to congestion, the sending system is alerted because it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP throughput

What is the average throughput of TCP as a function of window size and RTT?
- ignore slow start (typically very short phases)
- let W be the window size when loss occurs
- when the window is W, throughput is W/RTT
- just after loss, the window drops to W/2 and throughput to W/(2 RTT)
- throughput increases linearly (by MSS/RTT every RTT)
- average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
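The 0.75 factor can be checked with a one-line average over a single saw-tooth cycle, assuming (as the idealised model does) that throughput climbs linearly from W/(2 RTT) to W/RTT between losses:

```latex
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
```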

TCP Futures: TCP over "long fat pipes"

Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
- requires window size W = 83,333 in-flight segments
- throughput in terms of loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

- sustaining 10 Gbps needs L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
- new versions of TCP are needed for high-speed environments
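The two numbers in this example can be reproduced with a few lines of arithmetic. The sketch below assumes the throughput formula 1.22·MSS/(RTT·sqrt(L)) and inverts it for L; the function names are illustrative only.

```python
def window_segments(rate_bps, rtt_s, mss_bytes):
    """In-flight segments needed to fill a pipe of the given rate and RTT."""
    return rate_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(rate_bps, rtt_s, mss_bytes):
    """Loss rate L that yields rate_bps under throughput = 1.22*MSS/(RTT*sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2


W = window_segments(10e9, 0.100, 1500)   # 10 Gbps, 100 ms RTT, 1500-byte segments
L = loss_rate_for(10e9, 0.100, 1500)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The result for L is close to 2·10^-10, which is where the "1 loss event every 5 billion segments" figure comes from.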

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions:
- link with transmission rate R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow start phase of TCP
- both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing

Why is TCP fair?

Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput against Connection 1 throughput, both axes running up to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
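The convergence towards the equal-share line can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions that go beyond the slide: both flows have the same RTT, both detect every loss in the same round, and throughput is measured in arbitrary units per RTT.

```python
def aimd(x1, x2, capacity, rounds):
    """Throughputs of two flows after `rounds` RTTs of synchronized AIMD."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both flows halve their rate
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase, one unit per RTT
    return x1, x2


# Start far from the fair share: 80 versus 10 on a link of capacity 100.
x1, x2 = aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))
```

Additive increase leaves the gap between the two flows unchanged, while every multiplicative decrease halves it, so the gap shrinks towards zero.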

The End

The slides that follow are for additional reading only.

TCP Latency Modeling: Loopholes in TCP

Multiple End Systems sharing a link (R bps: the link's transmission rate)

- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three End Systems with 1 TCP connection each and one End System with 3 TCP connections share the link]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions:
- the network is uncongested, with one link of rate R between the end systems (client and server)
- CongWin (fixed) determines the amount of data that can be sent
- no packet loss, no packet corruption, no retransmissions required
- header overheads are negligible
- the file to send is an integer number of segments of size MSS
- connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
- the initial Threshold of the TCP congestion mechanism is very big

Notation:
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits of MSS (max segment size)

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1: latency = 2RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer, where W is the window size as a number of segments

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object, K = O/(WS)

If there are K windows, the sender will be stalled (K-1) times.
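Both static-window cases fit in one small function, since the stall term is simply zero when the ACK returns in time. The sketch below is illustrative: the function name is made up, and the link rate of 1000 bps and the RTT values in the usage lines are assumed for the sake of the example (only O = 256 bits, S = 32 bits and W = 4 come from the slides).

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency in seconds for an O-bit object sent in S-bit segments over an
    R bps link with a fixed window of W segments."""
    K = math.ceil(O / (W * S))          # windows needed to cover the object
    stall = S / R + rtt - W * S / R     # idle time per window (Case 2 if positive)
    return 2 * rtt + O / R + (K - 1) * max(stall, 0.0)


# Case 2: WS/R = 0.128 s is less than RTT + S/R = 0.132 s, so the sender stalls once.
print(static_window_latency(O=256, S=32, R=1000, W=4, rtt=0.100))
# Case 1: with a shorter RTT the ACK returns in time and there is no stall.
print(static_window_latency(O=256, S=32, R=1000, W=4, rtt=0.050))
```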

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram with stalled periods for an object with O/S = 15, which takes 4 windows]

Case analysis: DYNAMIC CONGESTION WINDOW

- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object as follows:

$$K = \min\left\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge \frac{O}{S}\right\}$$

Note:

$$K = \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} = \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} = \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$

Case analysis: DYNAMIC CONGESTION WINDOW

- Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
- Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
- Stall time after the kth window, where $[x]^{+} = \max(x, 0)$: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$
- Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$$

Case analysis: DYNAMIC CONGESTION WINDOW

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

- The actual number of times that the server stalls is P = min{Q, K-1}.

Case analysis: DYNAMIC CONGESTION WINDOW

- Closed-form expression for the latency, where Q is the number of times the server would stall if the object contained an infinite number of segments and P = min{Q, K-1} is the actual number of stalls:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^{P} - 1)\frac{S}{R}$$
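The quantities K, Q and P and the closed-form latency can be evaluated directly. The following is a sketch under the model's assumptions; the function name and the numbers in the usage line (1000-bit segments, a 10 kbps link, 200 ms RTT) are made up for illustration, with O/S = 15 so that K matches the 4 windows of the earlier example.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (K, Q, P, latency) for the dynamic congestion window model."""
    K = math.ceil(math.log2(O / S + 1))                # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                  # stalls that actually occur
    latency = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


K, Q, P, latency = slow_start_latency(O=15_000, S=1_000, R=10_000, rtt=0.2)
print(K, Q, P, latency)
```

Summing the positive stall times S/R + RTT - 2^(k-1) S/R for k = 1 to K-1 by hand gives the same latency as the closed form, which is a useful sanity check on the formula.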

Case analysis: DYNAMIC CONGESTION WINDOW

With P the actual number of times the server stalls, and the minimum latency being 2RTT + O/R:

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP segment structure

Header (each row is 32 bits wide):
- source port # | dest port #
- sequence number
- acknowledgement number
- head len | not used | flags U A P R S F | rcvr window size
- checksum | URGent data ptr
- Options (variable length)

The header is followed by application data (variable length).

Field notes:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- rcvr window size: number of bytes the receiver is willing to accept
- sequence and acknowledgement numbers: counting by bytes of data (not segments)
- checksum: Internet checksum (as in UDP)

In practice, PSH, URG and the Urgent Data Pointer are not used.

We can view these details using Ethereal.

Example

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.

Assume:
- data stream: a file consisting of 500,000 bytes
- MSS: 1,000 bytes
- the first byte of the data stream is numbered 0

TCP constructs 500 segments out of the data stream:

500,000 bytes / 1,000 bytes = 500 segments

TCP sequence #s and ACKs

Sequence numbers (#s):
- byte stream number of the first byte in the segment's data
- do not necessarily start from 0; use a random initial number R
  - Segment 1: 0 + R
  - Segment 2: 1000 + R, etc.

ACKs (acknowledgments):
- seq # of the next byte expected from the other side (last byte + 1)
- cumulative ACK
- if segment 1 has been received, the receiver waits for segment 2, e.g. ACK = 1000 + R (received up to byte 999)

[Figure: byte stream 0, 1, 2, 3, 4, ..., 999 (Segment 1) followed by 1000, 1001, 1002, ..., 1999 (Segment 2)]

TCP sequence #s and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. The user types 'C'. Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num 42").
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"). The ACK is piggy-backed on the server-to-client data.
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.

Yet another server echo example

1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
2. Host B ACKs receipt of 'Hello' and echoes it back: Seq=79, ACK=47, data = 'Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
4. Host B replies: Seq=84, ACK=50, data = '200'.

The ACK tells up to what byte has been received and what is the next starting byte the host is expecting to receive.
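The bookkeeping behind these numbers is just byte counting: each side advances its own sequence number by the bytes it sends and acknowledges the next byte it expects. A minimal sketch of the echo exchange follows; the function name and the tuple layout are illustrative assumptions.

```python
def echo_exchange(client_seq, server_seq, payloads):
    """Return (sender, seq, ack, data) tuples for each client message
    and the server's echo of it."""
    trace = []
    for data in payloads:
        n = len(data.encode("ascii"))       # TCP counts bytes, not segments
        trace.append(("A", client_seq, server_seq, data))
        client_seq += n                     # server now expects this byte next
        trace.append(("B", server_seq, client_seq, data))
        server_seq += n                     # client now expects this byte next
    return trace


for sender, seq, ack, data in echo_exchange(42, 79, ["Hello", "200"]):
    print(f"Host {sender}: Seq={seq}, ACK={ack}, data='{data}'")
```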

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?
- longer than RTT (RTT = round trip time); note that RTT will vary
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss

Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we want the estimated RTT to be smoother
  - use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT
- exponential weighted moving average
- influence of a given sample decreases exponentially fast
- typical value of x: 0.125 (RFC 2988)

Setting the timeout:
- EstimatedRTT plus a safety margin
- large variation in EstimatedRTT -> larger safety margin

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|
- recommended value of x: 0.25

Timeout = EstimatedRTT + (4 * Deviation)

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

- After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
- After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
- After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
- After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
- After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
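The same calculation can be run as a loop over the six SampleRTT values. This is a small sketch of the moving average only (the Deviation term is left out); the function name is illustrative, and the first sample is used as the initial estimate, as in the worked numbers.

```python
def estimate_rtt(samples, x=0.125):
    """Return the EstimatedRTT after each sample, seeded with the first sample."""
    est = samples[0]
    history = [est]
    for sample in samples[1:]:
        est = (1 - x) * est + x * sample   # exponential weighted moving average
        history.append(est)
    return history


samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
for i, est in enumerate(estimate_rtt(samples), start=1):
    print(f"after ACK of segment {i}: EstimatedRTT = {est:.4f}")
```

The SampleRTT grows almost sevenfold over the six segments, while the estimate grows by less than a factor of three: recent samples are damped rather than followed.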

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT plotted against time; RTT axis in msec, from 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and Estimated RTT in milliseconds (100 to 350) against time in seconds (1 to 106)]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

The sender waits for an event and handles it:
- event: data received from application above -> create and send segment
- event: timer timeout for segment with seq # y -> retransmit segment
- event: ACK received, with ACK # y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

    sendbase = initial_sequence number
    nextseqnum = initial_sequence number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout
            retransmit not-yet-ACKed segment with smallest seq #
            start timer

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    }   /* end of loop forever */

The timer is associated with the oldest unACKed segment.

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence number
    nextseqnum = initial_sequence number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {                 /* a duplicate ACK for already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    }   /* end of loop forever */
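The ACK-handling branch of this sender is the part that is easiest to get wrong, so here is a minimal sketch of it. It is a simplification made for illustration: timers are omitted entirely, the class and attribute names are invented, and a "retransmission" is only recorded in a list rather than sent.

```python
class FastRetransmitSender:
    def __init__(self, initial_seq):
        self.send_base = initial_seq      # oldest unACKed byte
        self.next_seq = initial_seq       # next byte to be sent
        self.dup_acks = 0
        self.retransmitted = []           # seq numbers resent by fast retransmit

    def send(self, data: bytes) -> int:
        """Assign a sequence number to a new segment and advance next_seq."""
        seq = self.next_seq
        self.next_seq += len(data)
        return seq

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y            # cumulative ACK of all data up to y
            self.dup_acks = 0
        else:
            self.dup_acks += 1            # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment starting at y
```

For example, after sending segments at Seq=92 (8 bytes) and Seq=100 (20 bytes), one ACK=100 followed by three more ACK=100 triggers a fast retransmit of the segment starting at byte 100.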

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.
2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.
3. Event: out-of-order segment arrival with higher-than-expected seq #; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the seq # of the next expected byte.
4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.
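The four rules can be read as a decision function on the receiver's state. The sketch below is a loose illustration, not an implementation from the RFCs: the function and parameter names are invented, and segments that arrive below the expected sequence number are not modelled.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_exists):
    """Pick the ACK policy for an arriving segment.

    seg_seq      -- sequence number of the arriving segment
    expected_seq -- next in-order byte the receiver is waiting for
    ack_pending  -- True if a delayed ACK is already waiting to be sent
    gap_exists   -- True if out-of-order data is buffered above expected_seq
    """
    if seg_seq > expected_seq:
        return "send duplicate ACK for expected_seq"      # rule 3
    if gap_exists:
        return "send ACK immediately"                     # rule 4
    if ack_pending:
        return "send single cumulative ACK immediately"   # rule 2
    return "delay ACK up to 500 ms"                       # rule 1
```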

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
1. Host A sends Seq=92, 8 bytes data.
2. Host B replies ACK=100, but the ACK is lost (X).
3. Host A's timer times out and it retransmits Seq=92, 8 bytes data (retransmission due to lost ACK).
4. Host B replies ACK=100 again.

Premature timeout, cumulative ACKs:
1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B replies ACK=100 and ACK=120, but the timer for Seq=92 expires before the ACKs arrive.
3. Host A retransmits Seq=92, 8 bytes data, and the timer is restarted for Seq=92. The segment with Seq=100 is not retransmitted.
4. Host B replies with the cumulative ACK=120.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data.
2. Host B replies ACK=100, which is lost (X), and then ACK=120.
3. ACK=120 reaches Host A before the timeout for Seq=92.

The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

- Provides a limited form of congestion control.
- Timer expiration is more likely caused by congestion in the network; congestion may get worse if sources continue to retransmit packets persistently.
- On each timeout: TimeoutInterval = 2 * TimeoutInterval_Previous
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
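The doubling rule and the reset on ACK fit in a few lines. This sketch assumes the reset uses the EstimatedRTT + 4 * Deviation formula given earlier, with DevRTT playing the role of Deviation; the function name is illustrative.

```python
def next_timeout(current, ack_received, estimated_rtt, dev_rtt):
    """Return the next TimeoutInterval in seconds."""
    if ack_received:
        # Back to the value derived from the current RTT statistics.
        return estimated_rtt + 4 * dev_rtt
    # Timer expired: back off by doubling the previous interval.
    return 2 * current


interval = 1.0
for _ in range(3):                       # three timeouts in a row
    interval = next_timeout(interval, False, 0.1, 0.02)
print(interval)                          # interval after three doublings
print(next_timeout(interval, True, 0.1, 0.02))   # reset once an ACK arrives
```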

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer

How it works:
- The receiver explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.
- The sender keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

[Figure: RcvBuffer filled with data from IP up to LastByteRcvd; the application process has read up to LastByteRead]

- Initially, RcvWindow = RcvBuffer.
- The application reads from the buffer.
- HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER (HOST A) uses the RcvWindow of Host B, LastByteSent and LastByteACKed.

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow

[Figure: HOST A sends data to Host B and receives ACKs from Host B]
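Both sides of the flow-control rule are simple arithmetic on byte counters. The sketch below uses made-up numbers (a 100-byte buffer and the counter values in the usage lines are assumptions chosen for illustration), and the function names are not from the slides.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, window):
    """True if sending n_bytes more keeps unACKed data within the window."""
    return (last_byte_sent + n_bytes) - last_byte_acked <= window


window = rcv_window(rcv_buffer=100, last_byte_rcvd=60, last_byte_read=40)
print(window)                                     # spare room at Host B
print(can_send(20, last_byte_sent=50, last_byte_acked=0, window=window))
print(can_send(40, last_byte_sent=50, last_byte_acked=0, window=window))
```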

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
- initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator
- In Java: Socket clientSocket = new Socket("hostname", port number);
- In C (Winsock), connect():

    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client
- In Java: accept() on the server's socket
- In C, accept():

    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management

Establishing a connection: three-way handshake. This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
- specifies the client's initial seq # (isn)
- Connect: SYN=1, seq=client_isn

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
- ACKs the received SYN
- allocates buffers
- specifies the server's initial seq #
- Accept: SYN=1, seq=server_isn, ack=client_isn+1

Step 3: the client ACKs the connection with ACK=server_isn+1.
- allocates buffers
- sends SYN=0
- ACK: SYN=0, seq=client_isn+1, ack=server_isn+1

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
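The three segments and their fields can be written out as plain data, which makes the "+1" rules easy to check. This is a sketch only; the dictionary layout and the example initial sequence numbers (42 and 79, reused from the telnet scenario) are illustrative assumptions.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries of header fields."""
    return [
        # Step 1: client -> server, SYN with the client's initial seq #
        {"from": "client", "SYN": 1, "seq": client_isn},
        # Step 2: server -> client, SYNACK acknowledging client_isn
        {"from": "server", "SYN": 1, "seq": server_isn, "ack": client_isn + 1},
        # Step 3: client -> server, plain ACK acknowledging server_isn
        {"from": "client", "SYN": 0, "seq": client_isn + 1, "ack": server_isn + 1},
    ]


for segment in three_way_handshake(client_isn=42, server_isn=79):
    print(segment)
```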

TCP Connection Management (cont.)

How a TCP connection is established and torn down.

Closing a connection: the client closes the socket
- C: closesocket(s);
- Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: the client closes and sends FIN; the server replies ACK, closes and sends FIN; the client replies ACK and enters timed wait before closed]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
- enters "timed wait": will respond with an ACK to received FINs

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK and goes through timed wait to closed; server is closed on receiving the ACK]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams]

- The timed wait is used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- When the connection formally closes, all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
- point-to-point traffic between sender and receiver
- speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the Receiver Buffer from overflowing

Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
- service that makes sure that the routers between End Systems are able to carry the offered traffic
- prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
- informally: too many sources sending too much data too fast for the network to handle
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP

2. Network-assisted congestion control
- routers provide feedback to End Systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection.

New variable: Congestion Window (CongWin)

(Amount of unACKed data)SENDER < min(CongWin, RcvWindow)

where the amount of unACKed data is LastByteSent - LastByteACKed. This indirectly limits the sender's send rate.

Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is solely limited by CongWin.
- Packet loss and packet transmission delay are negligible.

Sending rate (approx.): CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
- the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
- the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
- no ACK is received after segment loss

2. Receipt of three duplicate ACKs
- segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

- sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- roughly, rate = cwnd/RTT bytes/sec
- cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

- approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd; ticks at 8, 16 and 24 Kbytes) against time, showing saw-tooth behavior: probing for bandwidth]

TCP Slow Start

- when the connection begins, increase rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- summary: initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
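Why "1 MSS per ACK" amounts to doubling per RTT can be seen by counting ACKs: a window of n segments produces n ACKs in the next RTT, each adding one MSS. A tiny sketch, assuming no loss, one ACK per segment and cwnd measured in units of MSS (the function name is illustrative):

```python
def slow_start_windows(rtts, mss=1):
    """Return cwnd at the start of each of the first `rtts` round trips."""
    cwnd = mss
    windows = []
    for _ in range(rtts):
        windows.append(cwnd)
        acks = cwnd // mss      # one ACK comes back per segment sent this RTT
        cwnd += acks * mss      # +1 MSS for every ACK received
    return windows


print(slow_start_windows(4))    # [1, 2, 4, 8]
```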

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary: TCP Congestion Control

[State diagram with three states: slow start, congestion avoidance, fast recovery]

Initialisation: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start:

- new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start

Congestion avoidance:

- new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery:

- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
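The state diagram summarised on this slide can be sketched as a small event-driven class. This is only an illustrative sketch, not a real TCP stack: the class name `CongestionControl`, the method names, and the 1000-byte `MSS` are assumptions made for the example, and retransmission itself is left out.

```python
from dataclasses import dataclass

MSS = 1000  # assumed segment size in bytes, for illustration only


@dataclass
class CongestionControl:
    """Hypothetical sketch of the slow start / congestion avoidance / fast recovery FSM."""
    cwnd: float = 1 * MSS
    ssthresh: float = 64 * 1024
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            # New ACK ends fast recovery: deflate the window to ssthresh.
            self.cwnd = self.ssthresh
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS  # +1 MSS per ACK doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_duplicate_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS  # each extra duplicate ACK inflates the window
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"  # the missing segment would be resent here

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"  # the missing segment would be resent here
```

Feeding the object a sequence of events (for example three new ACKs followed by three duplicate ACKs) lets you trace how cwnd and ssthresh move between the three states.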

Transport Layer ndash TCP

48

BB

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

TCP forces the end systems to decrease the rate at which packets are sent during periods of congestion.

When a sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.

[Figure: client and server]

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
- Throughput then increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
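The 0.75 W/RTT figure can be checked numerically. The sketch below assumes the idealised sawtooth from this slide, with the window growing by one segment per RTT from W/2 to W; the function name and the sample values of W, RTT and MSS are illustrative assumptions.

```python
def average_sawtooth_throughput(w_segments: int, rtt: float, mss_bytes: int) -> float:
    """Average bytes/sec while the window ramps linearly from W/2 to W segments."""
    windows = range(w_segments // 2, w_segments + 1)  # one window value per RTT
    per_rtt = [w * mss_bytes / rtt for w in windows]
    return sum(per_rtt) / len(per_rtt)


W, RTT, MSS = 100, 0.1, 1500  # assumed illustrative values
average = average_sawtooth_throughput(W, RTT, MSS)
peak = W * MSS / RTT
ratio = average / peak  # fraction of the peak rate W/RTT achieved on average
print(f"average / peak = {ratio:.2f}")
```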

Transport Layer 3-50

TCP Futures: TCP over "long fat pipes"

- Example: a GRID computing application with 1500-byte segments, a 100 ms RTT, and a desired throughput of 10 Gbps.
- This requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

- This gives $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
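The two numbers on this slide follow from simple arithmetic. A minimal sketch, assuming the throughput formula 1.22 MSS / (RTT sqrt(L)) quoted on the slide; the function names are made up for the example.

```python
def required_window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def tolerable_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


window = required_window_segments(10e9, 0.100, 1500)
loss = tolerable_loss_rate(10e9, 0.100, 1500)
print(f"W = {window:.0f} segments, L = {loss:.2e}")
```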

Transport Layer ndash TCP

51

BB

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a link

Assumptions:

- The link has a transmission rate of R.
- Each connection has the same MSS and RTT.
- No other TCP connections or UDP datagrams traverse the shared link.
- Ignore the slow start phase of TCP.
- Both connections operate in congestion-avoidance mode (linear increase phase).

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Transport Layer ndash TCP

53

BB

Why is TCP fair?

Two competing sessions:

- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.

[Figure: Connection 2 throughput (vertical axis, 0 to R) versus Connection 1 throughput (horizontal axis, 0 to R), showing the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
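The convergence argument can be illustrated with a toy simulation. This is a sketch under the slide's assumptions (same MSS and RTT, both flows see every loss); the function name, the unit step of 1 per round, and the starting rates are illustrative choices.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int) -> tuple[float, float]:
    """Advance two AIMD flows sharing one link for a number of RTT rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve their rate (multiplicative decrease),
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit (additive increase),
            # which leaves the gap unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


start = (10.0, 80.0)  # deliberately unfair starting rates
end = aimd_two_flows(*start, capacity=100.0, rounds=500)
print(f"start gap = {abs(start[0] - start[1]):.1f}, end gap = {abs(end[0] - end[1]):.6f}")
```

Because only the multiplicative decrease changes the difference between the two rates, the gap shrinks by half on every loss and the flows drift towards the equal bandwidth share line.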

Transport Layer ndash TCP

54

BB

The End

The following slides are for additional reading only.

Transport Layer ndash TCP

55

BB

TCP Latency Modeling

Loopholes in TCP

Multiple end systems share a link with transmission rate R bps.

- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each, and one end system with 3 TCP connections]

Transport Layer ndash TCP

56

BB

TCP latency modeling

Q: How long does it take to receive an object from a Web server? That is, the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components: TCP connection establishment time, data transfer delay, and actual data transmission time.

Two cases to consider:

- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

Transport Layer ndash TCP

57

BB

TCP Latency Modeling

Assumptions:

- The network is uncongested, with one link of rate R between the end systems.
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
- The initial Threshold of the TCP congestion mechanism is very big.

Notation:

- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in an MSS (maximum segment size)

[Figure: client requesting a FILE from the server]

Transport Layer ndash TCP

58

BB

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Latency = 2RTT + O/R

K = number of windows of data that cover the object:

$K = \lceil O / (WS) \rceil$ (rounded up to the nearest integer)

Example: assume W = 4 segments. With O = 256 bits and S = 32 bits, the object has O/S = 8 segments and is covered by K = 2 windows.

Transport Layer ndash TCP

59

BB

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

Latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object, $K = \lceil O / (WS) \rceil$

If there are K windows, the sender will be stalled (K-1) times.
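Both static-window cases can be folded into one function. This is a minimal sketch using the slide's symbols (O, S, W, R, RTT); the function name and the sample numbers are assumptions chosen so the arithmetic is easy to follow.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, RTT: float) -> float:
    """Latency for a fixed congestion window of W segments.

    O: object size in bits, S: segment size in bits, R: link rate in bps.
    """
    K = math.ceil(O / (W * S))           # windows needed to cover the object
    base = 2 * RTT + O / R               # case 1: the sender never stalls
    stall = S / R + RTT - W * S / R      # idle time after each window
    if stall <= 0:
        return base
    return base + (K - 1) * stall        # case 2: stalled (K-1) times


# Slide example sizes (O = 256 bits, S = 32 bits, W = 4) with an assumed R of 32 bps.
case1 = static_window_latency(256, 32, 4, 32, RTT=1)  # WS/R = 4 > RTT + S/R = 2
case2 = static_window_latency(256, 32, 4, 32, RTT=5)  # WS/R = 4 < RTT + S/R = 6
print(case1, case2)
```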

Transport Layer ndash TCP

60

BB

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object with O/S = 15 segments, sent in 4 windows, showing the stalled periods between windows]

Transport Layer ndash TCP

61

BB

Case analysis: DYNAMIC CONGESTION WINDOW

- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Transport Layer ndash TCP

62

BB

Case analysis: DYNAMIC CONGESTION WINDOW

- Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $S/R + RTT$
- Transmission time of the kth window: $2^{k-1} S/R$
- Stall time after the kth window: $\left[ S/R + RTT - 2^{k-1} S/R \right]^+$, where $[x]^+ = \max(x, 0)$
- Latency:

$$
\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^+
$$


Transport Layer ndash TCP

64

BB

Case analysis: DYNAMIC CONGESTION WINDOW

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

- The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.
- Closed-form expression for the latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

Transport Layer ndash TCP

65

BB

Case analysis: DYNAMIC CONGESTION WINDOW

Comparing the latency with the minimum latency (the latency with no stalls, $2\,RTT + O/R$):

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
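The dynamic-window quantities K, Q and P and the two latency expressions can be cross-checked in a few lines. This is a sketch using the slide's symbols; the function names and the sample values (an object of 15 segments, S/R = 1 second, RTT = 3 seconds) are assumptions picked so that every intermediate value is a whole number.

```python
import math


def dynamic_window_latency(O: float, S: float, R: float, RTT: float):
    """Closed-form latency with slow start: returns (K, Q, P, latency)."""
    K = math.ceil(math.log2(O / S + 1))                # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                  # actual number of stalls
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


def summed_latency(O: float, S: float, R: float, RTT: float) -> float:
    """Same latency, computed by summing the stall time after each window."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = sum(max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    return 2 * RTT + O / R + stalls


K, Q, P, closed_form = dynamic_window_latency(O=15000, S=1000, R=1000, RTT=3.0)
by_sum = summed_latency(O=15000, S=1000, R=1000, RTT=3.0)
minimum = 2 * 3.0 + 15000 / 1000
print(K, Q, P, closed_form, by_sum, closed_form / minimum)
```

With O/S = 15 the object needs K = 4 windows, which matches the 4-window example earlier in the slides.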

Transport Layer ndash TCP

66

BB

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


Transport Layer ndash TCP

8

BB

Example

Suppose that a process in Host A wants to send a stream of data to a process in Host B over a TCP connection.

Assume:

- Data stream: a file consisting of 500,000 bytes
- MSS: 1,000 bytes
- The first byte of the data stream is numbered 0

TCP constructs 500 segments out of the data stream:

500,000 bytes / 1,000 bytes = 500 segments

Transport Layer ndash TCP

9

BB

TCP sequence numbers and ACKs

Sequence numbers: the byte-stream number of the first byte in the segment's data. They do not necessarily start from 0; TCP uses a random initial number R.

- Segment 1: 0 + R
- Segment 2: 1000 + R, etc.

ACKs (acknowledgments): the sequence number of the next byte expected from the other side (last byte + 1). ACKs are cumulative: if the receiver has received segment 1, it waits for segment 2. E.g. ACK = 1000 + R (received up to byte 999).

[Figure: byte stream with bytes 0, 1, 2, ..., 999 in Segment 1 and bytes 1000, 1001, ..., 1999 in Segment 2]
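The numbering rule can be sketched in a few lines, using the 500,000-byte file and 1,000-byte MSS from the earlier example. The function name and the sample initial number R = 5000 are assumptions for illustration.

```python
def segment_sequence_numbers(file_size: int, mss: int, isn: int) -> list[int]:
    """Sequence number of each segment: byte offset of its first byte plus the initial number."""
    return [isn + offset for offset in range(0, file_size, mss)]


R = 5000  # hypothetical random initial sequence number
seqs = segment_sequence_numbers(500_000, 1_000, R)
ack_after_first = seqs[0] + 1_000  # next byte expected once segment 1 has arrived
print(len(seqs), seqs[:3], ack_after_first)
```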

Transport Layer ndash TCP

10

BB

TCP sequence numbers and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP specification does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. The user types 'C'. Host A sends Seq=42, ACK=79, data = 'C' ("I'm sending data starting at seq num = 42").
2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C' ("Send me the bytes from 43 onward"; the ACK is piggy-backed on server-to-client data).
3. Host A ACKs receipt of the echoed 'C': Seq=43, ACK=80.

Transport Layer ndash TCP

11

BB

Yet another server echo example

1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data = 'Hello'.
2. Host B ACKs receipt of 'Hello' and echoes back 'Hello': Seq=79, ACK=47, data = 'Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else: Seq=47, ACK=84, data = '200'.
4. Host B replies: Seq=84, ACK=50, data = '200'.

The ACK tells up to what byte has been received, and what is the next starting byte the host is expecting to receive.
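The Seq and ACK values in this exchange can be reproduced with a tiny bookkeeping model. This is only a sketch: the `Endpoint` class and its methods are hypothetical, the handshake is assumed to have already happened, and a segment is modelled as a plain tuple.

```python
class Endpoint:
    """Hypothetical bookkeeping for one side of an established connection."""

    def __init__(self, next_seq: int, next_expected: int):
        self.seq = next_seq        # sequence number of the next byte we send
        self.ack = next_expected   # next byte we expect from the peer

    def send(self, data: bytes) -> tuple[int, int, bytes]:
        segment = (self.seq, self.ack, data)
        self.seq += len(data)      # sequence numbers count bytes, not segments
        return segment

    def receive(self, segment: tuple[int, int, bytes]) -> None:
        seq, _ack, data = segment
        self.ack = seq + len(data)  # cumulative ACK: last byte received + 1


host_a = Endpoint(next_seq=42, next_expected=79)
host_b = Endpoint(next_seq=79, next_expected=42)

trace = []
for sender, receiver, data in [
    (host_a, host_b, b"Hello"),
    (host_b, host_a, b"Hello"),
    (host_a, host_b, b"200"),
    (host_b, host_a, b"200"),
]:
    segment = sender.send(data)
    receiver.receive(segment)
    trace.append(segment[:2])  # keep only (Seq, ACK)

print(trace)
```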

Transport Layer ndash TCP

12

BB

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?

- It should be longer than the RTT (round trip time); note that the RTT will vary.
- Too short: premature timeout and unnecessary retransmissions.
- Too long: slow reaction to segment loss.

Q: How to estimate the RTT?

- SampleRTT: the measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments.
- SampleRTT will vary, and we want the estimated RTT to be smoother: use several recent measurements, not just the current SampleRTT.

Transport Layer ndash TCP

13

BB

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT

- Exponential weighted moving average
- The influence of a given sample decreases exponentially fast
- Typical value of x: 0.125 (RFC 2988)

Setting the timeout:

- EstimatedRTT plus a safety margin
- A large variation in EstimatedRTT calls for a larger safety margin

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|

- Recommended value of x in the Deviation formula: 0.25

Timeout = EstimatedRTT + (4 * Deviation)

Transport Layer ndash TCP

14

BB

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

EstimatedRTT after the receipt of the ACK of segment 1:
EstimatedRTT = RTT for segment 1 = 0.02746 second

EstimatedRTT after the receipt of the ACK of segment 2:
EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285

EstimatedRTT after the receipt of the ACK of segment 3:
EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337

EstimatedRTT after the receipt of the ACK of segment 4:
EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438

EstimatedRTT after the receipt of the ACK of segment 5:
EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558

EstimatedRTT after the receipt of the ACK of segment 6:
EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
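The sample calculations can be reproduced with a short estimator. This is a sketch, not the exact procedure of any TCP implementation: the class name is hypothetical, the starting Deviation of 0 is an assumption (the slides do not give one), and the Deviation is updated from the previous EstimatedRTT before the estimate itself is refreshed, which is also an assumption about ordering.

```python
class RttEstimator:
    """Hypothetical EWMA estimator for EstimatedRTT, Deviation and Timeout."""

    ALPHA = 0.125  # weight of a new sample in EstimatedRTT
    BETA = 0.25    # weight of a new sample in Deviation

    def __init__(self, first_sample: float, initial_deviation: float = 0.0):
        self.estimated = first_sample
        self.deviation = initial_deviation  # assumed starting value

    def update(self, sample: float) -> float:
        # Deviation is computed against the previous estimate (assumed ordering).
        self.deviation = (1 - self.BETA) * self.deviation + self.BETA * abs(sample - self.estimated)
        self.estimated = (1 - self.ALPHA) * self.estimated + self.ALPHA * sample
        return self.estimated

    def timeout(self) -> float:
        return self.estimated + 4 * self.deviation


samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
estimator = RttEstimator(samples[0])
estimates = [estimator.estimated] + [estimator.update(s) for s in samples[1:]]
print([round(e, 4) for e in estimates], round(estimator.timeout(), 4))
```

The printed estimates agree with the slide to four decimal places, even though the slide rounds after every step and the code does not.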

Transport Layer ndash TCP

15

BB

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT plotted against time; vertical axis RTT (msec), 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

Transport Layer ndash TCP

16

BB

An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and Estimated RTT plotted against time (seconds, 1 to 106); vertical axis RTT (milliseconds), 100 to 350]

Transport Layer ndash TCP

17

BB

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:

- one-way data transfer
- no flow or congestion control

[State diagram: a single "wait for event" state with three transitions]

- event: data received from application above; action: create and send segment
- event: timer timeout for segment with sequence number y; action: retransmit segment
- event: ACK received with ACK number y; action: process ACK

Transport Layer ndash TCP

18

BB

SIMPLIFIED TCP SENDER

Assumptions:

- the sender is not constrained by TCP flow or congestion control
- data from above is less than MSS in size
- data transfer is in one direction only

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch (event)
        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)
        event: timer timeout
            retransmit not-yet-ACKed segment with smallest sequence number
            start timer    /* associated with the oldest unACKed segment */
        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
} /* end of loop forever */

Transport Layer ndash TCP

20

BB

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch (event)
        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)
        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y
        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {    /* a duplicate ACK for already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
} /* end of loop forever */
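The ACK-handling branch of this sender can be sketched as runnable code. This is a simplified illustration only: the class and attribute names are hypothetical, timers are omitted entirely, and "retransmitting" just records the sequence number instead of sending anything.

```python
class SimplifiedSender:
    """Hypothetical sender that tracks unACKed segments and fast retransmits."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq   # oldest unACKed byte
        self.next_seq = initial_seq    # sequence number for the next new segment
        self.unacked = {}              # seq -> data, for segments still in flight
        self.dup_acks = 0
        self.retransmitted = []        # record of fast retransmissions

    def send(self, data: bytes) -> int:
        seq = self.next_seq
        self.unacked[seq] = data
        self.next_seq += len(data)
        return seq

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # Cumulative ACK: everything before byte y has been received.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for data that was already ACKed.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.retransmitted.append(y)  # fast retransmit of segment y


sender = SimplifiedSender(initial_seq=92)
first = sender.send(b"x" * 8)     # Seq=92, 8 bytes
second = sender.send(b"y" * 20)   # Seq=100, 20 bytes
third = sender.send(b"z" * 15)    # Seq=120, 15 bytes
sender.on_ack(100)                # first segment ACKed
for _ in range(3):
    sender.on_ack(100)            # three duplicate ACKs: segment 100 looks lost
```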

Transport Layer ndash TCP

21

BB

TCP ACK generation [RFC 1122, RFC 2581]

Event 1: in-order segment arrival, no gaps, everything else already ACKed
TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment; if the next segment does not arrive in this interval, send the ACK.

Event 2: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
TCP receiver action: immediately send a single cumulative ACK.

Event 3: out-of-order segment arrival with a higher than expected sequence number (a gap is detected)
TCP receiver action: send a duplicate ACK indicating the sequence number of the next expected byte.

Event 4: arrival of a segment that partially or completely fills the gap
TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

Note: the receiver does not discard out-of-order segments.

Transport Layer ndash TCP

22

BB

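TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:

1. Host A sends Seq=92, 8 bytes of data.
2. Host B replies with ACK=100, but the ACK is lost.
3. Host A's timer expires and it retransmits Seq=92, 8 bytes of data (retransmission due to lost ACK).
4. Host B replies with ACK=100 again.

Premature timeout, cumulative ACKs:

1. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data.
2. Host B replies with ACK=100 and ACK=120, but the Seq=92 timer expires before they arrive.
3. Host A retransmits Seq=92, 8 bytes of data; the timer is restarted here for Seq=92. The segment with Seq=100 is not retransmitted.
4. Host B replies with ACK=120.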

Transport Layer ndash TCP

23

BB

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data.
2. Host B replies with ACK=100, which is lost, and then with ACK=120.
3. ACK=120 reaches Host A before the Seq=92 timeout expires.

The cumulative ACK avoids retransmission of the first segment.

Transport Layer ndash TCP

24

BB

TCP Modifications: Doubling the Timeout Interval

- Provides a limited form of congestion control.
- Timer expiration is more likely caused by congestion in the network; congestion may get worse if sources continue to retransmit packets persistently.
- TimeoutInterval = 2 * TimeoutIntervalPrevious
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
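The doubling rule is easy to see with numbers. A minimal sketch, assuming a hypothetical starting interval of 0.75 seconds and that each listed expiration is for the same unACKed segment; the function name is made up for the example.

```python
def timeout_intervals(initial: float, expirations: int) -> list[float]:
    """Timeout interval in force after each successive timer expiration."""
    intervals = [initial]
    for _ in range(expirations):
        intervals.append(2 * intervals[-1])  # TimeoutInterval = 2 * previous
    return intervals


backoff = timeout_intervals(0.75, 3)
print(backoff)
```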

Transport Layer ndash TCP

25

BB

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:

- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer

Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.

Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

Transport Layer ndash TCP

26

BB

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, and LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

- Initially, RcvWindow = RcvBuffer.
- The application process reads from the buffer.
- HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

[Figure: RcvBuffer holding data from IP between LastByteRead and LastByteRcvd, read by the application process]

Transport Layer ndash TCP

27

BB

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent, and LastByteACKed.

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow

[Figure: HOST A sends data and receives ACKs from HOST B; LastByteACKed and LastByteSent mark the unACKed data]
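The receiver-side and sender-side rules can be written as two small functions. This is a sketch using the slides' variable names; the function names and the sample byte counts are assumptions for illustration.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent: int, last_byte_acked: int, window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    return (last_byte_sent + nbytes) - last_byte_acked <= window


advertised = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
ok = sender_may_send(last_byte_sent=3000, last_byte_acked=2500, window=advertised, nbytes=1000)
blocked = sender_may_send(last_byte_sent=3000, last_byte_acked=1000, window=advertised, nbytes=1000)
print(advertised, ok, blocked)
```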

Transport Layer ndash TCP

28

BB

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

- TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".
- TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

Transport Layer ndash TCP

29

BB

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables:

- sequence numbers
- buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator (connect).

In Java: Socket clientSocket = new Socket(hostname, portNumber);

In C (Winsock):

if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
    printf("connect failed\n");
    WSACleanup();
    exit(1);
}

Server: contacted by the client (accept).

In Java: accept()

In C: ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

Transport Layer ndash TCP

30

BB

TCP Connection Management

Establishing a connection: the three-way handshake. This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).

- specifies the client's initial sequence number (isn)
- Connect (SYN=1, seq=client_isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.

- ACKs the received SYN
- allocates buffers
- specifies the server's initial sequence number
- Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: the client ACKs the connection with ACK=server_isn+1.

- allocates buffers
- sends SYN=0
- ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
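The three segments of the handshake can be written out as data to make the numbering explicit. This is a sketch only: segments are modelled as plain dictionaries, the function name is hypothetical, and the initial sequence numbers 42 and 79 are borrowed from the earlier telnet example.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the SYN, SYNACK and ACK segments exchanged during connection setup."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


syn, synack, ack = three_way_handshake(client_isn=42, server_isn=79)
print(syn, synack, ack, sep="\n")
```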

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

How a TCP connection is established and torn down.

Closing a connection: the client closes the socket.

- C: closesocket(s);
- Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client and server timelines exchanging FIN, ACK, FIN, ACK; the client goes through timed wait before closed]

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3: the client receives the FIN and replies with an ACK.

- Enters timed wait: will respond with an ACK to received FINs.

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client and server timelines exchanging FIN, ACK, FIN, ACK; both sides pass through closing, the client through timed wait, and both end in closed]

Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with steps numbered 1 to 12]

- Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- When the connection formally closes, all resources (e.g. port numbers) are released.

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control:

- point-to-point traffic between sender and receiver
- speed matching service, matching the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing

Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control:

- service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing

Same course of action: throttling of the sender.

Transport Layer ndash TCP

36

BB

Principles of Congestion Control

Congestion:

- informally: too many sources sending too much data too fast for the network to handle
- different from flow control
- manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers)
- a top-10 problem

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control

- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP

2. Network-assisted congestion control

- routers provide feedback to end systems, in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  - an explicit transmission rate the sender should send at

Transport Layer ndash TCP

38

BB

TCP Congestion Control

How does the TCP sender limit the rate at which it sends traffic into its connection?

New variable: the congestion window (CongWin).

Amount of unACKed data at the sender: LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:

- The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is solely limited by CongWin.
- Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.): CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

Transport Layer ndash TCP

39

BB

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate: the congestion window will be increased at a relatively slow rate.
2. ACKs arrive at a high rate: the congestion window will be increased more quickly.

Transport Layer ndash TCP

40

BB

TCP Congestion Control

How does TCP perceive that there is congestion on the path?

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout: no ACK is received after segment loss.
2. Receipt of three duplicate ACKs: segment loss is followed by three duplicate ACKs received at the sender.

Transport Layer ndash TCP

41

BB

TCP Congestion Control: details

The sender limits transmission: LastByteSent - LastByteAcked <= cwnd

Roughly: rate = cwnd / RTT bytes/sec

cwnd is dynamic, a function of perceived network congestion.

How does the sender perceive congestion?

- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:

1. AIMD
2. slow start
3. conservative after timeout events

Transport Layer ndash TCP

42

BB

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.

- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd; 8, 16, 24 Kbytes) versus time, showing saw-tooth behavior: probing for bandwidth]

Transport Layer ndash TCP

43

BB

TCP Slow Start

When the connection begins, increase the rate exponentially until the first loss event:

- initially, cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).

[Figure: Host A sends one segment to Host B, then two segments, then four segments, one RTT apart]
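The link between "1 MSS per ACK" and "doubling every RTT" can be seen in a short loop. This is a sketch that assumes no loss, no threshold, and one ACK per segment in flight; the function name and the 1000-byte MSS are illustrative.

```python
def slow_start_windows(mss: int, rtts: int) -> list[int]:
    """cwnd (in bytes) at the start of each RTT during loss-free slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss         # one ACK returns for every segment sent this RTT
        for _ in range(acks):
            cwnd += mss            # +1 MSS per ACK received
        history.append(cwnd)
    return history


windows = slow_start_windows(mss=1000, rtts=4)
print(windows)
```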

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary: TCP Congestion Control

Initial state: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS / cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
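The transitions summarized above can be sketched as a small state machine. This is only an illustrative sketch: the class name, the method names, and the MSS value of 1460 bytes are assumptions made for the example, and retransmission itself is left out.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; hypothetical segment size chosen for illustration


@dataclass
class CongestionControl:
    """Sender-side congestion state, following the transition summary."""
    cwnd: float = MSS            # congestion window, in bytes
    ssthresh: float = 64 * 1024  # slow-start threshold, in bytes
    dup_ack_count: int = 0
    state: str = "slow start"

    def on_new_ack(self) -> None:
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh              # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += MSS                       # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)   # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_duplicate_ack(self) -> None:
        if self.state == "fast recovery":
            self.cwnd += MSS
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast recovery"

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = MSS
        self.dup_ack_count = 0
        self.state = "slow start"


cc = CongestionControl()
for _ in range(3):
    cc.on_new_ack()          # slow start: cwnd grows to 4 MSS
for _ in range(3):
    cc.on_duplicate_ack()    # third duplicate ACK triggers fast recovery
print(cc.state, cc.cwnd, cc.ssthresh)
```

Keeping the three event handlers separate mirrors the three event types in the summary, so each transition can be checked against the slide one line at a time.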

Congestion Control

TCP's Congestion Control Service

Problem: gridlock sets in when packets are lost because of router congestion.

TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.

When a packet from the sending system is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (slow-start phases are typically very short).

Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after a loss, the window drops to W/2 and throughput to W/(2 RTT).
• Throughput then increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT.

(Based on an idealised model for the steady-state dynamics of TCP.)
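The 0.75 W/RTT figure can be checked numerically by averaging the window over one saw-tooth cycle from W/2 to W. The sketch below assumes one window value per RTT and an even W; the function name and the sample numbers are made up for the example.

```python
def average_sawtooth_throughput(w_segments: int, rtt: float) -> float:
    """Average throughput (segments/sec) over one cycle from W/2 up to W."""
    # One window value per RTT: W/2, W/2 + 1, ..., W (w_segments must be even).
    windows = range(w_segments // 2, w_segments + 1)
    return sum(w / rtt for w in windows) / len(windows)


W, RTT = 20, 0.1  # hypothetical: 20-segment peak window, 100 ms RTT
avg = average_sawtooth_throughput(W, RTT)
print(avg, 0.75 * W / RTT)  # the two values agree up to floating-point error
```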

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

$\text{Throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• For 10 Gbps this gives L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
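The two numbers in the example above follow directly from the slide's parameters. The sketch below recomputes them; the variable names are chosen for the example.

```python
MSS_BITS = 1500 * 8   # 1500-byte segments
RTT = 0.100           # 100 ms
TARGET_BPS = 10e9     # desired throughput: 10 Gbps

# Window needed to fill the pipe: throughput = W * MSS / RTT
window_segments = TARGET_BPS * RTT / MSS_BITS

# Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to get the loss rate L
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET_BPS)) ** 2

print(round(window_segments))  # 83333
print(f"{loss_rate:.1e}")      # 2.1e-10
```

A loss rate of about 2·10^-10 corresponds to roughly one loss in every 5 billion segments, which is the figure quoted on the slide.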

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

(Diagram: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R.)

Analysis of 2 Connections Sharing a Link

Assumptions
• Link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why Is TCP Fair?

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease reduces throughput proportionally

(Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the equal-bandwidth-share line and the full-bandwidth-utilisation line. The trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.)
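The convergence argument above can be illustrated with a toy simulation. This is a sketch under idealized assumptions (both flows see every loss at the same time and have the same RTT); the function name, starting rates, and capacity are invented for the example.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int):
    """Idealized AIMD: both flows add 1 per round and both halve on overload."""
    for _ in range(rounds):
        if x1 + x2 > capacity:       # loss: multiplicative decrease
            x1, x2 = x1 / 2, x2 / 2
        else:                        # congestion avoidance: additive increase
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share: 80 versus 10 units on a link of capacity 100.
x1, x2 = aimd_two_flows(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))  # the gap shrinks toward 0
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, which is why the pair drifts toward the equal-share line.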

The End

The slides that follow are for additional reading only.

Loopholes in TCP

Multiple end systems sharing a link (R bps is the link's transmission rate):

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

(Diagram labels: 1 TCP connection, 1 TCP connection, 1 TCP connection, 3 TCP connections.)

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server? That is, the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components: TCP connection establishment time, data transfer delay, and actual data transmission time.

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = an integer number of segments of size MSS
• Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big

Notation
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)

(Diagram: a FILE transferred between CLIENT and SERVER.)

TCP Latency Modeling

Case analysis: STATIC congestion window

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object:

$K = \lceil O / (W S) \rceil$

where O/S is the number of segments and the result is rounded up to the nearest integer.

Example: assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

TCP Latency Modeling

Case analysis: STATIC congestion window

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

latency = 2 RTT + O/R + (K - 1) [S/R + RTT - WS/R]

K = number of windows of data that cover the object: $K = \lceil O / (W S) \rceil$

If there are K windows, the sender will be stalled (K - 1) times.
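The two static-window cases above can be folded into one small function. This is a sketch: the function name is invented, the object and segment sizes come from the slide's example (O = 256 bits, S = 32 bits, W = 4), and the link rate and RTT values are hypothetical numbers picked to trigger each case.

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency for a fixed window of W segments (O, S in bits; R in bps)."""
    K = math.ceil(O / (W * S))          # windows needed to cover the object
    stall = S / R + rtt - W * S / R     # idle time after each window
    if stall <= 0:                      # Case 1: the ACK returns in time
        return 2 * rtt + O / R
    return 2 * rtt + O / R + (K - 1) * stall   # Case 2: K - 1 stalled periods


# R = 32 bps so that S/R = 1 s; RTT values chosen to land in each case.
print(static_window_latency(O=256, S=32, R=32, W=4, rtt=1.0))  # 10.0 (Case 1)
print(static_window_latency(O=256, S=32, R=32, W=4, rtt=5.0))  # 20.0 (Case 2)
```

With RTT = 1 s the window transmission time WS/R = 4 s exceeds RTT + S/R = 2 s, so there is no stall; with RTT = 5 s the sender stalls once for 2 s between the two windows.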

Case analysis: DYNAMIC congestion window

(Figure: timing diagram showing the stalled periods. Example: O/S = 15 segments, covered by 4 windows.)

Case analysis: DYNAMIC congestion window

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object (O/S) as follows:

$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\}$
$\;\;\; = \min\{k : 2^k - 1 \ge O/S\}$
$\;\;\; = \min\{k : k \ge \log_2(O/S + 1)\}$
$\;\;\; = \lceil \log_2(O/S + 1) \rceil$

Case analysis: DYNAMIC congestion window

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
• Stall time after the kth window: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency:

$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$

Case analysis: DYNAMIC congestion window

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$

• The actual number of times that the server stalls is

$P = \min\{Q, K - 1\}$

Case analysis: DYNAMIC congestion window

• Closed-form expression for the latency, where P = min{Q, K - 1} is the actual number of times the server stalls:

$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^{P} - 1)\frac{S}{R}$
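As a sanity check, the closed-form latency can be compared against the term-by-term sum of stall times. The sketch below does both; the function name is invented, and the segment size, link rate, and RTT are hypothetical values chosen so that the object is exactly 15 segments, as in the 4-window example.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (closed_form, summed) latency under slow start; O, S in bits."""
    segments = math.ceil(O / S)
    # K: smallest k with 2^0 + ... + 2^(k-1) = 2^k - 1 >= O/S
    K = 1
    while 2 ** K - 1 < segments:
        K += 1
    # Q: number of stalls if the object were infinitely long
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1
    P = min(Q, K - 1)  # actual number of stalls
    closed = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    summed = 2 * rtt + O / R + sum(
        max(0.0, S / R + rtt - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return closed, summed


# Hypothetical numbers: 15 segments of 536 bytes, 1 Mbps link, 100 ms RTT.
S = 536 * 8
closed, summed = slow_start_latency(O=15 * S, S=S, R=1_000_000, rtt=0.100)
print(round(closed, 6), round(summed, 6))
```

Here K = 4, Q = 5, and P = 3, so the server stalls after each of the first three windows and both expressions give the same latency.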

Case analysis: DYNAMIC congestion window

• Compared with the minimum latency, 2 RTT + O/R:

$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$

Slow start will not significantly increase latency if RTT << O/R.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP Sequence #s and ACKs

Sequence numbers (#s):
• Byte-stream number of the first byte in the segment's data
• Do not necessarily start from 0; a random initial number R is used
  • Segment 1: 0 + R
  • Segment 2: 1000 + R, etc.

ACKs (acknowledgments):
• Seq # of the next byte expected from the other side (last byte + 1)
• Cumulative ACK
• If segment 1 has been received, the receiver waits for segment 2
• E.g. Ack = 1000 + R (received up to the 999th byte)

(Diagram: byte stream 0, 1, 2, 3, 4, ..., 999 forms Segment 1; bytes 1000, 1001, 1002, ..., 1999 form Segment 2.)

TCP Sequence #s and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. The user types 'C'. Host A sends Seq=42, ACK=79, data='C' ("I'm sending data starting at seq num=42").
2. Host B ACKs receipt of 'C' and echoes back 'C'. Host B sends Seq=79, ACK=43, data='C' ("Send me the bytes from 43 onward"). The ACK is piggy-backed on server-to-client data.
3. Host A ACKs receipt of the echoed 'C'. Host A sends Seq=43, ACK=80.

Yet Another Server Echo Example

1. The user types 'Hello'. Host A sends Seq=42, ACK=79, data='Hello'.
2. Host B ACKs receipt of 'Hello' and echoes it back. Host B sends Seq=79, ACK=47, data='Hello'.
3. Host A ACKs receipt of the echoed 'Hello' and sends something else. Host A sends Seq=47, ACK=84, data='200'.
4. Host B sends Seq=84, ACK=50, data='200'.

The ACK tells up to what byte has been received and what is the next starting byte the host is expecting to receive.
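The sequence and ACK numbers in the echo exchange above follow a simple rule: each side's Seq advances by the number of bytes it has sent, and its ACK names the next byte it expects. A minimal sketch, with an invented function name and one character counted as one byte:

```python
def echo_exchange(client_isn, server_isn, messages):
    """Return (sender, seq, ack, data) for each segment of a telnet-style echo."""
    client_seq, server_seq = client_isn, server_isn
    segments = []
    for data in messages:
        # Client sends data; its ACK field names the next byte it expects.
        segments.append(("A", client_seq, server_seq, data))
        client_seq += len(data)
        # Server echoes the data back and piggy-backs its ACK.
        segments.append(("B", server_seq, client_seq, data))
        server_seq += len(data)
    return segments


for segment in echo_exchange(42, 79, ["Hello", "200"]):
    print(segment)
# ('A', 42, 79, 'Hello')
# ('B', 79, 47, 'Hello')
# ('A', 47, 84, '200')
# ('B', 84, 50, '200')
```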

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?
• Longer than RTT (RTT = round trip time); note that RTT will vary
• Too short: premature timeout, unnecessary retransmissions
• Too long: slow reaction to segment loss

Q: How to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
• SampleRTT will vary; we want the estimated RTT to be smoother, so use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1 - x) * EstimatedRTT + x * SampleRTT

• Exponential weighted moving average
• Influence of a given sample decreases exponentially fast
• Typical value of x: 0.125 (RFC 2988)

Setting the timeout
• EstimatedRTT plus a safety margin
• Large variation in EstimatedRTT -> larger safety margin

Deviation = (1 - x) * Deviation + x * |SampleRTT - EstimatedRTT|   (recommended value of x: 0.25)

Timeout = EstimatedRTT + (4 * Deviation)

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

• After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
• After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
• After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
• After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
• After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
• After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
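The sample calculations above can be reproduced with a few lines of code. This is a sketch: the function name is invented, the Deviation is assumed to start at 0 (the slides do not give an initial value), and the Deviation update is assumed to use the freshly updated EstimatedRTT.

```python
def estimate_rtt(samples, alpha=0.125, beta=0.25):
    """Yield (EstimatedRTT, Deviation, Timeout) after each SampleRTT."""
    estimated = samples[0]
    deviation = 0.0  # assumption: the slides do not state an initial Deviation
    yield estimated, deviation, estimated + 4 * deviation
    for sample in samples[1:]:
        estimated = (1 - alpha) * estimated + alpha * sample
        deviation = (1 - beta) * deviation + beta * abs(sample - estimated)
        yield estimated, deviation, estimated + 4 * deviation


# SampleRTT values (seconds) taken from the sample calculations.
samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
for est, dev, timeout in estimate_rtt(samples):
    print(f"EstimatedRTT={est:.4f}  Deviation={dev:.4f}  Timeout={timeout:.4f}")
```

The EstimatedRTT column matches the slide's values (0.0285, 0.0337, 0.0438, 0.0558, 0.0725) after rounding to four decimal places; the Deviation and Timeout columns depend on the stated initialization assumption.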

RTT Samples and RTT Estimates

(Figure: Sample RTT and Estimated RTT, in msec, plotted against time; the vertical axis runs from 100 to 300.)

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT Estimation

(Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and Estimated RTT, in milliseconds (100 to 350), plotted against time in seconds (1 to 106).)

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

The sender waits for an event and handles it:
• event: data received from application above -> create and send segment
• event: timer timeout for segment with seq number y -> retransmit segment
• event: ACK received, with ACK number y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions
• sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum
                /* the timer is associated with the oldest unACKed segment */
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout
            retransmit not-yet-ACKed segment with smallest Seq #
            start timer

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    }   /* end of loop forever */

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {    /* a duplicate ACK for already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    }   /* end of loop forever */
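The ACK-handling logic of the sender with fast retransmit can be sketched in runnable form. This is an illustrative sketch only: the class and method names are invented, timers are omitted, and "sending" a segment is modelled by appending to a log.

```python
class FastRetransmitSender:
    """Toy model of the sender's event handlers (no real timers or network)."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}          # seq number -> payload
        self.dup_acks = 0
        self.transmissions = []    # log of (seq, reason)

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        self.transmissions.append((seq, "send"))
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:           # retransmit the oldest unACKed segment
            self.transmissions.append((min(self.unacked), "timeout retransmit"))

    def on_ack(self, y):
        if y > self.send_base:     # cumulative ACK of all data up to y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.send_base = y
            self.dup_acks = 0
        else:                      # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.transmissions.append((y, "fast retransmit"))


sender = FastRetransmitSender(92)
for payload in (b"a" * 8, b"b" * 20, b"c" * 10):   # Seq = 92, 100, 120
    sender.on_app_data(payload)
sender.on_ack(100)            # first segment ACKed
for _ in range(3):
    sender.on_ack(100)        # three duplicate ACKs: segment 100 presumed lost
print(sender.transmissions[-1])   # (100, 'fast retransmit')
```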

TCP ACK Generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with a higher-than-expected seq #; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the seq # of the next expected byte.

4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

TCP: Interesting Scenarios (simplified TCP version)

Lost ACK scenario
1. Host A sends Seq=92, 8 bytes of data.
2. Host B replies with ACK=100, but the ACK is lost (X).
3. Host A's timer times out and it retransmits Seq=92, 8 bytes of data (retransmission due to lost ACK).
4. Host B replies with ACK=100 again.

Premature timeout, cumulative ACKs
1. Host A sends Seq=92, 8 bytes of data, followed by Seq=100, 20 bytes of data.
2. Host B replies with ACK=100 and ACK=120, but the Seq=92 timer expires before they arrive.
3. Host A retransmits Seq=92, 8 bytes of data. The timer is restarted here for Seq=92. The segment with Seq=100 is not retransmitted.
4. Host B replies with ACK=120.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes of data, followed by Seq=100, 20 bytes of data.
2. Host B replies with ACK=100, which is lost (X), and then with ACK=120.
3. ACK=120 reaches Host A before the Seq=92 timeout.

The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control
• Timer expiration is more likely caused by congestion in the network; congestion may get worse if sources continue to retransmit packets persistently
• After each timeout: TimeoutInterval = 2 * previous TimeoutInterval
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT

Others: check RFC 2018 (selective ACK)

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

• Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment
• Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

Receiver buffering
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer

Flow Control: Receiver

Example: HOST A sends a large file to HOST B.

Receiver HOST B uses RcvWindow, LastByteRcvd, and LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

(Diagram: data from IP enters RcvBuffer and is read by the application process; LastByteRead and LastByteRcvd mark positions in the buffer.)

Flow Control: Sender

Example: HOST A sends a large file to HOST B.

Sender HOST A uses the RcvWindow of Host B, LastByteSent, and LastByteACKed.

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow

(Diagram: sender HOST A sends data and receives ACKs from Host B; LastByteACKed and LastByteSent mark positions in the data.)
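The receiver's and sender's sides of flow control each reduce to one line of arithmetic. The sketch below shows both; the function names and the byte counts are hypothetical values chosen for the example.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still transmit without overflowing the receiver."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)


# Hypothetical numbers: 100-byte buffer, 60 bytes received, 40 read by the app.
window = rcv_window(rcv_buffer=100, last_byte_rcvd=60, last_byte_read=40)
print(window)                                                        # 80
print(sendable_bytes(last_byte_sent=60, last_byte_acked=50,
                     advertised_window=window))                      # 70
```

When the advertised window drops to 0, `sendable_bytes` returns 0, which is the situation where the sender must fall back to probing the receiver.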

Flow Control: Some Issues to Consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

• Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

• Client: the connection initiator

    In Java:
    Socket clientSocket = new Socket("hostname", "port number");

    In C (Winsock), connect:
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

• Server: is contacted by the client

    In Java:
    Socket accept()

    In C, accept:
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: Establishing a Connection

This is what happens when we create a socket for a connection to a server.

Three-way handshake

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• specifies the initial seq number (isn)
• Connect (SYN=1, seq=client_isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
• ACKs the received SYN
• allocates buffers
• specifies the server's initial seq number
• Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: the client ACKs the connection with ACK=server_isn+1.
• allocates buffers
• sends SYN=0
• ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
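The field values exchanged in the three steps above can be generated from the two initial sequence numbers. A minimal sketch; the function name and the dictionary layout are invented for the example and do not model real packet formats.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three control segments of the handshake as dictionaries."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn}
    synack = {"from": "server", "SYN": 1, "seq": server_isn,
              "ack": syn["seq"] + 1}
    ack = {"from": "client", "SYN": 0, "seq": client_isn + 1,
           "ack": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=42, server_isn=79):
    print(segment)
```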

TCP Connection Management (cont.): Closing a Connection

How a TCP connection is established and torn down.

The client closes the socket:
    closesocket(s);
    Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

(Diagram: the client closes and sends FIN; the server replies with ACK, closes, and sends FIN; the client replies with ACK and enters timed wait before the connection is closed.)

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
• Enters "timed wait": it will respond with an ACK to received FINs

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

(Diagram: client and server both in the closing state exchange FIN and ACK; the client passes through timed wait before closed; the server is closed on receiving the final ACK.)

TCP Connection Management (cont.)

(Figures: TCP client lifecycle and TCP server lifecycle, with steps numbered 1 to 12.)

• Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• Connection formally closes: all resources (e.g. port numbers) are released.

End of Flow Control and Error Control


Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
• Informally: "too many sources sending too much data too fast for the network to handle"
• Different from flow control
• Manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• A top-10 problem

Approaches Towards Congestion Control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  • a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  • an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection:

New variable: Congestion Window (CongWin)

(Amount of unACKed data at sender) = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is solely limited by CongWin
• Packet loss delay and packet transmission delay are negligible

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs: an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path:

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: Details

• The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• Roughly: rate = cwnd / RTT bytes/sec
• cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP Congestion Avoidance: Additive Increase, Multiplicative Decrease

• Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• Multiplicative decrease: cut cwnd in half after loss

(Figure: congestion window size (cwnd) against time, with ticks at 8, 16, and 24 Kbytes, showing the saw-tooth behavior: probing for bandwidth.)

TCP Slow Start

• When the connection begins, increase the rate exponentially until the first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

(Diagram: Host A sends one segment to Host B; one RTT later it sends two segments; then four segments.)
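The link between "1 MSS per ACK" and "doubling every RTT" can be made concrete with a short loop. This is a sketch that assumes every segment sent in a round is ACKed within that RTT and that no loss occurs; the function name is invented.

```python
def slow_start_rounds(rounds, mss=1):
    """cwnd (in units of mss) at the start of each RTT during slow start."""
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd // mss       # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += mss          # +1 MSS for every ACK received
    return history


print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```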

Refinement: Inferring Loss

• After 3 dup ACKs:
  • cwnd is cut in half
  • the window then grows linearly
• But after a timeout event:
  • cwnd is set to 1 MSS
  • the window then grows exponentially up to a threshold, then grows linearly

Philosophy
• 3 dup ACKs indicate that the network is capable of delivering some segments
• A timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When cwnd gets to 1/2 of its value before the timeout.

Implementation: a variable ssthresh (slow-start threshold). On a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
- Throughput then increases linearly (by MSS/RTT every RTT) until the next loss.
- Average throughput: 0.75 W/RTT

(Based on an idealised model of the steady-state dynamics of TCP)
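The 0.75 W/RTT figure can be checked numerically by averaging the window over one saw-tooth cycle. The sketch below assumes the window is counted in whole segments and climbs by one segment per RTT from W/2 to W; the function name and parameters are illustrative only.

```python
def average_sawtooth_throughput(w_segments, rtt_s, mss_bytes):
    """Average throughput (bytes/sec) over one cycle W/2 -> W of the idealised model."""
    # One window value per RTT, growing by 1 segment each RTT.
    windows = range(w_segments // 2, w_segments + 1)
    mean_window = sum(windows) / len(windows)
    return mean_window * mss_bytes / rtt_s


# Example: W = 100 segments, RTT = 100 ms, MSS = 1500 bytes.
avg = average_sawtooth_throughput(100, 0.1, 1500)
peak = 100 * 1500 / 0.1
ratio = avg / peak   # close to 0.75
```

Because the window rises linearly, its mean is the midpoint of W/2 and W, which is where the factor 0.75 comes from.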


TCP Futures: TCP over "long, fat pipes"

- Example: a GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps
- This requires a window size of W = 83,333 in-flight segments
- Throughput in terms of loss rate L:

  \[ \text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}} \]

- For 10 Gbps this gives L = 2 * 10^-10, a very small loss rate (1 loss event every 5 billion segments)
- New versions of TCP are needed for high-speed environments
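The two numbers on this slide follow from simple arithmetic, sketched below. The function names are made up for the example; the loss-rate function just inverts the throughput formula 1.22 MSS / (RTT sqrt(L)).

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the throughput: W = throughput * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def tolerable_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)    # about 83,333 segments
loss = tolerable_loss_rate(10e9, 0.1, 1500)      # about 2.1e-10
```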


TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]


Analysis of 2 connections sharing a link

Assumptions
- Link with transmission rate R
- Each connection has the same MSS and RTT
- No other TCP connections or UDP datagrams traverse the shared link
- Ignore the slow start phase of TCP
- Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow equal bandwidth sharing.


Why is TCP fair?

Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases
- Multiplicative decrease reduces throughput proportionally

[Figure: Connection 2 throughput (vertical axis, 0 to R) against Connection 1 throughput (horizontal axis, 0 to R), with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
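The convergence argument can be illustrated with a tiny simulation. It is a sketch under the slide's idealised assumptions (same RTT, synchronised losses, only two flows); the function name, the unit step per round, and the starting rates are choices made for the example.

```python
def simulate_aimd(x1, x2, capacity, rounds):
    """Two AIMD flows sharing one link; returns their rates after `rounds` RTTs."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve their rate (multiplicative decrease).
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit (additive increase, slope 1).
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


# Start far from the fair share: flow 1 at 1 unit, flow 2 at 80 units.
a, b = simulate_aimd(1.0, 80.0, capacity=100.0, rounds=1000)
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so the rates drift towards the equal bandwidth share line.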


The End

The slides that follow are for additional reading only.


TCP Latency Modeling: Loopholes in TCP

Multiple end systems sharing a link (R bps = the link's transmission rate)

- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
- Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.

[Figure: three end systems each using 1 TCP connection and one end system using 3 TCP connections over the shared link]


TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.


TCP Latency Modeling

Assumptions
- The network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- No packet loss, no packet corruption, no retransmissions required
- Header overheads are negligible
- The file to send is an integer number of segments of size MSS
- Connection establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
- The initial Threshold of the TCP congestion mechanism is very big

Notation
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in an MSS (maximum segment size)

[Figure: client requesting FILE from server]


TCP latency modeling
Case analysis: STATIC congestion window

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

\[ \text{latency} = 2\,RTT + \frac{O}{R} \]

K = number of windows of data that cover the object (number of segments divided by W, rounded up to the nearest integer):

\[ K = \left\lceil \frac{O}{WS} \right\rceil \]

Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4


TCP latency modeling
Case analysis: STATIC congestion window

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

\[ \text{latency} = 2\,RTT + \frac{O}{R} + (K-1)\left[\frac{S}{R} + RTT - \frac{WS}{R}\right] \]

K = number of windows of data that cover the object, \( K = \lceil O/(WS) \rceil \)

Stalled period: if there are K windows, the sender will be stalled (K-1) times.
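The two static-window cases can be folded into one function, as sketched below. The function name and argument names are assumptions for the example; the formulas are the ones on the slides, with units of bits, bits per second, and seconds.

```python
import math


def static_window_latency(o_bits, s_bits, w_segments, r_bps, rtt_s):
    """Latency with a fixed congestion window of W segments."""
    k = math.ceil(o_bits / (w_segments * s_bits))      # windows covering the object
    base = 2 * rtt_s + o_bits / r_bps                  # case 1: no stalls
    stall = s_bits / r_bps + rtt_s - w_segments * s_bits / r_bps
    if stall <= 0:
        return base                                    # WS/R >= RTT + S/R
    return base + (k - 1) * stall                      # case 2: (K-1) stalled periods


# Slide example: O = 256 bits, S = 32 bits, W = 4, so K = 2.
# The link rate and RTT values below are made-up inputs.
case1 = static_window_latency(256, 32, 4, r_bps=32, rtt_s=1)   # WS/R = 4 > RTT + S/R = 2
case2 = static_window_latency(256, 32, 4, r_bps=32, rtt_s=5)   # WS/R = 4 < RTT + S/R = 6
```

With these inputs `case1` is 2(1) + 8 = 10 seconds and `case2` is 2(5) + 8 + 1(1 + 5 - 4) = 20 seconds.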


Case analysis: DYNAMIC congestion window

[Figure: slow-start timing diagram for O/S = 15 segments, which are covered by 4 windows, with a stalled period between windows]


Case analysis: DYNAMIC congestion window

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

\[
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\!\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
\]


Case analysis: DYNAMIC congestion window

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:
  \[ \frac{S}{R} + RTT \]
• Transmission time of the k-th window:
  \[ 2^{k-1}\,\frac{S}{R} \]
• Stall time after the k-th window:
  \[ \left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+ \]
• Latency:
  \[ \text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+ \]

Here \([x]^+ = \max(x, 0)\).


Case analysis: DYNAMIC congestion window

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
  \[ Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1 \]
• The actual number of times that the server stalls is
  \[ P = \min\{Q,\ K-1\} \]


Case analysis: DYNAMIC congestion window

• With Q the number of stalls for an infinitely long object and P = min{Q, K-1} the actual number of stalls, the closed-form expression for the latency is
  \[ \text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\,\frac{S}{R} \]


Case analysis: DYNAMIC congestion window

• Comparing the latency with the minimum latency, 2 RTT + O/R (no stalls), where P is the actual number of stalls:
  \[ \frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2} \]

Slow start will not significantly increase latency if RTT << O/R.
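The dynamic-window formulas can be cross-checked against each other in code. In the sketch below the function names are assumptions for the example; one function sums the per-window stall times and the other evaluates the closed form with K, Q and P. Note that `math.log2` works in floating point, so inputs that land exactly on a power of two are the safest test cases.

```python
import math


def slow_start_latency_sum(o_bits, s_bits, r_bps, rtt_s):
    """Latency as 2 RTT + O/R plus the sum of the stall times after windows 1..K-1."""
    k_windows = math.ceil(math.log2(o_bits / s_bits + 1))
    stalls = 0.0
    for k in range(1, k_windows):
        stall = s_bits / r_bps + rtt_s - 2 ** (k - 1) * s_bits / r_bps
        stalls += max(stall, 0.0)
    return 2 * rtt_s + o_bits / r_bps + stalls


def slow_start_latency_closed_form(o_bits, s_bits, r_bps, rtt_s):
    """Latency = 2 RTT + O/R + P (RTT + S/R) - (2^P - 1) S/R."""
    k_windows = math.ceil(math.log2(o_bits / s_bits + 1))
    q = math.floor(math.log2(1 + rtt_s / (s_bits / r_bps))) + 1
    p = min(q, k_windows - 1)
    return (2 * rtt_s + o_bits / r_bps
            + p * (rtt_s + s_bits / r_bps)
            - (2 ** p - 1) * s_bits / r_bps)


# O/S = 15 segments (K = 4 windows); S/R = 1 s and RTT = 1 s are made-up inputs.
by_sum = slow_start_latency_sum(15000, 1000, 1000, 1.0)
by_formula = slow_start_latency_closed_form(15000, 1000, 1000, 1.0)
```

For these inputs Q = 2 and P = 2, and both functions give 2 + 15 + 1 = 18 seconds: the server stalls for 1 second after the first window and not at all afterwards.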


http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features



TCP sequence numbers and ACKs

Q: How does the receiver handle out-of-order segments?
A: The TCP spec does not say; it is decided when implementing.

Simple telnet scenario (with echo on), assuming that the starting sequence numbers for Host A (client) and Host B (server) are 42 and 79 respectively:

1. User types 'C'. Host A -> Host B: Seq=42, ACK=79, data = 'C'
   ("I'm sending data starting at seq num = 42")
2. Host B ACKs receipt of 'C' and echoes back 'C'. Host B -> Host A: Seq=79, ACK=43, data = 'C'
   ("Send me the bytes from 43 onward"; the ACK is piggy-backed on server-to-client data)
3. Host A ACKs receipt of the echoed 'C'. Host A -> Host B: Seq=43, ACK=80


Yet another server echo example

1. User types 'Hello'. Host A -> Host B: Seq=42, ACK=79, data = 'Hello'
2. Host B ACKs receipt of 'Hello' and echoes it back. Host B -> Host A: Seq=79, ACK=47, data = 'Hello'
3. Host A ACKs receipt of the echoed 'Hello' and sends something else. Host A -> Host B: Seq=47, ACK=84, data = '200'
4. Host B -> Host A: Seq=84, ACK=50, data = '200'

Sequence and ACK numbers over time:
- Host A: seq=42, ack=79; then seq=47, ack=84
- Host B: seq=79, ack=47; then seq=84, ack=50

The ACK number tells up to what byte has been received and which byte the host expects to receive next.
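The numbering in both echo scenarios follows one rule: the ACK number is the sender's sequence number plus the number of data bytes received. A minimal sketch, with a made-up function name and tuple layout (direction, seq, ack, data), is:

```python
def echo_exchange(client_seq, server_seq, data):
    """Segments of one echo round trip, as (direction, seq, ack, data) tuples."""
    n = len(data)  # every data byte consumes one sequence number
    return [
        ("A->B", client_seq, server_seq, data),           # user types the data
        ("B->A", server_seq, client_seq + n, data),       # server ACKs and echoes it
        ("A->B", client_seq + n, server_seq + n, ""),     # client ACKs the echo
    ]


telnet = echo_exchange(42, 79, "C")
hello = echo_exchange(42, 79, "Hello")
```

For 'C' the echo carries Seq=79, ACK=43 and the final ACK carries Seq=43, ACK=80; for 'Hello' the corresponding numbers are 79/47 and 47/84, matching the two scenarios.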


TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

Q: How to set the TCP timeout value?
- longer than RTT (RTT = round trip time); note that RTT will vary
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss

Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we want the estimated RTT to be smoother
  - use several recent measurements, not just the current SampleRTT


TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT

- Exponential weighted moving average
- The influence of a given sample decreases exponentially fast
- Typical value of x: 0.125 (RFC 2988)

Setting the timeout
- EstimatedRTT plus a "safety margin"
- Large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + (4 * Deviation)

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|

- Recommended value of x in the Deviation formula: 0.25
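The two moving averages and the timeout rule can be sketched as a small class. The class and attribute names are made up for the example. Two details are assumptions borrowed from the RFC 6298 procedure rather than from the slide: the first sample initialises Deviation to half of SampleRTT, and Deviation is updated before EstimatedRTT.

```python
class RttEstimator:
    """Exponential weighted moving averages of RTT and its deviation."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha            # weight x for EstimatedRTT
        self.beta = beta              # weight x for Deviation
        self.estimated_rtt = None
        self.deviation = None

    def update(self, sample_rtt):
        """Fold in one SampleRTT (seconds) and return the new timeout."""
        if self.estimated_rtt is None:
            # Assumed initialisation for the very first measurement.
            self.estimated_rtt = sample_rtt
            self.deviation = sample_rtt / 2
        else:
            self.deviation = ((1 - self.beta) * self.deviation
                              + self.beta * abs(sample_rtt - self.estimated_rtt))
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation


estimator = RttEstimator()
estimator.update(0.02746)     # first sample
estimator.update(0.035557)    # EstimatedRTT becomes about 0.0285 s
```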


Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

- After receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
- After receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
- After receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
- After receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
- After receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
- After receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725


RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT plotted against time; vertical axis RTT (msec), 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.


An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and Estimated RTT, RTT (milliseconds, 100 to 350) against time (seconds, 1 to 106)]


FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

The sender waits for an event and acts on it:
- event: data received from application above -> create and send segment
- event: timer timeout for segment with seq number y -> retransmit segment
- event: ACK received, with ACK number y -> process ACK


SIMPLIFIED TCP SENDER

Assumptions
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum   /* associated with the oldest unACKed segment */
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout
        retransmit not-yet-ACKed segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */


TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y

    event: ACK received, with ACK field value of y
        if (y > sendbase) {   /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else {   /* a duplicate ACK for already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y is 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
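The ACK-handling branch of this sender can be sketched in a few lines. The class name `AckTracker` and the string results are inventions for the example; timers and the actual retransmission are left out, so the code only shows when the third duplicate ACK would trigger a fast retransmit.

```python
class AckTracker:
    """Tracks sendbase and counts duplicate ACKs, as in the sender's ACK event."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.dup_acks = 0

    def on_ack(self, y):
        """Return 'advance', 'duplicate', or 'fast retransmit' for ACK value y."""
        if y > self.send_base:
            # Cumulative ACK of all data up to y.
            self.send_base = y
            self.dup_acks = 0
            return "advance"
        # ACK for data that was already ACKed.
        self.dup_acks += 1
        if self.dup_acks == 3:
            return "fast retransmit"   # resend the segment starting at byte y
        return "duplicate"


tracker = AckTracker(initial_seq=92)
events = [tracker.on_ack(y) for y in (100, 100, 100, 100, 120)]
```

Here the first ACK=100 advances sendbase, the next three are duplicates (the third triggers the fast retransmit), and ACK=120 advances sendbase again and resets the counter.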


TCP ACK generation [RFC 1122, RFC 2581]

Receiver does not discard out-of-order segments.

1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK

3. Event: out-of-order segment arrival with a higher-than-expected seq number; a gap is detected
   TCP receiver action: send a duplicate ACK, indicating the seq number of the next expected byte

4. Event: arrival of a segment that partially or completely fills the gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap


TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario
1. Host A -> Host B: Seq=92, 8 bytes data
2. Host B -> Host A: ACK=100 (lost, X)
3. Host A's timer times out. Host A -> Host B: Seq=92, 8 bytes data (retransmission due to lost ACK)
4. Host B -> Host A: ACK=100

Premature timeout, cumulative ACKs
1. Host A -> Host B: Seq=92, 8 bytes data
2. Host A -> Host B: Seq=100, 20 bytes data
3. Host B -> Host A: ACK=100, then ACK=120
4. The Seq=92 timeout expires before the ACKs arrive. Host A -> Host B: Seq=92, 8 bytes data. The timer is restarted here for Seq=92. The segment with Seq=100 is not retransmitted.
5. Host B -> Host A: ACK=120


TCP Retransmission Scenario

1. Host A -> Host B: Seq=92, 8 bytes data
2. Host A -> Host B: Seq=100, 20 bytes data
3. Host B -> Host A: ACK=100 (lost, X)
4. Host B -> Host A: ACK=120, which arrives before the Seq=92 timeout

Cumulative ACK avoids retransmission of the first segment.


TCP Modifications: Doubling the Timeout Interval

- Provides a limited form of congestion control
- Timer expiration is more likely caused by congestion in the network
- Congestion may get worse if sources continue to retransmit packets persistently
- On each timeout: TimeoutInterval = 2 * previous TimeoutInterval
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT

Others: check RFC 2018 (selective ACK)


TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering
- RcvBuffer = size of TCP receive buffer
- RcvWindow = amount of spare room in the buffer

Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, in the RcvWindow field of the TCP segment.

Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.


FLOW CONTROL: Receiver

Example: HOST A sends a large file to HOST B.

[Figure: receive buffer of size RcvBuffer; data arrives from IP up to LastByteRcvd and is read by the application process up to LastByteRead]

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially, RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.


FLOW CONTROL: Sender

Example: HOST A sends a large file to HOST B.

[Figure: data flows from HOST A to HOST B; ACKs return from HOST B]

SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent, LastByteACKed.

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
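Both sides of the flow-control bookkeeping are one-line computations. The sketch below uses made-up function names and byte counts; the variable names mirror the slide's RcvBuffer, LastByteRcvd, LastByteRead, LastByteSent and LastByteACKed.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, window):
    """Sender side: would sending n_bytes keep unACKed data within RcvWindow?"""
    return (last_byte_sent + n_bytes) - last_byte_acked <= window


# Host B has a 4096-byte buffer, has received byte 3000 and read up to byte 1000.
advertised = rcv_window(4096, 3000, 1000)           # 2096 bytes of spare room
ok = can_send(1000, 5000, 4000, advertised)         # 2000 unACKed bytes would fit
too_much = can_send(1000, 5100, 4000, advertised)   # 2100 unACKed bytes would not
```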


FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.


TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

- Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

- Client: the connection initiator
  In Java: Socket clientSocket = new Socket("hostname", port number);
  In C (connect):
      if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
          printf("connect failed\n");
          WSACleanup();
          exit(1);
      }

- Server: contacted by the client
  In Java: accept() on the server's ServerSocket, which returns a Socket
  In C (accept):
      ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);


TCP Connection Management: Establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake
Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
  - specifies the client's initial seq number (isn)
Step 2: the server end system receives the SYN and replies with a SYNACK control segment
  - ACKs the received SYN
  - allocates buffers
  - specifies the server's initial seq number
Step 3: the client ACKs the connection with ACK = server_isn + 1
  - allocates buffers
  - sends SYN = 0
Connection established.

Segments exchanged over time:
- Client -> Server (Connect): SYN=1, seq=client_isn
- Server -> Client (Accept): SYN=1, seq=server_isn, ack=client_isn+1
- Client -> Server (ACK): SYN=0, seq=client_isn+1, ack=server_isn+1

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
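The header fields of the three handshake segments can be generated from the two initial sequence numbers. This is a sketch with an invented function name and dictionary layout; it models only the SYN flag, seq and ack fields shown on the slide, not real packets.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as dictionaries of header fields."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn}
    synack = {"from": "server", "SYN": 1, "seq": server_isn, "ack": client_isn + 1}
    ack = {"from": "client", "SYN": 0, "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]


# The initial sequence numbers 1000 and 5000 are arbitrary example values.
segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

Each side acknowledges the other's initial sequence number plus one, because the SYN itself consumes one sequence number.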


TCP Connection Management (cont.)

How a TCP connection is established and torn down

Closing a connection: the client closes the socket
  In C: closesocket(s);
  In Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server
Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client sends FIN (close); server replies with ACK, then sends FIN (close); client replies with ACK and enters timed wait; closed]


TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK
  - enters "timed wait": will respond with an ACK to received FINs
Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: FIN, ACK, FIN, ACK exchange; client: closing, timed wait, closed; server: closing, closed]


TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams, with transitions numbered 1 to 12]

- Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- The connection then formally closes, and all resources (e.g. port numbers) are released.


End of Flow Control and Error Control


Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender


Principles of Congestion Control

Congestion
- Informally: "too many sources sending too much data too fast for the network to handle"
- Different from flow control
- Manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- A top-10 problem


Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
  - no explicit feedback from the network
  - congestion inferred by end systems from observed packet loss and delay
  - approach taken by TCP

2. Network-assisted congestion control
  - routers provide feedback to end systems, in the form of:
    - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
    - the explicit transmission rate the sender should send at


TCP Congestion Control

How does a TCP sender limit the rate at which it sends traffic into its connection?

New variable: the congestion window, CongWin

(Amount of unACKed data) at SENDER = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin
• Packet loss delay and packet transmission delay are negligible

Sending rate (approx.) = CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.


TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
   • the congestion window will be increased at a relatively slow rate
2. ACKs arrive at a high rate
   • the congestion window will be increased more quickly


TCP Congestion Control

How does TCP perceive that there is congestion on the path?

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
   • no ACK is received after segment loss
2. Receipt of three duplicate ACKs
   • segment loss is followed by three duplicate ACKs received at the sender


TCP Congestion Control: details

- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly,
  \[ \text{rate} = \frac{cwnd}{RTT}\ \text{bytes/sec} \]
- cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events


TCP congestion avoidance: additive increase, multiplicative decrease

- Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd; ticks at 8, 16 and 24 Kbytes) against time, showing saw-tooth behavior: probing for bandwidth]


TCP Slow Start

- When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]


Refinement: inferring loss

- After 3 dup ACKs:
  - cwnd is cut in half
  - window then grows linearly
- But after a timeout event:
  - cwnd is instead set to 1 MSS
  - window then grows exponentially up to a threshold, then grows linearly

Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario


Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.

Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
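The switch from exponential to linear growth can be traced per RTT with a short function. This is a simplified sketch: the window is counted in whole MSS units, growth in slow start is capped at ssthresh so the switch happens exactly at the threshold, only timeouts are modelled as loss events, and the function name and arguments are invented for the example.

```python
def cwnd_per_rtt(rtts, ssthresh, timeouts=()):
    """Congestion window (in MSS) at the start of each RTT."""
    cwnd = 1
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in timeouts:
            ssthresh = max(cwnd // 2, 1)     # half of cwnd just before the loss
            cwnd = 1                         # restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return trace


no_loss = cwnd_per_rtt(8, ssthresh=8)
with_timeout = cwnd_per_rtt(10, ssthresh=8, timeouts={5})
```

Without loss the trace is 1, 2, 4, 8, 9, 10, 11, 12. With a timeout in the RTT where cwnd is 10, ssthresh becomes 5 and the trace continues 1, 2, 4, 5.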


TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: Timeout
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: Enter Slow Start

State: SS or CA
Event: Duplicate ACK
TCP sender congestion-control action: Increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer – TCP (slide 53)

Why is TCP fair?

Two competing sessions:
Additive increase gives a slope of 1 as throughput increases
Multiplicative decrease reduces throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
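The convergence argument can be tried numerically. The following is a minimal sketch, not the simulation the slide refers to: it assumes both connections see loss in the same round whenever their combined rate exceeds R, and the function name and parameters are hypothetical.

```python
def aimd_two_flows(rate1, rate2, capacity, rounds, step=1.0):
    """Simulate two AIMD flows sharing one link, one update per RTT.

    Assumption: when the combined rate exceeds the capacity, both flows
    detect loss in that round and halve; otherwise both add `step`.
    """
    for _ in range(rounds):
        if rate1 + rate2 > capacity:
            rate1, rate2 = rate1 / 2, rate2 / 2      # multiplicative decrease
        else:
            rate1, rate2 = rate1 + step, rate2 + step  # additive increase
    return rate1, rate2


# Start far from the fair share: 10 vs 80 on a link of capacity 100.
r1, r2 = aimd_two_flows(10.0, 80.0, capacity=100.0, rounds=500)
print(f"flow 1: {r1:.3f}, flow 2: {r2:.3f}, gap: {abs(r1 - r2):.5f}")
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the operating point drifts toward the equal bandwidth share line.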

Transport Layer – TCP (slide 54)

The End

The following slides are for additional reading only

Transport Layer – TCP (slide 55)

TCP Latency Modeling

Loopholes in TCP

In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.

Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: multiple End Systems sharing a link with transmission rate R bps; labels: "1 TCP connection" (three times) and "3 TCP connections"]

Transport Layer – TCP (slide 56)

TCP latency modeling

Q: How long does it take to receive an object from a Web server?
Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components: TCP connection establishment time, data transfer delay, actual data transmission time

Two cases to consider:
WS/R > RTT + S/R: an ACK for the first segment in the window returns to the Sender before a window's worth of data is sent. No data transfer delay.
WS/R < RTT + S/R: the Sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

Transport Layer – TCP (slide 57)

TCP Latency Modeling

Assumptions
Network is uncongested, with one link of rate R between the end systems (CLIENT and SERVER)
CongWin (fixed) determines the amount of data that can be sent
No packet loss, no packet corruption, no retransmissions required
Header overheads are negligible
File to send = integer number of segments of size MSS
Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
Initial Threshold of the TCP congestion mechanism is very big

Notation
R = link's transmission rate, in bps
O = size of the object, in bits
S = number of bits in an MSS (maximum segment size)

Transport Layer – TCP (slide 58)

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the Sender before a window's worth of data is sent.

latency = 2RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer (O/S is the number of segments)

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4

Transport Layer – TCP (slide 59)

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The Sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).

latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = O/(WS), rounded up = number of windows of data that cover the object

If there are K windows, the sender will be stalled (K-1) times.
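Both static-window cases reduce to one expression if the stall term is clamped at zero. The sketch below illustrates this under the slide's assumptions; the function name is hypothetical, and the link rate and RTT in the usage lines are made-up values chosen to give round numbers.

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """Latency with a fixed window of W segments.

    O: object size in bits, S: segment size in bits, R: link rate in bps,
    RTT: round-trip time in seconds.
    """
    K = math.ceil(O / (W * S))           # windows needed to cover the object
    stall = S / R + RTT - W * S / R      # idle time after each window (Case 2)
    # Case 1 (WS/R > RTT + S/R) has no stall, so the term is clamped at zero.
    return 2 * RTT + O / R + (K - 1) * max(stall, 0.0)


# Slide example: O = 256 bits, S = 32 bits, W = 4, so K = 2.
# R = 32 bps is an assumed rate that makes S/R exactly 1 second.
print(static_window_latency(256, 32, 32, RTT=1, W=4))  # Case 1: no stalls
print(static_window_latency(256, 32, 32, RTT=5, W=4))  # Case 2: one stall of 2 s
```

With RTT = 1 the ACK returns before the window is exhausted, so the latency is just 2RTT + O/R; with RTT = 5 the sender stalls once between the two windows.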

Transport Layer – TCP (slide 60)

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for O/S = 15 segments, delivered in 4 windows, with a STALLED PERIOD marked between windows]

Transport Layer – TCP (slide 61)

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object
• We can express K in terms of the number of segments in the object as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Transport Layer – TCP (slide 62)

Case Analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

• Transmission time of the kth window:

$$2^{k-1} \frac{S}{R}$$

• Stall time after the kth window, where $[x]^+ = \max(x, 0)$:

$$\left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^+$$

• Latency:

$$\text{Latency} = 2RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^+$$

Transport Layer – TCP (slide 63)

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

Transport Layer – TCP (slide 64)

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

• Closed-form expression for the latency:

$$\text{Latency} = 2RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

Transport Layer – TCP (slide 65)

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments, and P = min{Q, K-1} the actual number of stalls

• Comparing with the minimum latency 2RTT + O/R:

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$

Slow start will not significantly increase latency if RTT << O/R
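As a cross-check on the dynamic-window formulas, the closed form can be compared with the direct sum of stall times. This is a sketch under the model's assumptions (window doubling from 1 segment, no loss); the function names and the numeric values in the usage lines are illustrative only.

```python
import math


def slow_start_latency_closed_form(O, S, R, RTT):
    """Closed form: 2RTT + O/R + P(RTT + S/R) - (2^P - 1) S/R."""
    K = math.ceil(math.log2(O / S + 1))                # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                  # actual number of stalls
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R


def slow_start_latency_direct(O, S, R, RTT):
    """Direct sum of the per-window stall times [S/R + RTT - 2^(k-1) S/R]^+."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = sum(max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    return 2 * RTT + O / R + stalls


# O/S = 15 segments as in the timing diagram; S/R = 1 s and RTT = 2 s are assumed values.
print(slow_start_latency_closed_form(15000, 1000, 1000, 2.0))
print(slow_start_latency_direct(15000, 1000, 1000, 2.0))
```

For these inputs K = 4, Q = 2 and P = 2, and both functions return the same latency, which is what the derivation predicts.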

Transport Layer – TCP (slide 66)

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

Page 11: Transport Layer – TCP: TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer – TCP (slide 11)

Yet another server echo example

[Timing diagram between Host A and Host B, time running downward]

Host A -> Host B: Seq=42, ACK=79, data = 'Hello' (user types Hello)
Host B -> Host A: Seq=79, ACK=47, data = 'Hello' (host ACKs receipt of Hello, echoes back Hello)
Host A -> Host B: Seq=47, ACK=84, data = '200' (host ACKs receipt of echoed Hello, sends something else)
Host B -> Host A: Seq=84, ACK=50, data = '200'

The ACK tells up to what byte has been received, and what the next starting byte is that the host expects to receive.
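The sequence and ACK numbers in the echo exchange follow mechanically from the payload lengths. A small sketch, with a hypothetical function name and a made-up tuple layout, reproduces the four segments:

```python
def echo_exchange(seq_a, seq_b, messages):
    """Trace Seq/ACK numbers for an echo exchange.

    Host A sends each message; Host B ACKs it and echoes the same bytes back.
    Returns (sender, seq, ack, data) tuples, one per segment.
    """
    trace = []
    for msg in messages:
        length = len(msg.encode("ascii"))
        trace.append(("A", seq_a, seq_b, msg))  # A's ACK = next byte expected from B
        seq_a += length                         # A has now sent `length` more bytes
        trace.append(("B", seq_b, seq_a, msg))  # B ACKs A's data and echoes it
        seq_b += length
    return trace


for sender, seq, ack, data in echo_exchange(42, 79, ["Hello", "200"]):
    print(f"Host {sender}: Seq={seq} ACK={ack} data='{data}'")
```

Each ACK number is simply the other side's sequence number plus the number of bytes received so far, which is why 'Hello' (5 bytes) moves 42 to 47 and 79 to 84.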

Transport Layer – TCP (slide 12)

TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

RTT = round trip time

Q: how to set the TCP timeout value?
longer than RTT (note: RTT will vary)
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss

Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
ignore retransmissions and cumulatively ACKed segments
SampleRTT will vary; we want the estimated RTT to be smoother
use several recent measurements, not just the current SampleRTT

Transport Layer – TCP (slide 13)

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT

Exponential weighted moving average
influence of a given sample decreases exponentially fast
typical value of x: 0.125 (RFC 2988)

Setting the timeout
EstimatedRTT plus a safety margin
large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + (4 * Deviation)

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|
recommended value of x for Deviation: 0.25

Transport Layer – TCP (slide 14)

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

EstimatedRTT after the receipt of the ACK of segment 1:
EstimatedRTT = RTT for segment 1 = 0.02746 second

EstimatedRTT after the receipt of the ACK of segment 2:
EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285

EstimatedRTT after the receipt of the ACK of segment 3:
EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337

EstimatedRTT after the receipt of the ACK of segment 4:
EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438

EstimatedRTT after the receipt of the ACK of segment 5:
EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558

EstimatedRTT after the receipt of the ACK of segment 6:
EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725
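The sample calculations can be replayed in code, together with the timeout rule from the previous slide. This is a sketch with stated assumptions: the deviation starts at 0 and is updated against the previous EstimatedRTT before the estimate itself is refreshed; the slides do not specify either choice, and real TCP stacks differ in such details.

```python
def estimate_rtt(samples, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, Timeout) after each SampleRTT.

    alpha weights the RTT average, beta the deviation average.
    Assumptions: the first sample initialises EstimatedRTT and the
    deviation starts at 0.
    """
    estimated = samples[0]
    deviation = 0.0
    history = [(estimated, estimated + 4 * deviation)]
    for sample in samples[1:]:
        # Deviation uses the estimate from before this sample arrived.
        deviation = (1 - beta) * deviation + beta * abs(sample - estimated)
        estimated = (1 - alpha) * estimated + alpha * sample
        history.append((estimated, estimated + 4 * deviation))
    return history


samples = [0.02746, 0.035557, 0.070059, 0.11443, 0.13989, 0.18964]
for i, (est, timeout) in enumerate(estimate_rtt(samples), start=1):
    print(f"segment {i}: EstimatedRTT = {est:.4f}  Timeout = {timeout:.4f}")
```

The EstimatedRTT column agrees with the slide's values to four decimal places, while the timeout grows faster than the estimate because the samples keep rising and so inflate the deviation term.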

Transport Layer – TCP (slide 15)

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT plotted against time; y-axis RTT (msec), 100 to 300]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

Transport Layer – TCP (slide 16)

An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and Estimated RTT plotted against time (seconds, 1 to 106); y-axis RTT (milliseconds), 100 to 350]

Transport Layer – TCP (slide 17)

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

[State diagram: a single "wait for event" state with three transitions]
event: data received from application above -> create and send segment
event: timer timeout for segment with seq number y -> retransmit segment
event: ACK received, with ACK number y -> process ACK

Transport Layer – TCP (slide 18)

SIMPLIFIED TCP SENDER

Assumptions
• sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event)

  event: data received from application above
    create TCP segment with sequence number nextseqnum
    if (timer is currently not running)
      start timer for segment nextseqnum   /* timer is associated with the oldest unACKed segment */
    pass segment to IP
    nextseqnum = nextseqnum + length(data)

  event: timer timeout
    retransmit not-yet-ACKed segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > sendbase) {   /* cumulative ACK of all data up to y */
      sendbase = y
      if (there are currently any not-yet-ACKed segments)
        start timer
    }
}  /* end of loop forever */

Transport Layer – TCP (slide 20)

TCP SENDER with MODIFICATIONS: Fast Retransmit

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
  switch (event)

  event: data received from application above
    create TCP segment with sequence number nextseqnum
    start timer for segment nextseqnum
    pass segment to IP
    nextseqnum = nextseqnum + length(data)

  event: timer timeout for segment with sequence number y
    retransmit segment with sequence number y
    compute new timeout interval for segment y
    restart timer for sequence number y

  event: ACK received, with ACK field value of y
    if (y > sendbase) {   /* cumulative ACK of all data up to y */
      cancel all timers for segments with sequence numbers < y
      sendbase = y
    }
    else {   /* a duplicate ACK for an already ACKed segment */
      increment number of duplicate ACKs received for y
      if (number of duplicate ACKs received for y is 3) {
        /* perform TCP fast retransmission */
        resend segment with sequence number y
        restart timer for segment y
      }
    }
}  /* end of loop forever */
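The ACK-handling branch of the pseudocode can be exercised with a toy model. This is a sketch only: the class and attribute names are hypothetical, timers are left out, and "retransmission" just records the sequence number that would be resent.

```python
class FastRetransmitSender:
    """Toy model of cumulative ACKs plus fast retransmit (no real timers or network)."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}          # seq number -> payload still awaiting an ACK
        self.dup_acks = 0
        self.retransmitted = []    # seq numbers resent so far

    def send(self, data):
        """Queue a segment; returns the sequence number it was given."""
        seq = self.next_seq
        self.unacked[seq] = data
        self.next_seq += len(data)
        return seq

    def on_ack(self, y):
        if y > self.send_base:
            # Cumulative ACK: everything below y has been received.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.send_base = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for data that was already ACKed.
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.retransmitted.append(y)   # fast retransmit of segment y

    def on_timeout(self):
        if self.unacked:
            self.retransmitted.append(min(self.unacked))  # oldest unACKed segment


sender = FastRetransmitSender(92)
sender.send(b"a" * 8)     # Seq=92
sender.send(b"b" * 20)    # Seq=100, assume this one is lost
sender.send(b"c" * 10)    # Seq=120
sender.on_ack(100)        # new ACK for the first segment
for _ in range(3):
    sender.on_ack(100)    # three duplicate ACKs
print(sender.retransmitted)
```

The third duplicate ACK for 100 triggers the resend of the segment starting at byte 100, without waiting for its timer to expire.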

Transport Layer – TCP (slide 21)

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK

3. Event: out-of-order segment arrival with a higher-than-expected seq number (a gap is detected)
   TCP receiver action: send a duplicate ACK, indicating the seq number of the next expected byte

4. Event: arrival of a segment that partially or completely fills a gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap

The receiver does not discard out-of-order segments.

Transport Layer – TCP (slide 22)

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario
Host A sends Seq=92, 8 bytes data
Host B replies ACK=100, which is lost (X)
Host A times out and resends Seq=92, 8 bytes data (retransmission due to lost ACK)
Host B replies ACK=100

Premature timeout, cumulative ACKs
Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
Host B replies ACK=100, then ACK=120
The Seq=92 timeout expires before the ACKs arrive, so Host A resends Seq=92, 8 bytes data; the timer is restarted here for Seq=92
Host B replies ACK=120
The segment with Seq=100 is not retransmitted

Transport Layer – TCP (slide 23)

TCP Retransmission Scenario

Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
Host B replies ACK=100, which is lost (X), then ACK=120

The cumulative ACK (ACK=120) avoids retransmission of the first segment.

Transport Layer – TCP (slide 24)

TCP Modifications: Doubling the Timeout Interval

Provides a limited form of congestion control
Timer expiration is most likely caused by congestion in the network
Congestion may get worse if sources continue to retransmit packets persistently

TimeoutInterval = 2 * previous TimeoutInterval

TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals

After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT

Others: check RFC 2018 (selective ACK)
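The effect of doubling is easy to tabulate. A minimal sketch follows; the function name is hypothetical, and the cap on the interval is an assumption added here for illustration, not something the slide specifies.

```python
def backoff_intervals(initial_timeout, timeouts, max_interval=60.0):
    """Timeout intervals used for successive retransmissions of one segment.

    The interval doubles after every expiry; `max_interval` is an assumed
    upper bound so the wait cannot grow without limit.
    """
    intervals = []
    interval = initial_timeout
    for _ in range(timeouts):
        intervals.append(interval)
        interval = min(2 * interval, max_interval)
    return intervals


print(backoff_intervals(1.0, 5))
```

Once an ACK arrives, the sender would drop back to a timeout derived from EstimatedRTT and DevRTT rather than continuing from the doubled value.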

Transport Layer – TCP (slide 25)

TCP Flow Control

flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast

receiver buffering
RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer

receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, in the RcvWindow field of the TCP segment

sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

Transport Layer – TCP (slide 26)

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B

RECEIVER HOST B uses RcvWindow, LastByteRcvd, LastByteRead

[Figure: RcvBuffer at Host B; data from IP fills the buffer up to LastByteRcvd, and the application process has read up to LastByteRead]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially RcvWindow = RcvBuffer. The application reads from the buffer.
HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

Transport Layer – TCP (slide 27)

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B

SENDER HOST A uses the RcvWindow of Host B, LastByteSent, LastByteACKed

[Figure: data at sender Host A between LastByteACKed and LastByteSent; ACKs arrive from Host B]

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] <= RcvWindow
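The receiver-side and sender-side rules fit in two one-line functions. This is a sketch using the slide's variable names as parameters; the function names and the byte counts in the usage lines are made up for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending n_bytes keeps LastByteSent - LastByteACKed <= RcvWindow."""
    return (last_byte_sent + n_bytes) - last_byte_acked <= advertised_window


# Host B: 4096-byte buffer, 3000 bytes received, application has read 1000.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)

# Host A: 1000 bytes already in flight (sent up to 5000, ACKed up to 4000).
print(can_send(1000, 5000, 4000, window))
print(can_send(1200, 5000, 4000, window))
```

Host A may put another 1000 bytes in flight because the total stays within the advertised window, but 1200 more bytes would exceed it.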

Transport Layer – TCP (slide 28)

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection 'alive'.

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

Transport Layer – TCP (slide 29)

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments
Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client is the connection initiator
In Java: Socket clientSocket = new Socket("hostname", "port number");
In C (Winsock), connect:
  if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
    printf("connect failed\n");
    WSACleanup();
    exit(1);
  }

Server is contacted by the client
In Java: Socket accept()
In C (Winsock), accept:
  ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

Transport Layer – TCP (slide 30)

TCP Connection Management

Establishing a connection: three-way handshake
This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
  specifies the initial seq number (isn)
  Connect (SYN=1, seq=client_isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment
  ACKs the received SYN
  allocates buffers
  specifies the server's initial seq number
  Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: the client ACKs the connection with ACK=server_isn+1
  allocates buffers
  sends SYN=0
  ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
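The three segments can be written out as data to make the numbering explicit. The sketch below only models the header fields named on the slide; the function name, the dictionary layout and the example initial sequence numbers are assumptions for illustration.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries of header fields."""
    return [
        # Step 1: client -> server, SYN carrying the client's initial seq number.
        {"from": "client", "SYN": 1, "seq": client_isn, "ack": None},
        # Step 2: server -> client, SYNACK acknowledging client_isn + 1.
        {"from": "server", "SYN": 1, "seq": server_isn, "ack": client_isn + 1},
        # Step 3: client -> server, plain ACK (SYN=0) acknowledging server_isn + 1.
        {"from": "client", "SYN": 0, "seq": client_isn + 1, "ack": server_isn + 1},
    ]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

The SYN flag consumes one sequence number on each side, which is why both acknowledgements are the peer's initial sequence number plus one.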

Transport Layer – TCP (slide 31)

TCP Connection Management (cont.)

How a TCP connection is established and torn down

Closing a connection
The client closes the socket: closesocket(s);
Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server
Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Timing diagram: client (close) sends FIN; server replies ACK, then (close) sends FIN; client replies ACK and enters timed wait, then closed]

Transport Layer – TCP (slide 32)

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK
  Enters timed wait: will respond with an ACK to received FINs
Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs

[Timing diagram: client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK and enters timed wait; server closed; client closed after the timed wait]

Transport Layer – TCP (slide 33)

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with transitions numbered 1 to 12]

Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).

The connection then formally closes: all resources (e.g. port numbers) are released.

Transport Layer – TCP (slide 34)

End of Flow Control and Error Control

Transport Layer – TCP (slide 35)

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the Receiver Buffer from overflowing

Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion Control
• service that makes sure that the routers between End Systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender

Transport Layer – TCP (slide 36)

Principles of Congestion Control

Congestion
Informally: too many sources sending too much data too fast for the network to handle
Different from flow control
Manifestations:
  lost packets (buffer overflow at routers)
  long delays (queuing in router buffers)
A top-10 problem

Transport Layer – TCP (slide 37)

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
no explicit feedback from the network
congestion inferred by end systems from observed packet loss and delay
approach taken by TCP

2. Network-assisted congestion control
routers provide feedback to End Systems, in the form of either
  a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  an explicit transmission rate the sender should send at

Transport Layer – TCP (slide 38)

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: Congestion Window (CongWin)

At the SENDER:
(amount of unACKed data) = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• TCP receive buffer is very large (no RcvWindow constraint), so the amount of unACKed data at the sender is limited solely by CongWin
• Packet loss delay and packet transmission delay are negligible

Sending rate (approx.) = CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

Transport Layer – TCP (slide 39)

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking"

Arrival of ACKs is an indication to the sender that all is well

1. ACKs arrive at a slow rate
• Congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• Congestion window will be increased more quickly

Transport Layer – TCP (slide 40)

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

Transport Layer – TCP (slide 41)

TCP Congestion Control: details

The sender limits transmission:
LastByteSent - LastByteAcked <= cwnd

Roughly:
rate = cwnd/RTT bytes/sec

cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
loss event = timeout or 3 duplicate ACKs
the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

Transport Layer – TCP (slide 42)

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
additive increase: increase cwnd by 1 MSS every RTT until loss is detected
multiplicative decrease: cut cwnd in half after loss

[Figure: cwnd (congestion window size; ticks at 8, 16 and 24 Kbytes) against time, showing saw-tooth behavior: probing for bandwidth]

Transport Layer – TCP (slide 43)

TCP Slow Start

When the connection begins, increase the rate exponentially until the first loss event:
initially cwnd = 1 MSS
double cwnd every RTT
done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT)

[Timing diagram: Host A sends one segment to Host B, then two segments one RTT later, then four segments]

Transport Layer – TCP (slide 44)

Refinement: inferring loss

After 3 dup ACKs:
cwnd is cut in half
window then grows linearly

But after a timeout event:
cwnd is set to 1 MSS
window then grows exponentially up to a threshold, then grows linearly

Philosophy
3 dup ACKs indicate that the network is capable of delivering some segments
a timeout indicates a "more alarming" congestion scenario

Transport Layer – TCP (slide 45)

Refinement

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before the timeout

Implementation:
variable ssthresh (slow-start threshold)
on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

Transport Layer – TCP (slide 46)

TCP Sender Congestion Control

Table columns: STATE / EVENT / TCP SENDER Congestion-Control Action / Commentary

STATE: SLOW START (SS)
EVENT: ACK receipt for previously unACKed data
ACTION: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
COMMENTARY: Resulting in a doubling of CongWin every RTT

STATE: Congestion Avoidance (CA)
EVENT: ACK receipt for previously unACKed data
ACTION: CongWin = CongWin + MSS * (MSS/CongWin)
COMMENTARY: Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

STATE: SS or CA
EVENT: Loss event detected by triple duplicate ACK
ACTION: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
COMMENTARY: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

STATE: SS or CA
EVENT: Timeout
ACTION: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
COMMENTARY: Enter Slow Start

STATE: SS or CA
EVENT: Duplicate ACK
ACTION: Increment duplicate ACK count for segment being ACKed
COMMENTARY: CongWin and Threshold not changed
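The table translates almost row by row into a small state machine. This is a sketch, not a full TCP implementation: the class and method names are hypothetical, the window is counted in units of MSS, and duplicate-ACK counting is left to the caller, who signals the third duplicate through `on_triple_dup_ack`.

```python
MSS = 1.0  # window sizes below are expressed in units of one MSS


class CongestionControlSender:
    """Window/threshold updates from the slide's table (SS = slow start, CA = congestion avoidance)."""

    def __init__(self, threshold=64.0):
        self.cong_win = MSS
        self.threshold = threshold
        self.state = "SS"

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        if self.state == "SS":
            self.cong_win += MSS                           # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)   # about +1 MSS per RTT

    def on_triple_dup_ack(self):
        """Loss event detected by triple duplicate ACK."""
        self.threshold = max(self.cong_win / 2, MSS)       # never below 1 MSS
        self.cong_win = self.threshold
        self.state = "CA"

    def on_timeout(self):
        """Loss event detected by timeout."""
        self.threshold = max(self.cong_win / 2, MSS)
        self.cong_win = MSS
        self.state = "SS"


sender = CongestionControlSender(threshold=8.0)
for _ in range(8):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
sender.on_triple_dup_ack()
print(sender.state, sender.cong_win, sender.threshold)
sender.on_timeout()
print(sender.state, sender.cong_win, sender.threshold)
```

The run shows the three behaviours side by side: slow start crosses the threshold and hands over to congestion avoidance, a triple duplicate ACK halves the window, and a timeout collapses it to 1 MSS and restarts slow start.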

Transport Layer – TCP (slide 47)

Summary: TCP Congestion Control

[State diagram with three states: slow start, congestion avoidance, fast recovery]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
  new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: dupACKcount++
  cwnd > ssthresh: go to congestion avoidance
  dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
  timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

congestion avoidance
  new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  duplicate ACK: dupACKcount++
  dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
  timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

fast recovery
  duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
  timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Transport Layer – TCP (slide 48)

Congestion control

TCP's Congestion Control Service (between CLIENT and SERVER)

Problem: gridlock sets in when there is packet loss due to router congestion

TCP's congestion control forces the End Systems to decrease the rate at which packets are sent during periods of congestion

The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

Transport Layer ndash TCP

55

BB

TCP Latency ModelingTCP Latency Modeling

In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs

Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth

Multiple End Systems sharing a link

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

Loop holes in TCPLoop holes in TCP

1 TCP connection

1 TCP connection

1 TCP connection

3 TCP connections

Multithreading implementationMultithreading implementation

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.

Notation:
• O: size of the object in bits
• S: number of bits in an MSS (maximum segment size)
• R: the link's transmission rate in bps

[Figure: the CLIENT requests a FILE from the SERVER.]

TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

$\text{latency} = 2\,RTT + O/R$

K is the number of windows of data that cover the object:

$K = \lceil O/(WS) \rceil$

Here O/S is the number of segments, and the quotient is rounded up to the nearest integer.

Example: assume W = 4 segments, with O = 256 bits and S = 32 bits. Then O/S = 8 segments and K = 2.

TCP Latency Modeling
Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent. Each wait is a STALLED PERIOD.

$\text{latency} = 2\,RTT + O/R + (K-1)\,[\,S/R + RTT - WS/R\,]$

K is the number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$.

If there are K windows, the sender will be stalled (K-1) times.
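The two static-window cases can be folded into one helper. This is an illustrative sketch under the slide assumptions (no loss, negligible headers); the function name and the sample numbers are not from the slides.

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency in seconds to fetch an O-bit object with a fixed window of W segments.

    O: object size in bits, S: segment size in bits, R: link rate in bps.
    """
    K = math.ceil(O / (W * S))        # windows of data that cover the object
    base = 2 * rtt + O / R            # setup and request, plus transmission time
    stall = S / R + rtt - W * S / R   # idle time after each window
    if stall <= 0:                    # Case 1: ACK returns before the window is used up
        return base
    return base + (K - 1) * stall     # Case 2: the sender stalls K-1 times


# O = 256 bits, S = 32 bits, W = 4 as in the slide example; R = 32 bps is assumed.
case1 = static_window_latency(O=256, S=32, R=32, W=4, rtt=1)  # WS/R = 4 > RTT + S/R = 2
case2 = static_window_latency(O=256, S=32, R=32, W=4, rtt=5)  # WS/R = 4 < RTT + S/R = 6
print(case1, case2)
```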

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments sent in 4 windows, with a STALLED PERIOD between windows.]

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\}$
$\;\; = \min\{k : 2^k - 1 \ge O/S\}$
$\;\; = \min\{k : k \ge \log_2(O/S + 1)\}$
$\;\; = \lceil \log_2(O/S + 1) \rceil$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:
$S/R + RTT$
• Transmission time of the kth window:
$2^{k-1}\,S/R$
• Stall time after the kth window, where $[x]^{+} = \max(x, 0)$:
$[\,S/R + RTT - 2^{k-1}\,S/R\,]^{+}$
• Latency:
$2\,RTT + O/R + \sum_{k=1}^{K-1} [\,S/R + RTT - 2^{k-1}\,S/R\,]^{+}$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$Q = \left\lfloor \log_2\!\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$

• The actual number of times that the server stalls is

$P = \min\{Q,\; K-1\}$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Closed-form expression for the latency, with $P = \min\{Q,\; K-1\}$ the actual number of stalls:

$\text{Latency} = 2\,RTT + O/R + P\,[\,RTT + S/R\,] - (2^{P} - 1)\,S/R$

Case Analysis: DYNAMIC CONGESTION WINDOW

With P the actual number of times the server stalls and a minimum latency of $2\,RTT + O/R$:

$\dfrac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \dfrac{P}{(O/R)/RTT + 2}$

Slow start will not significantly increase latency if RTT << O/R.
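The dynamic-window formulas can be evaluated numerically. The sketch below assumes the slow-start model of these slides (window doubling from 1 segment, no loss). Function names and the sample values are illustrative only. The second function recomputes the latency from the per-window stall sum as a cross-check on the closed form.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (K, Q, P, latency) from the closed-form expression."""
    K = math.ceil(math.log2(O / S + 1))               # windows that cover the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    latency = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


def slow_start_latency_by_sum(O, S, R, rtt):
    """Same latency, computed by summing the stall time after windows 1..K-1."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = sum(max(S / R + rtt - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K))
    return 2 * rtt + O / R + stalls


# O/S = 15 segments; S/R = 1 s and RTT = 2 s are assumed values.
K, Q, P, latency = slow_start_latency(O=480, S=32, R=32, rtt=2)
check = slow_start_latency_by_sum(O=480, S=32, R=32, rtt=2)
print(K, Q, P, latency, check)
```

With these numbers the object needs K = 4 windows (1 + 2 + 4 + 8 segments), and the server stalls only after the first two windows.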


http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP Round Trip Time and Timeout

Main issue: how long is the sender willing to wait before retransmitting the packet?

RTT = round trip time

Q: How to set the TCP timeout value?
• longer than RTT (note: RTT will vary)
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss

Q: How to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions and cumulatively ACKed segments
• SampleRTT will vary; we want the estimated RTT to be smoother
  - use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x) * EstimatedRTT + x * SampleRTT

• exponential weighted moving average
• influence of a given sample decreases exponentially fast
• typical value of x: 0.125 (RFC 2988)

Setting the timeout
• EstimatedRTT plus a safety margin
• large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + 4 * Deviation

Deviation = (1-x) * Deviation + x * |SampleRTT - EstimatedRTT|

• recommended value of x for the Deviation: 0.25
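The estimator and the timeout rule fit in a few lines. This is a sketch, not a reference implementation: the class name is made up, the deviation is assumed to start at 0, and the deviation is assumed to be updated before the new sample is folded into EstimatedRTT (the slides do not fix either choice).

```python
class RttEstimator:
    """Exponential weighted moving averages of the RTT and its deviation."""

    def __init__(self, first_sample, x_rtt=0.125, x_dev=0.25):
        self.x_rtt = x_rtt                  # weight of a new sample in EstimatedRTT
        self.x_dev = x_dev                  # weight of a new sample in Deviation
        self.estimated_rtt = first_sample
        self.deviation = 0.0                # assumption: no variation seen yet

    def update(self, sample_rtt):
        # Assumption: Deviation uses the EstimatedRTT from before this sample.
        error = abs(sample_rtt - self.estimated_rtt)
        self.deviation = (1 - self.x_dev) * self.deviation + self.x_dev * error
        self.estimated_rtt = (1 - self.x_rtt) * self.estimated_rtt + self.x_rtt * sample_rtt
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=0.100)
new_timeout = est.update(0.200)   # one slow sample widens the safety margin
print(est.estimated_rtt, est.deviation, new_timeout)
```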

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

EstimatedRTT after the receipt of the ACK of segment 1:
EstimatedRTT = RTT for segment 1 = 0.02746 second

EstimatedRTT after the receipt of the ACK of segment 2:
EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285

EstimatedRTT after the receipt of the ACK of segment 3:
EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337

EstimatedRTT after the receipt of the ACK of segment 4:
EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438

EstimatedRTT after the receipt of the ACK of segment 5:
EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558

EstimatedRTT after the receipt of the ACK of segment 6:
EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT in msec (axis from 100 to 300) plotted against time.]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. SampleRTT and Estimated RTT in milliseconds (axis from 100 to 350) plotted against time in seconds (1 to 106).]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

[Figure: a single "wait for event" state with three transitions.]
• event: data received from application above -> create and send segment
• event: timer timeout for segment with seq number y -> retransmit segment
• event: ACK received, with ACK number y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer    /* associated with the oldest unACKed segment */
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout
            retransmit not-yet-ACKed segment with smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    } /* end of loop forever */
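For readers who want to step through the three events, here is a runnable sketch of the same sender. It is an illustration only: the class and method names are invented, the timer is reduced to a flag, and "pass segment to IP" just appends to a list. It also assumes ACK values fall on segment boundaries.

```python
class SimplifiedSender:
    """Event handlers of the simplified TCP sender (no flow or congestion control)."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}           # seq number -> data, sent but not yet ACKed
        self.timer_running = False  # stands in for the single retransmission timer
        self.wire = []              # (seq, data) tuples handed to IP

    def on_app_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True   # timer covers the oldest unACKed segment
        self.wire.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self):
        if not self.unacked:
            self.timer_running = False
            return
        seq = min(self.unacked)         # not-yet-ACKed segment with smallest seq
        self.wire.append((seq, self.unacked[seq]))
        self.timer_running = True       # restart the timer

    def on_ack(self, y):
        if y > self.sendbase:           # cumulative ACK of all data up to y
            self.sendbase = y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedSender(initial_seq=92)
sender.on_app_data(b"12345678")     # Seq=92, 8 bytes
sender.on_app_data(b"x" * 20)       # Seq=100, 20 bytes
sender.on_timeout()                 # retransmits only Seq=92
sender.on_ack(120)                  # cumulative ACK covers both segments
```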

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {    /* a duplicate ACK for an already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    } /* end of loop forever */
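The duplicate-ACK branch is the only new logic, so it can be isolated. The sketch below is a simplification with invented names: it tracks one counter for the current sendbase and returns the sequence number to resend when the third duplicate arrives.

```python
class DupAckCounter:
    """Counts duplicate ACKs and signals a fast retransmit on the third one."""

    def __init__(self, sendbase):
        self.sendbase = sendbase
        self.dup_acks = 0

    def on_ack(self, y):
        """Return the sequence number to fast-retransmit, or None."""
        if y > self.sendbase:       # new cumulative ACK: advance and reset
            self.sendbase = y
            self.dup_acks = 0
            return None
        self.dup_acks += 1          # duplicate ACK for already ACKed data
        if self.dup_acks == 3:
            return y                # resend the segment starting at y
        return None


counter = DupAckCounter(sendbase=92)
# One new ACK for 100, then four duplicates of it.
results = [counter.on_ack(a) for a in [100, 100, 100, 100, 100]]
print(results)
```

The retransmission fires exactly once, on the third duplicate, rather than on every later duplicate.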

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with higher-than-expected seq #; a gap is detected
   TCP receiver action: send a duplicate ACK indicating the seq # of the next expected byte.

4. Event: arrival of a segment that partially or completely fills the gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
1. Host A sends Seq=92, 8 bytes data.
2. Host B replies with ACK=100, which is lost (X).
3. Host A's timer expires (timeout) and it retransmits Seq=92, 8 bytes data. This is a retransmission due to a lost ACK.
4. Host B replies with ACK=100 again.

Premature timeout, cumulative ACKs:
1. Host A sends Seq=92, 8 bytes data, followed by Seq=100, 20 bytes data.
2. Host B replies with ACK=100 and ACK=120.
3. The timeout for Seq=92 expires before the ACKs arrive. Host A retransmits Seq=92, 8 bytes data, and the timer is restarted here for Seq=92.
4. Host B replies with ACK=120. The segment with Seq=100 is not retransmitted.

TCP Retransmission Scenario

1. Host A sends Seq=92, 8 bytes data, followed by Seq=100, 20 bytes data.
2. Host B replies with ACK=100, which is lost (X), and then with ACK=120.
3. ACK=120 reaches Host A before the timeout for Seq=92 expires.

The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control.
• Timer expiration is most likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.
• On each timeout: TimeoutInterval = 2 * TimeoutIntervalPrevious
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
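The two rules for the timer can be written side by side. This is a small sketch with made-up function names. The ACK case assumes the interval is rebuilt as EstimatedRTT + 4 * DevRTT, the same form as the Timeout formula given earlier for the Deviation.

```python
def timeout_after_expiry(previous_interval):
    """Timer expired: back off by doubling the previous interval."""
    return 2 * previous_interval


def timeout_after_ack(estimated_rtt, dev_rtt):
    """ACK received: recompute the interval from the current estimates."""
    return estimated_rtt + 4 * dev_rtt


interval = 1.0
backoff = []
for _ in range(3):                 # three consecutive timeouts
    interval = timeout_after_expiry(interval)
    backoff.append(interval)

recovered = timeout_after_ack(estimated_rtt=0.1, dev_rtt=0.025)
print(backoff, recovered)
```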

TCP Flow Control

flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
RcvBuffer = size of the TCP receive buffer
RcvWindow = amount of spare room in the buffer

• receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment
• sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead.

[Figure: the receive buffer (RcvBuffer) is filled with data from IP up to LastByteRcvd and read by the application process up to LastByteRead.]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially, RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.

[Figure: HOST A sends data up to LastByteSent; ACKs from HOST B advance LastByteACKed.]

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
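Both sides of the flow-control rule are simple arithmetic on byte counters. The sketch below uses invented function names and sample byte positions; it only illustrates the two formulas for the receiver and the sender.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """Sender side: would sending nbytes keep unACKed data within RcvWindow?"""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_window


# Receiver: 100-byte buffer, 60 bytes received, 40 of them already read.
window = rcv_window(rcv_buffer=100, last_byte_rcvd=60, last_byte_read=40)

# Sender: 50 bytes currently unACKed, so 30 more fit but 31 do not.
fits = can_send(30, last_byte_sent=150, last_byte_acked=100, advertised_window=window)
overflows = can_send(31, last_byte_sent=150, last_byte_acked=100, advertised_window=window)
print(window, fits, overflows)
```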

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection "alive": TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
• Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator
    In Java:
        Socket clientSocket = new Socket("hostname", port number);
    In C (Winsock), connect:
        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
            printf("connect failed\n");
            WSACleanup();
            exit(1);
        }

Server: contacted by the client
    In Java:
        Socket accept()
    In C (Winsock), accept:
        ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management

Establishing a connection: the three-way handshake. This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
• specifies the initial seq number (isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment
• ACKs the received SYN
• allocates buffers
• specifies the server's initial seq number

Step 3: the client ACKs the connection with ACK = server_isn + 1
• allocates buffers
• sends SYN = 0

Connection established. The segments exchanged over time are:
Client -> Server: Connect (SYN=1, seq=client_isn)
Server -> Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
Client -> Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
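The sequence and acknowledgment numbers in the three segments follow mechanically from the two initial sequence numbers. The sketch below models the segments as plain dictionaries. It is not socket code, and the function name and the ISN values 41 and 78 are arbitrary.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries of header fields."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


segments = three_way_handshake(client_isn=41, server_isn=78)
for segment in segments:
    print(segment)
```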

TCP Connection Management (cont.)

How a TCP connection is established and torn down.

Closing a connection: the client closes the socket.
    In C (Winsock): closesocket(s);
    In Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: timeline between client and server: the client's close sends FIN, the server returns ACK, the server's close sends FIN, the client returns ACK and enters timed wait before reaching closed.]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
• enters "timed wait": it will respond with an ACK to received FINs

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: timeline between client and server: FIN and ACK while both sides are closing, then the server's FIN and the client's ACK; the client goes through timed wait to closed, and the server reaches closed.]

TCP Connection Management (cont.)

[Figure: state diagrams of the TCP client lifecycle and the TCP server lifecycle, with transitions numbered 1 to 12.]

• The timed wait is used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• After the timed wait the connection formally closes: all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receive buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
• informally: too many sources sending too much data too fast for the network to handle
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection.

New variable: the Congestion Window (CongWin).

(Amount of unACKed data at the SENDER) < min(CongWin, RcvWindow)

The amount of unACKed data is LastByteSent - LastByteACKed. This constraint indirectly limits the sender's send rate.

Assumptions:
• The TCP receive buffer is very large: no RcvWindow constraint. The amount of unACKed data at the sender is solely limited by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.): CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well. ACKs may arrive at a:

1. Slow rate
• the congestion window will be increased at a relatively slow rate

2. High rate
• the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• roughly, rate = cwnd/RTT bytes/sec
• cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

• approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) against time, with levels marked at 8, 16 and 24 Kbytes. Sawtooth behavior: probing for bandwidth.]

TCP Slow Start

• when a connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
• summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.]
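The link between "1 MSS per ACK" and "doubling per RTT" can be seen by counting ACKs. The sketch assumes every segment in a window is ACKed individually within one RTT and nothing is lost; the function name is made up.

```python
def slow_start_window(rtts, mss=1):
    """Return cwnd (in units of mss) at the start of each RTT during slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd // mss     # one ACK per segment sent in this RTT
        cwnd += acks_this_rtt * mss     # +1 MSS for every ACK received
        history.append(cwnd)
    return history


growth = slow_start_window(3)
print(growth)   # one, two, four, then eight segments
```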

Refinement: inferring loss

• after 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
• but after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS. If (CongWin > Threshold), set state to "Congestion Avoidance".
Commentary: results in a doubling of CongWin every RTT.

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin/2, CongWin = Threshold. Set state to "Congestion Avoidance".
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin/2, CongWin = 1 MSS. Set state to "Slow Start".
Commentary: enter Slow Start.

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed.
Commentary: CongWin and Threshold are not changed.
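The table translates almost line for line into event handlers. The sketch below is an illustration, not a TCP implementation: the class name, the MSS value of 1460 bytes and the initial Threshold are assumptions. Resetting the duplicate-ACK count on a new ACK is also an assumption the table does not spell out.

```python
MSS = 1460  # bytes; assumed value for illustration


class CongestionControl:
    """Sender-side congestion control following the state/event table."""

    def __init__(self, threshold=64 * 1024):
        self.state = "SS"                # "SS" = slow start, "CA" = congestion avoidance
        self.cong_win = MSS
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        self.dup_acks = 0                # assumption: a new ACK clears the duplicate count
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:           # loss detected by triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)   # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


cc = CongestionControl(threshold=4 * MSS)
for _ in range(4):
    cc.on_new_ack()                      # grows to 5 MSS and crosses the threshold
state_after_acks, win_after_acks = cc.state, cc.cong_win
cc.on_timeout()
state_after_timeout, win_after_timeout = cc.state, cc.cong_win
```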

Summary: TCP Congestion Control

[Figure: finite state machine with three states: slow start, congestion avoidance and fast recovery. Its transitions are listed below.]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

[Figure: CLIENT and SERVER exchanging packets across a congested path.]

• The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets sent.
• TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.

Macroscopic Description of TCP throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).

• let W be the window size when loss occurs
• when the window is W, throughput is W/RTT
• just after loss, the window drops to W/2 and throughput to W/(2 RTT)
• throughput increases linearly (by MSS/RTT every RTT)
• average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
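The 0.75 factor is just the mean of a rate that climbs linearly from W/(2 RTT) to W/RTT. The sketch below checks this numerically by sampling the ramp. The function name, the sample count and the values W = 100 and RTT = 0.1 are assumptions for illustration.

```python
def average_sawtooth_throughput(W, rtt, steps=1000):
    """Average of a throughput that rises linearly from W/(2*rtt) to W/rtt."""
    samples = [(W / 2 + (W / 2) * i / steps) / rtt for i in range(steps + 1)]
    return sum(samples) / len(samples)


W, rtt = 100.0, 0.1
average = average_sawtooth_throughput(W, rtt)
print(average, 0.75 * W / rtt)
```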

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• requires a window size of W = 83,333 in-flight segments
• throughput in terms of the loss rate L:

$\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$

• L = 2·10^-10: a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
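Both numbers in the example can be reproduced from the throughput formula. The sketch below uses invented function names and takes the inputs from the example (10 Gbps, 100 ms RTT, 1500-byte segments).

```python
def window_in_segments(throughput_bps, rtt, mss_bytes):
    """In-flight segments needed to sustain the throughput: rate * RTT / MSS."""
    return throughput_bps * rtt / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt * throughput_bps)) ** 2


W = window_in_segments(10e9, rtt=0.1, mss_bytes=1500)
L = required_loss_rate(10e9, rtt=0.1, mss_bytes=1500)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The computed loss rate is slightly above 2·10^-10, which the slide rounds to one loss event in every 5 billion segments.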

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Assumptions:
• The link has a transmission rate of R.
• Each connection has the same MSS and RTT.
• No other TCP connections or UDP datagrams traverse the shared link.
• Ignore the slow start phase of TCP.
• Both connections operate in congestion-avoidance mode (linear increase phase).


TCP Round Trip Time and Timeout

EstimatedRTT = (1 - x) * EstimatedRTT + x * SampleRTT

• Exponential weighted moving average: the influence of a given sample decreases exponentially fast.
• Typical value of x: 0.125 (RFC 2988).

Setting the timeout

• Timeout is EstimatedRTT plus a safety margin.
• Large variation in EstimatedRTT -> larger safety margin.
• Recommended value of x for the deviation: 0.25.

Deviation = (1 - x) * Deviation + x * |SampleRTT - EstimatedRTT|

Timeout = EstimatedRTT + 4 * Deviation
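The update rules above can be sketched in a few lines of Python. This is only an illustration: the function name `update_rtt` and the parameter names `alpha` and `beta` (the two values of x) are assumptions, and so is the choice to update Deviation using the EstimatedRTT from before the new sample is folded in, which the slide does not specify.

```python
def update_rtt(estimated_rtt, deviation, sample_rtt, alpha=0.125, beta=0.25):
    """Return (estimated_rtt, deviation, timeout) after one new SampleRTT.

    alpha is the x used for EstimatedRTT, beta the x used for Deviation.
    """
    # Assumption: Deviation is updated against the previous EstimatedRTT.
    deviation = (1 - beta) * deviation + beta * abs(sample_rtt - estimated_rtt)
    # Exponential weighted moving average of the RTT samples.
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    # Timeout is the estimate plus a safety margin of four deviations.
    timeout = estimated_rtt + 4 * deviation
    return estimated_rtt, deviation, timeout


if __name__ == "__main__":
    est, dev, timeout = update_rtt(0.02746, 0.0, 0.035557)
    print(round(est, 4), round(dev, 6), round(timeout, 6))
```

Feeding each new estimate and deviation back into the next call reproduces the running average used by the sender.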

Sample Calculations

EstimatedRTT = 0.875 * EstimatedRTT + 0.125 * SampleRTT

• After the receipt of the ACK of segment 1: EstimatedRTT = RTT for segment 1 = 0.02746 second
• After the receipt of the ACK of segment 2: EstimatedRTT = 0.875 * 0.02746 + 0.125 * 0.035557 = 0.0285
• After the receipt of the ACK of segment 3: EstimatedRTT = 0.875 * 0.0285 + 0.125 * 0.070059 = 0.0337
• After the receipt of the ACK of segment 4: EstimatedRTT = 0.875 * 0.0337 + 0.125 * 0.11443 = 0.0438
• After the receipt of the ACK of segment 5: EstimatedRTT = 0.875 * 0.0438 + 0.125 * 0.13989 = 0.0558
• After the receipt of the ACK of segment 6: EstimatedRTT = 0.875 * 0.0558 + 0.125 * 0.18964 = 0.0725

RTT Samples and RTT estimates

[Figure: Sample RTT and Estimated RTT in msec (axis from 100 to 300) plotted against time.]

The variations in the SampleRTT are smoothed out in the computation of the EstimatedRTT.

An Actual RTT estimation

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and Estimated RTT in milliseconds (axis from 100 to 350) plotted against time in seconds (1 to 106).]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

From the "wait for event" state:
• event: data received from application above -> create and send segment
• event: timer timeout for segment with seq number y -> retransmit segment
• event: ACK received, with ACK number y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout
            retransmit not-yet-ACKed segment with smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > sendbase) {   /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    }   /* end of loop forever */

The single timer is associated with the oldest unACKed segment.
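The simplified sender above can be turned into a small runnable sketch. The class below is an illustration only, not a real TCP implementation: the name `SimplifiedSender`, the `sent` log standing in for "pass segment to IP", and the boolean `timer_running` standing in for a real timer are all assumptions made for the example.

```python
class SimplifiedSender:
    """Event handlers of the simplified TCP sender (single timer, no windows)."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq      # oldest unACKed byte
        self.nextseqnum = initial_seq    # sequence number of the next new byte
        self.unacked = {}                # seq -> data, sent but not yet ACKed
        self.timer_running = False       # stand-in for a real timer
        self.sent = []                   # log of (seq, data) handed to IP

    def on_data(self, data):
        """Event: data received from the application above."""
        seq = self.nextseqnum
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self):
        """Event: timer timeout. Retransmit the oldest unACKed segment."""
        if not self.unacked:
            self.timer_running = False
            return
        seq = min(self.unacked)
        self.sent.append((seq, self.unacked[seq]))
        self.timer_running = True

    def on_ack(self, y):
        """Event: ACK received with ACK field value y (cumulative)."""
        if y > self.sendbase:
            self.sendbase = y
            # Assumption: segments are ACKed whole, so seq < y means ACKed.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


if __name__ == "__main__":
    sender = SimplifiedSender(92)
    sender.on_data(b"a" * 8)     # Seq=92, 8 bytes
    sender.on_data(b"b" * 20)    # Seq=100, 20 bytes
    sender.on_timeout()          # retransmits Seq=92 only
    sender.on_ack(120)           # cumulative ACK covers both segments
    print(sender.sendbase, sender.sent[-1][0], sender.timer_running)
```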

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {   /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {   /* a duplicate ACK for already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    }   /* end of loop forever */
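The duplicate-ACK branch is the only new logic in this version, so it is worth isolating. The sketch below keeps just the ACK handler; the class name `FastRetransmitSender` and the `retransmitted` list (a stand-in for actually resending a segment) are assumptions for illustration, and an ACK below sendbase is simply ignored.

```python
class FastRetransmitSender:
    """ACK handling with fast retransmit on the third duplicate ACK."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.dup_acks = 0            # duplicate ACKs seen for sendbase
        self.retransmitted = []      # sequence numbers resent early

    def on_ack(self, y):
        if y > self.sendbase:
            # New cumulative ACK: advance and reset the duplicate counter.
            self.sendbase = y
            self.dup_acks = 0
        elif y == self.sendbase:
            # Duplicate ACK for data that was already ACKed.
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Fast retransmit: resend the segment starting at y
                # without waiting for its timer to expire.
                self.retransmitted.append(y)


if __name__ == "__main__":
    sender = FastRetransmitSender(92)
    for ack in (100, 100, 100, 100):
        sender.on_ack(ack)
    print(sender.retransmitted)
```

The first ACK=100 is a new ACK; the next three are duplicates, and the third duplicate triggers the early retransmission of the segment with sequence number 100.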

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with higher than expected seq number; a gap is detected.
   TCP receiver action: send a duplicate ACK, indicating the seq number of the next expected byte.

4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

Receiver does not discard out-of-order segments.

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario
• Host A sends Seq=92, 8 bytes of data.
• Host B replies with ACK=100, which is lost (X).
• Host A's timer expires (timeout) and it resends Seq=92, 8 bytes of data: a retransmission due to a lost ACK.
• Host B replies with ACK=100 again.

Premature timeout, cumulative ACKs
• Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data.
• Host B replies with ACK=100 and ACK=120.
• The Seq=92 timer expires before the ACKs arrive, so Host A resends Seq=92, 8 bytes of data. The timer is restarted here for Seq=92.
• Host B replies with ACK=120. The segment with Seq=100 is not retransmitted.

TCP Retransmission Scenario

• Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data.
• Host B's ACK=100 is lost (X), but ACK=120 reaches Host A before the Seq=92 timeout.
• Cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control.
• Timer expiration is more likely caused by congestion in the network.
• On each timeout: TimeoutInterval = 2 * TimeoutIntervalPrevious
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• Congestion may get worse if sources continue to retransmit packets persistently.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: see RFC 2018 (selective ACK).
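A minimal sketch of the doubling rule, assuming the post-ACK interval is computed as EstimatedRTT + 4 * DevRTT as on the timeout slide. The function name `next_timeout` and its parameters are hypothetical.

```python
def next_timeout(previous_timeout, ack_received, estimated_rtt, dev_rtt):
    """Return the TimeoutInterval to use for the next timer."""
    if ack_received:
        # Back to the value derived from the most recent RTT statistics.
        return estimated_rtt + 4 * dev_rtt
    # Timer expired: back off by doubling the previous interval.
    return 2 * previous_timeout


if __name__ == "__main__":
    timeout = 1.0
    for _ in range(3):               # three timeouts in a row
        timeout = next_timeout(timeout, False, 0.1, 0.025)
    print(timeout)                   # interval after three doublings
    print(next_timeout(timeout, True, 0.1, 0.025))
```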

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

• Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.
• Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

Receiver buffering
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER HOST B uses RcvWindow, LastByteRcvd and LastByteRead.

[Figure: RcvBuffer, filled by data from IP and drained by the application process, with LastByteRead and LastByteRcvd marked.]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

• Initially, RcvWindow = RcvBuffer.
• The application reads from the buffer.
• HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER HOST A uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.

[Figure: HOST A's data with LastByteACKed and LastByteSent marked; ACKs arrive from HOST B.]

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

LastByteSent - LastByteACKed <= RcvWindow
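Both sides of the flow-control bookkeeping fit in two small functions. This is a sketch with made-up numbers; the function names `rcv_window` and `can_send` are assumptions, while the variables mirror the ones on the slides.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """Sender side: would sending nbytes more keep unACKed data <= window?"""
    return (last_byte_sent + nbytes) - last_byte_acked <= window


if __name__ == "__main__":
    # Receiver has a 4096-byte buffer, has received byte 3000, read byte 1000.
    window = rcv_window(4096, 3000, 1000)
    print(window)
    # Sender has 1000 bytes in flight and wants to send 1000 or 1100 more.
    print(can_send(5000, 4000, 1000, window))
    print(can_send(5000, 4000, 1100, window))
```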

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

• TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".
• TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

• Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).

Client: the connection initiator
• In Java: Socket clientSocket = new Socket("hostname", port number);
• In C (Winsock), connect:

    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client
• In Java: accept(), which returns a Socket for the new connection
• In C (Winsock), accept:

    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);
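For comparison, here is the same connect/accept pairing using Python's standard `socket` module over the loopback interface. It is a self-contained sketch, not the course's code: the helper name `run_echo_once`, the echo behaviour and the use of an ephemeral port (port 0) are assumptions chosen so that the example runs on one machine.

```python
import socket
import threading


def run_echo_once(message=b"hello"):
    """Start a one-shot echo server on loopback, connect to it, return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        # accept() returns once TCP has completed the handshake with a client.
        conn, _addr = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))

    worker = threading.Thread(target=serve)
    worker.start()

    # create_connection() performs the client side of connection setup.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(run_echo_once())
```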

TCP Connection Management: Establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake
• Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
  - specifies the initial seq number (isn)
• Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
  - ACKs the received SYN
  - allocates buffers
  - specifies the server's initial seq number
• Step 3: the client ACKs the connection with ACK = server_isn + 1.
  - allocates buffers
  - sends SYN=0

Segments exchanged (time runs downward):
1. Client -> Server, Connect: SYN=1, seq=client_isn
2. Server -> Client, Accept: SYN=1, seq=server_isn, ack=client_isn+1
3. Client -> Server, ACK: SYN=0, seq=client_isn+1, ack=server_isn+1

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
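The sequence and ACK numbers in the three segments follow mechanically from the two initial sequence numbers. The sketch below builds them as plain dictionaries; the function name `three_way_handshake` and the dictionary representation are assumptions for illustration, not a packet format.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments of a TCP handshake."""
    # Step 1: client -> server, SYN carrying the client's initial seq number.
    syn = {"SYN": 1, "seq": client_isn}
    # Step 2: server -> client, SYNACK acknowledging client_isn + 1.
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    # Step 3: client -> server, ACK acknowledging server_isn + 1, SYN cleared.
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```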

TCP Connection Management (cont): Closing a connection

How a TCP connection is established and torn down.

The client closes the socket:
• C (Winsock): closesocket(s);
• Java: clientSocket.close();

• Step 1: the client end system sends a TCP FIN control segment to the server.
• Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client and server timelines. The client sends FIN (close); the server replies with ACK, then sends its own FIN (close); the client replies with ACK and enters timed wait before the connection is closed.]

TCP Connection Management (cont)

• Step 3: the client receives the FIN and replies with an ACK.
  - Enters timed wait: will respond with an ACK to received FINs.
• Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client and server timelines. Both sides are closing while FIN, ACK, FIN, ACK are exchanged; the server is closed on receiving the last ACK, and the client is closed after its timed wait.]

TCP Connection Management (cont)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with the steps numbered 1 to 12.]

• Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• The connection formally closes: all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
• informally: too many sources sending too much data too fast for the network to handle
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection.

New variable: the congestion window, CongWin.

At the SENDER:

(Amount of unACKed data) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large: there is no RcvWindow constraint, so the amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• The congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate
• The congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• Roughly: rate = cwnd / RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion.

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• The TCP sender reduces its rate (cwnd) after a loss event.

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

• Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
• Multiplicative decrease: cut cwnd in half after a loss.

[Figure: congestion window size (cwnd; ticks at 8, 16 and 24 Kbytes) versus time, showing sawtooth behavior: probing for bandwidth.]
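The sawtooth comes directly from the two rules. Below is a minimal sketch that tracks cwnd in units of MSS, one step per RTT; the function name `aimd` and the idea of passing in the set of RTT rounds at which a loss is detected are assumptions made for the example.

```python
def aimd(cwnd, rounds, loss_rounds, mss=1):
    """Return cwnd after each RTT round under additive increase, multiplicative decrease."""
    history = []
    for r in range(rounds):
        if r in loss_rounds:
            # Multiplicative decrease: halve cwnd, but never below 1 MSS.
            cwnd = max(mss, cwnd / 2)
        else:
            # Additive increase: one more MSS per RTT.
            cwnd += mss
        history.append(cwnd)
    return history


if __name__ == "__main__":
    # Start at 8 MSS, lose a segment in rounds 4 and 9.
    print(aimd(8, 12, {4, 9}))
```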

TCP Slow Start

• When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).

[Figure: Host A and Host B timelines. Host A sends one segment, then two segments one RTT later, then four segments.]
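The doubling is a consequence of the per-ACK increment: a window of n segments produces n ACKs, each adding 1 MSS. A small sketch, assuming no losses and that every segment in a window is ACKed within one RTT (the function name `slow_start` is hypothetical):

```python
def slow_start(rtts, mss=1):
    """Return cwnd (in units of mss) at the start of each RTT during slow start."""
    cwnd = mss
    sizes = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss       # one ACK per segment sent in this RTT
        cwnd += acks * mss       # cwnd grows by 1 MSS for every ACK received
        sizes.append(cwnd)
    return sizes


if __name__ == "__main__":
    print(slow_start(4))
```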

Refinement: inferring loss

• After 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
• But after a timeout event:
  - cwnd is set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly

Philosophy
• 3 dup ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.

Implementation
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

1. State: Slow Start (SS)
   Event: ACK receipt for previously unACKed data
   TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
   Commentary: resulting in a doubling of CongWin every RTT

2. State: Congestion Avoidance (CA)
   Event: ACK receipt for previously unACKed data
   TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS / CongWin)
   Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

3. State: SS or CA
   Event: loss event detected by triple duplicate ACK
   TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
   Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

4. State: SS or CA
   Event: timeout
   TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
   Commentary: enter Slow Start

5. State: SS or CA
   Event: duplicate ACK
   TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed
   Commentary: CongWin and Threshold not changed
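The table maps almost line for line onto a small state machine. The sketch below follows the table's actions; the class name `CongestionControl`, the string state labels and the floating-point window are assumptions for illustration, and real TCP stacks differ in many details.

```python
class CongestionControl:
    """TCP sender congestion-control actions from the state/event table."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = mss          # start in slow start with 1 MSS
        self.threshold = threshold
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Duplicate ACK; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # CongWin will not drop below 1 MSS.
            self.cong_win = max(self.mss, self.threshold)
            self.state = "CA"

    def on_timeout(self):
        """Timeout: collapse the window and re-enter slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


if __name__ == "__main__":
    cc = CongestionControl(mss=1, threshold=4)
    for _ in range(4):
        cc.on_new_ack()
    print(cc.state, cc.cong_win)
    for _ in range(3):
        cc.on_duplicate_ack()
    print(cc.state, cc.cong_win, cc.threshold)
    cc.on_timeout()
    print(cc.state, cc.cong_win, cc.threshold)
```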

Summary: TCP Congestion Control

[FSM with three states: slow start, congestion avoidance and fast recovery.]

Initial transition into slow start
• cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Congestion control

TCP's Congestion Control Service (between CLIENT and SERVER)

• Problem: gridlock sets in when there is packet loss due to router congestion.
• The service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• When the sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets sent.

Macroscopic Description of TCP Throughput

(Based on an idealised model for the steady-state dynamics of TCP)

• What's the average throughput of TCP as a function of window size and RTT?
  - Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps.
• Requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

• This gives L = 2 * 10^-10, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
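The two numbers in this example can be checked directly from the throughput formula. The helper names `window_segments` and `required_loss_rate` below are assumptions; the second function simply solves Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L that the throughput formula allows at this throughput."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    w = window_segments(10e9, 0.100, 1500)
    loss = required_loss_rate(10e9, 0.100, 1500)
    print(int(w), loss, round(1 / loss))
```

With 1500-byte segments, a 100 ms RTT and 10 Gbps, this reproduces the window of about 83,333 segments and a loss rate of roughly 2 * 10^-10.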

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Assumptions
• The link has a transmission rate of R.
• Each connection has the same MSS and RTT.
• No other TCP connections or UDP datagrams traverse the shared link.
• Ignore the slow start phase of TCP.
• Both connections operate in congestion-avoidance mode (linear increase phase).

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease decreases throughput proportionally.

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running up to R, with the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
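The convergence argument can be tried numerically. The sketch below is a toy model under the assumptions listed for the two-connection analysis, plus two more that are not in the slides: both connections see a loss in the same round whenever their combined rate exceeds the capacity, and rates change by one unit per round. The function name `share_link` is hypothetical.

```python
def share_link(x1, x2, capacity, rounds):
    """Evolve two AIMD throughputs that share a link of the given capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both connections halve their windows.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Congestion avoidance: both increase additively.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    # Very unequal start: connection 2 has eight times the rate of connection 1.
    x1, x2 = share_link(10, 80, capacity=100, rounds=500)
    print(round(x1, 3), round(x2, 3))
```

Additive increase leaves the difference between the two rates unchanged, while every multiplicative decrease halves it, so the rates drift toward the equal bandwidth share line.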

The End

The following slides are just for additional reading.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps: the link's transmission rate)

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections, all sharing the link.]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency: the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety. It is made up of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent. There's data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link between the end systems (CLIENT and SERVER) of rate R.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.

Notation
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits of MSS (max segment size)

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: $WS/R > RTT + S/R$

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1: $\text{latency} = 2\,RTT + O/R$

K = number of windows of data that cover the object:

$K = \lceil O/(WS) \rceil$ (rounded up to the nearest integer)

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4, so K = 2.

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: $WS/R < RTT + S/R$

The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).

Number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$

If there are K windows, the sender will be stalled (K-1) times.

Case 2: $\text{latency} = 2\,RTT + O/R + (K-1)\,[S/R + RTT - WS/R]$
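Both static-window cases can be evaluated with one function. This is a sketch under the model's assumptions; the function name `static_window_latency` and the link rate and RTT values in the usage lines are made up, while O, S and W are the example values from the Case 1 slide.

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """Latency for a fixed congestion window of W segments.

    O: object size in bits, S: segment size in bits, R: link rate in bps,
    RTT: round-trip time in seconds.
    """
    K = math.ceil(O / (W * S))           # windows of data covering the object
    latency = 2 * RTT + O / R            # Case 1: the sender never stalls
    if W * S / R < RTT + S / R:
        # Case 2: the sender stalls after each of the first K-1 windows.
        latency += (K - 1) * (S / R + RTT - W * S / R)
    return latency


if __name__ == "__main__":
    # O = 256 bits, S = 32 bits, W = 4, with an assumed 32 bps link.
    print(static_window_latency(256, 32, 32, RTT=1, W=4))   # Case 1
    print(static_window_latency(256, 32, 32, RTT=5, W=4))   # Case 2
```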

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: slow-start timeline for an object of O/S = 15 segments, sent in 4 windows, with the STALLED PERIODs between windows marked.]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $S/R + RTT$
• Transmission time of the kth window: $2^{k-1}\,S/R$
• Stall time after the kth window: $\left[S/R + RTT - 2^{k-1}\,S/R\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q,\ K-1\}$$

• Closed-form expression for the latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
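As a sanity check, the closed form can be compared against the explicit sum of stall times. The sketch below does both for one set of values; the function name `slow_start_latency` and the link rate and RTT used are assumptions, and O/S = 15 matches the 4-window example in the dynamic-window figure.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, closed_form, summed) for the dynamic-window model."""
    # K: number of windows 1, 2, 4, ... needed to cover O/S segments.
    K = math.ceil(math.log2(O / S + 1))
    # Q: number of stalls for an infinitely long object.
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    # P: number of stalls that actually occur.
    P = min(Q, K - 1)
    closed_form = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    # Same latency, adding up the (non-negative) stall after each window.
    summed = 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return K, Q, P, closed_form, summed


if __name__ == "__main__":
    # O/S = 15 segments of 32 bits on an assumed 32 bps link, RTT = 2 s.
    print(slow_start_latency(O=480, S=32, R=32, RTT=2))
```

For these values the object needs 4 windows, the server stalls twice, and both expressions give the same latency.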

Case analysis: DYNAMIC CONGESTION WINDOW

With Minimum Latency $= 2\,RTT + O/R$ (the latency when the server never stalls):

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if $RTT \ll O/R$.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 14: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

14

BB

EstimatedRTT = 0875 EstimatedRTT + 0125 SampleRTT

EstimatedRTT after the receipt of the ACK of segment 1EstimatedRTT = RTT for Segment 1 = 002746 second

EstimatedRTT after the receipt of the ACK of segment 2EstimatedRTT = 0875 002746 + 0125 0035557 = 00285

EstimatedRTT after the receipt of the ACK of segment 3EstimatedRTT = 0875 00285 + 0125 0070059 = 00337

EstimatedRTT after the receipt of the ACK of segment 4EstimatedRTT = 0875 00337+ 0125 011443 = 00438

EstimatedRTT after the receipt of the ACK of segment 5EstimatedRTT = 0875 00438 + 0125 013989 = 00558

EstimatedRTT after the receipt of the ACK of segment 6EstimatedRTT = 0875 00558 + 0125 018964 = 00725

Sample CalculationsSample Calculations

Transport Layer ndash TCP

15

BB

RTT Samples and RTT estimatesRTT Samples and RTT estimates

300

250

200

150

100 time

Estimated RTT

Sample RTT

RT

T (

mse

c)

The variations in the The variations in the SampleRTT SampleRTT are are smoothed out in the computation of thesmoothed out in the computation of the

EstimatedRTTEstimatedRTT

An Actual RTT Estimation

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: RTT (milliseconds, axis from 100 to 350) versus time (seconds, 1 to 106), plotting SampleRTT and Estimated RTT.]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
• one-way data transfer
• no flow or congestion control

[Figure: FSM with a single "wait for event" state and three transitions.]
• event: data received from application above. Action: create and send segment.
• event: timer timeout for segment with seq number y. Action: retransmit segment.
• event: ACK received, with ACK number y. Action: process ACK.

SIMPLIFIED TCP SENDER

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)
            event: data received from application above
                create TCP segment with sequence number nextseqnum
                if (timer is currently not running)
                    start timer for segment nextseqnum
                pass segment to IP
                nextseqnum = nextseqnum + length(data)
            event: timer timeout
                retransmit not-yet-ACKed segment with smallest sequence number
                start timer
            event: ACK received, with ACK field value of y
                if (y > sendbase) {    /* cumulative ACK of all data up to y */
                    sendbase = y
                    if (there are currently any not-yet-ACKed segments)
                        start timer
                }
    } /* end of loop forever */

The timer is associated with the oldest unACKed segment.

Assumptions:
• sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only
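The three events of the simplified sender can be sketched as a small Python class. This is an illustration only: the class and method names are hypothetical, the timer is modelled as a boolean rather than a real clock, and "pass segment to IP" is replaced by appending to a log.

```python
class SimplifiedTcpSender:
    """Event-driven sketch of the simplified sender (single timer, no windows)."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq      # oldest unACKed byte
        self.nextseqnum = initial_seq    # next byte to be sent
        self.unacked = {}                # seq -> data for not-yet-ACKed segments
        self.timer_running = False       # stands in for a real timer
        self.sent = []                   # log of (seq, length) handed to "IP"

    def on_app_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True    # start timer
        self.sent.append((seq, len(data)))
        self.nextseqnum += len(data)

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)      # not-yet-ACKed segment with smallest seq
            self.sent.append((seq, len(self.unacked[seq])))
            self.timer_running = True    # restart timer

    def on_ack(self, y):
        if y > self.sendbase:            # cumulative ACK of all data up to y
            self.sendbase = y
            # Assumption: ACKs fall on segment boundaries.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_ack(100)               # first segment ACKed, timer keeps running
sender.on_timeout()              # retransmits Seq=100 only
print(sender.sent)
```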

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)
            event: data received from application above
                create TCP segment with sequence number nextseqnum
                start timer for segment nextseqnum
                pass segment to IP
                nextseqnum = nextseqnum + length(data)
            event: timer timeout for segment with sequence number y
                retransmit segment with sequence number y
                compute new timeout interval for segment y
                restart timer for sequence number y
            event: ACK received, with ACK field value of y
                if (y > sendbase) {    /* cumulative ACK of all data up to y */
                    cancel all timers for segments with sequence numbers < y
                    sendbase = y
                }
                else {    /* a duplicate ACK for already ACKed segment */
                    increment number of duplicate ACKs received for y
                    if (number of duplicate ACKs received for y is 3) {
                        /* perform TCP fast retransmission */
                        resend segment with sequence number y
                        restart timer for segment y
                    }
                }
    } /* end of loop forever */

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?
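The duplicate-ACK branch of the modified sender can be isolated in a short Python sketch. The function name is hypothetical, and the code only reports when a fast retransmit would fire; it does not send anything.

```python
def fast_retransmit_points(acks, sendbase):
    """Return the ACK values for which a fast retransmit would be triggered.

    acks is the sequence of ACK field values as they arrive at the sender;
    sendbase is the oldest unACKed byte before the first ACK arrives.
    """
    dup_count = 0
    triggers = []
    for y in acks:
        if y > sendbase:          # new cumulative ACK
            sendbase = y
            dup_count = 0
        else:                     # duplicate ACK for already ACKed data
            dup_count += 1
            if dup_count == 3:    # third duplicate: resend segment y
                triggers.append(y)
    return triggers


# ACK=100 arrives once as a new ACK, then three more times as duplicates.
print(fast_retransmit_points([100, 100, 100, 100, 120], sendbase=92))
```

Resetting the counter on every new ACK is what makes the rule "three duplicates of the same ACK" rather than "three duplicates in total".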

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for next segment. If next segment does not arrive in this interval, send ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival, with higher than expected seq number; a gap is detected.
   TCP receiver action: send duplicate ACK, indicating seq number of next expected byte.

4. Event: arrival of segment that partially or completely fills gap.
   TCP receiver action: immediately send an ACK if segment starts at lower end of gap.

Receiver does not discard out-of-order segments.

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
[Figure: Host A / Host B timeline.] Host A sends Seq=92, 8 bytes data. Host B replies ACK=100, which is lost (X). Host A's timer expires, so it resends Seq=92, 8 bytes data, and Host B replies ACK=100 again. This is a retransmission due to a lost ACK.

Premature timeout, cumulative ACKs:
[Figure: Host A / Host B timeline.] Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data. Host B replies ACK=100 and ACK=120, but the Seq=92 timer expires before they arrive, so Host A resends Seq=92, 8 bytes data. The timer is restarted here for Seq=92. Host B replies ACK=120. The segment with Seq=100 is not retransmitted.

TCP Retransmission Scenario

[Figure: Host A / Host B timeline.] Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data. Host B replies ACK=100, which is lost (X), and ACK=120, which arrives before the Seq=92 timeout.

Cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control.
• Timer expiration is more likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.
• On each timeout: TimeoutInterval = 2 × TimeoutIntervalPrevious
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Other modifications: see RFC 2018 (selective ACK).
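The doubling rule can be sketched as a tiny Python class. The class name is hypothetical, and the formula TimeoutInterval = EstimatedRTT + 4 × DevRTT used after an ACK is an assumption here: it is the usual textbook rule, while this slide only says the interval is "derived from" the two values.

```python
class RetransmissionTimer:
    """Sketch of timeout doubling, under the assumptions stated above."""

    def __init__(self, estimated_rtt, dev_rtt):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.timeout_interval = self._derived()

    def _derived(self):
        # Assumed rule: EstimatedRTT plus a safety margin of 4 * DevRTT.
        return self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self):
        # Each expiry doubles the previous interval.
        self.timeout_interval *= 2
        return self.timeout_interval

    def on_ack(self):
        # A received ACK resets the interval to the derived value.
        self.timeout_interval = self._derived()
        return self.timeout_interval


timer = RetransmissionTimer(estimated_rtt=0.1, dev_rtt=0.025)
print(timer.timeout_interval, timer.on_timeout(), timer.on_timeout(), timer.on_ack())
```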

TCP Flow Control

flow control: sender won't overrun receiver's buffer by transmitting too much, too fast

receiver buffering:
• RcvBuffer = size of TCP receive buffer
• RcvWindow = amount of spare room in buffer

receiver: explicitly informs sender of the (dynamically changing) amount of free buffer space, via the RcvWindow field in the TCP segment

sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, LastByteRead.

[Figure: RcvBuffer at HOST B. Data from IP enters the buffer (LastByteRcvd); the application process reads from it (LastByteRead).]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

• Initially, RcvWindow = RcvBuffer.
• The application reads from the buffer.
• HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER (HOST A) uses RcvWindow of HOST B, LastByteSent, LastByteACKed.

[Figure: send buffer at HOST A. Data leaves the buffer (LastByteSent); ACKs from HOST B advance LastByteACKed.]

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
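Both sides of the flow-control bookkeeping fit in two small Python functions. The function names and the byte counts in the example are invented for illustration; only the two formulas come from the slides.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, rcv_window_bytes, nbytes):
    """Sender side: would sending nbytes more keep unACKed data <= RcvWindow?"""
    return (last_byte_sent + nbytes) - last_byte_acked <= rcv_window_bytes


# HOST B has a 4096-byte buffer, has received 3000 bytes and read 1000 of them.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                       # spare room advertised to HOST A
print(sender_may_send(1500, 1000, window, 1000))    # 1500 bytes in flight: allowed
print(sender_may_send(1500, 1000, window, 2000))    # 2500 bytes in flight: blocked
```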

FLOW CONTROL: Some issues to consider

RcvWindow: used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables:
• sequence numbers
• buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator (connect)

    In Java: Socket clientSocket = new Socket(hostname, portNumber);

    In C:
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client (accept)

    In Java: Socket accept()

    In C:
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: Establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: client end system sends TCP SYN control segment to server (executed by TCP itself)
• specifies initial seq number (isn)

Step 2: server end system receives SYN, replies with SYNACK control segment
• ACKs received SYN
• allocates buffers
• specifies server's initial seq number

Step 3: client ACKs the connection with ACK = server_isn + 1
• allocates buffers
• sends SYN=0

[Figure: client / server timeline.]
• Connect (client to server): SYN=1, seq=client_isn
• Accept (server to client): SYN=1, seq=server_isn, ack=client_isn+1
• ACK (client to server): SYN=0, seq=client_isn+1, ack=server_isn+1
• Connection established

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
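The sequence and acknowledgement numbers in the three segments follow mechanically from the two initial sequence numbers. A minimal Python sketch, with a hypothetical function name and segments modelled as plain dictionaries (no real sockets or headers involved):

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries of header fields."""
    syn = {"SYN": 1, "seq": client_isn}
    # Server ACKs the SYN by acknowledging client_isn + 1.
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    # Client ACKs the SYNACK by acknowledging server_isn + 1; SYN is now 0.
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```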

TCP Connection Management (cont.): how a TCP connection is established and torn down

Closing a connection: client closes socket

    closesocket(s);
    Java: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server.

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client / server timeline. Client close: FIN to server; server replies ACK. Server close: FIN to client; client replies ACK and enters timed wait, then closed.]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
• Enters "timed wait": will respond with ACK to received FINs.

Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client / server timeline. Client closing: FIN, answered by ACK. Server closing: FIN, answered by ACK. Client: timed wait, then closed. Server: closed.]

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams, with transitions numbered 1 to 12.]

• Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• Closed: the connection formally closes; all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs. Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receive buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
• informally: too many sources sending too much data too fast for the network to handle
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems, in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: congestion window (CongWin)

At the SENDER:

(amount of unACKed data) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
• TCP receive buffer is very large: no RcvWindow constraint. The amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs: indication to the sender that all is well.

1. ACKs arrive at a slow rate
• Congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate
• Congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• roughly, rate = cwnd / RTT bytes/sec
• cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd; axis marks at 8, 16 and 24 Kbytes) versus time, showing saw tooth behavior: probing for bandwidth.]
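The saw tooth can be reproduced with a few lines of Python. This is a sketch with a hypothetical function name: cwnd is counted in MSS units, one loop iteration stands for one RTT, and the rounds in which a loss occurs are supplied by the caller rather than produced by a network model.

```python
def aimd_trace(initial_cwnd_mss, loss_rounds, rounds):
    """Return cwnd (in MSS) at the start of each RTT under AIMD."""
    cwnd = initial_cwnd_mss
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            cwnd = max(cwnd / 2, 1)   # multiplicative decrease, floor of 1 MSS
        else:
            cwnd += 1                 # additive increase: 1 MSS per RTT
    return trace


# One loss detected during RTT number 4 (counting from 0).
print(aimd_trace(initial_cwnd_mss=8, loss_rounds={4}, rounds=8))
```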

TCP Slow Start

when connection begins, increase rate exponentially until first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

summary: initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A / Host B timeline. Host A sends one segment in the first RTT, then two segments, then four segments.]

Refinement: inferring loss

• after 3 dup ACKs:
  - cwnd is cut in half
  - window then grows linearly
• but after timeout event:
  - cwnd is set to 1 MSS
  - window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout indicates a "more alarming" congestion scenario

Refinement

Q: when should the exponential increase switch to linear?

A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on loss event, ssthresh is set to 1/2 of cwnd just before loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS × (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed
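The table translates almost line by line into a small Python state machine. This is a sketch under stated assumptions: the class name is hypothetical, CongWin and Threshold are kept in MSS units (MSS = 1), and the duplicate-ACK counter is global rather than per segment.

```python
MSS = 1  # assumption: window sizes are expressed in units of one MSS


class TcpCongestionControl:
    """Sketch of the SS / CA table; not a complete TCP implementation."""

    def __init__(self, threshold):
        self.state = "SS"
        self.cong_win = 1 * MSS
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                     # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)   # about 1 MSS per RTT

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:                       # triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, 1 * MSS)   # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = 1 * MSS
        self.state = "SS"
        self.dup_acks = 0


cc = TcpCongestionControl(threshold=4)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)   # slow start has handed over to congestion avoidance
```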

Summary: TCP Congestion Control

[Figure: FSM with three states: slow start, congestion avoidance, fast recovery.]

Initially: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0. Enter slow start.

slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

congestion avoidance
• new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

TCP's Congestion Control Service

[Figure: CLIENT and SERVER connected through congested routers.]

Problem: gridlock sets in when there is packet loss due to router congestion.

Congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.

The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

• What's the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2, and throughput to W/(2 RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP.)

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

  $\text{Throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• This gives L = 2 × 10^-10, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
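Both numbers on this slide can be checked directly. The following Python sketch uses hypothetical function names; the second function simply inverts the throughput formula 1.22 × MSS / (RTT × sqrt(L)) to solve for L.

```python
def window_segments(throughput_bps, rtt_s, segment_bytes):
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (segment_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L obtained by inverting throughput = 1.22*MSS/(RTT*sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(f"W = {w:.0f} segments, L = {loss:.2e}")
```

The loss rate comes out slightly above 2 × 10^-10, which the slide rounds to one loss event every 5 billion segments.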

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Assumptions:
• Link with transmission rate of R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow start phase of TCP
• Operating in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
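The convergence argument can be tried out numerically. The Python sketch below makes several simplifying assumptions that are not on the slide: both connections see a loss in the same round whenever their combined rate exceeds the capacity, rates are in arbitrary units, and each loop iteration is one RTT. The function name is hypothetical.

```python
def share_link(x1, x2, capacity, rounds):
    """Evolve two AIMD flows sharing one link; return their final rates."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both flows grow by 1, the gap is unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = share_link(1, 41, capacity=50, rounds=200)
print(f"{x1:.3f} {x2:.3f} gap={abs(x1 - x2):.4f}")
```

Because only the decrease step changes the difference between the two rates, and it always halves it, the flows drift toward the equal bandwidth share line regardless of where they start.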

The End

The slides that follow are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

[Figure: multiple end systems sharing a link with transmission rate R bps. Three hosts use 1 TCP connection each; one host uses 3 TCP connections (multithreading implementation).]

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server? That is, the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

[Figure: CLIENT and SERVER joined by a link with transmission rate R bps; the SERVER holds the FILE.]

Notation:
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits of MSS (max segment size)

Assumptions:
• Network is uncongested, with one link between end systems of rate R
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = integer number of segments of size MSS
• Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• Initial Threshold of the TCP congestion mechanism is very big

TCP Latency Modeling: case analysis, STATIC congestion window

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2RTT + O/R

K = number of windows of data that cover the object:

$K = \lceil O / (WS) \rceil$

where O/S is the number of segments and the result is rounded up to the nearest integer.

[Figure: timing diagram, assuming W = 4 segments.]

e.g. O = 256 bits, S = 32 bits, W = 4: the object is 8 segments long and K = 2.

TCP Latency Modeling: case analysis, STATIC congestion window

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).

latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object: $K = \lceil O / (WS) \rceil$

If there are K windows, the sender will be stalled (K-1) times.
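The two static-window cases can be combined in one Python function. This is a sketch with a hypothetical function name; the link rate and RTT values in the example are invented so that the slide's O = 256 bits, S = 32 bits, W = 4 object exercises both cases.

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """Latency for a fixed window of W segments (O, S in bits; R in bits/s)."""
    K = math.ceil(O / (W * S))            # windows of data that cover the object
    base = 2 * RTT + O / R                # connection setup + request + transmission
    stall = S / R + RTT - W * S / R       # idle time after each window
    if stall <= 0:                        # case 1: ACK returns before window ends
        return base
    return base + (K - 1) * stall         # case 2: sender stalls K-1 times


# Assumed link: R = 32 bits/s, so one segment takes S/R = 1 s to transmit.
print(static_window_latency(O=256, S=32, R=32, RTT=1, W=4))   # case 1
print(static_window_latency(O=256, S=32, R=32, RTT=5, W=4))   # case 2
```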

Case analysis: DYNAMIC congestion window

[Figure: slow-start timing diagram for an object of O/S = 15 segments, sent in 4 windows, with the stalled periods marked.]

Case analysis: DYNAMIC congestion window

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:

$$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\}$$
$$= \min\{k : 2^k - 1 \ge O/S\}$$
$$= \min\{k : k \ge \log_2(O/S + 1)\}$$
$$= \lceil \log_2(O/S + 1) \rceil$$

Case analysis: DYNAMIC congestion window

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

• Transmission time of the kth window:

$$2^{k-1} \frac{S}{R}$$

• Stall time after the kth window:

$$\left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^+$$

• Latency:

$$\text{Latency} = 2RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^+$$


Case analysis: DYNAMIC congestion window

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

• Closed-form expression for the latency:

$$\text{Latency} = 2RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

Case analysis: DYNAMIC congestion window

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
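The dynamic-window quantities K, Q and P and the closed-form latency can be evaluated together in Python. This is a sketch with a hypothetical function name; the example uses the O/S = 15 object from the timing diagram, with S, R and RTT chosen as round numbers purely for illustration.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, latency) for the slow-start latency model."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


# Assumed values: S = 1 bit, R = 1 bit/s, RTT = 2 s, so O/S = 15 segments.
K, Q, P, latency = slow_start_latency(O=15, S=1, R=1, RTT=2)
minimum = 2 * 2 + 15 / 1                              # 2RTT + O/R
bound = 1 + P / ((15 / 1) / 2 + 2)                    # 1 + P / ((O/R)/RTT + 2)
print(K, Q, P, latency, latency / minimum <= bound)
```

For these values the per-window stall times are 2, 1 and 0 seconds, so the summation form of the latency gives the same total as the closed form.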

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 15: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

15

BB

RTT Samples and RTT estimatesRTT Samples and RTT estimates

300

250

200

150

100 time

Estimated RTT

Sample RTT

RT

T (

mse

c)

The variations in the The variations in the SampleRTT SampleRTT are are smoothed out in the computation of thesmoothed out in the computation of the

EstimatedRTTEstimatedRTT

Transport Layer ndash TCP

16

BB

An Actual RTT estimationAn Actual RTT estimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RT

T (m

illis

eco

nds)

SampleRTT Estimated RTT

Transport Layer ndash TCP

17

BB

Simplified TCP sender assuming

waitfor

event

waitfor

event

event data received from application above

event timer timeout for segment with seq number y

event ACK receivedwith ACK number y

create send segment

retransmit segment

process ACK

- one way data transfer- no flow congestion control

FSM of TCP for Reliable Data Transfer FSM of TCP for Reliable Data Transfer

Transport Layer ndash TCP

18

BB

00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 If (timer is currently not running) start timer for segment

nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout 11 retransmit not-yet-ACKed segment with smallest Seq 12 Start timer13 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 sendbase = y 17 If (there are currently any not-yet-ACKed segments)18 start timer19 20 end of loop forever

Associated with the oldest unACKed

segment

SIMPLIFIED TCPSIMPLIFIED TCP SENDER

AssumptionsAssumptions bull sender is not constrained by TCP flow or congestion controlbull that data from above is less than MSS in sizebull that data transfer is in one direction only

Transport Layer ndash TCP

20

BB

00 sendbase = initial_sequence number 01 nextseqnum = initial_sequence number 0203 loop (forever) 04 switch(event) 05 event data received from application above 06 create TCP segment with sequence number nextseqnum 07 start timer for segment nextseqnum 08 pass segment to IP 09 nextseqnum = nextseqnum + length(data) 10 event timer timeout for segment with sequence number y 11 retransmit segment with sequence number y 12 compute new timeout interval for segment y 13 restart timer for sequence number y 14 event ACK received with ACK field value of y 15 if (y gt sendbase) cumulative ACK of all data up to y 16 cancel all timers for segments with sequence numbers lt y 17 sendbase = y 18 19 else a duplicate ACK for already ACKed segment 20 increment number of duplicate ACKs received for y 21 if (number of duplicate ACKS received for y is 3) 22 perform TCP fast retransmission 23 resend segment with sequence number y 24 restart timer for segment y 25 26 end of loop forever

TCPTCP with MODIFICATIONSwith MODIFICATIONS SENDER

Why wait for the timeout to expire when consecutive

ACKs can be used to indicate a lost segment

With Fast Retransmit

Transport Layer ndash TCP

21

BB

TCP ACK generation [RFC 1122 RFC 2581]

Event

in-order segment arrival no gapseverything else already ACKed

in-order segment arrival no gaps one delayed ACK pending (due to action 1)

out-of-order segment arrivalwith higher than expect seq - a gap is detected

arrival of segment that partially or completely fills gap

TCP Receiver action

Delay sending the ACK Wait up to 500msfor next segment If next segment does not arrive in this interval send ACK

immediately send a singlecumulative ACK

send duplicate ACK indicating seq of next expected byte

Immediately send an ACK if segment startsat lower end of gap

1

2

3

4

Receiver does not discard out-of-order segmentsReceiver does not discard out-of-order segments

Transport Layer ndash TCP

22

BB

TCP Interesting Scenarios

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

time lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

Host A

Seq=100 20 bytes data

ACK=100

Seq=

92

tim

eout

time premature timeoutcumulative ACKs

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

ACK=120

Retransmission due to lost ACKSegment with Seq=100 not retransmitted

Timer is restarted here for Seq=92

Simplified TCP versionSimplified TCP version

Transport Layer ndash TCP

23

BB

Host A

Seq=100 20 bytes data

ACK=100Seq=

92

tim

eout

time

Host B

ACK=120

Seq=92 8 bytes data

Xloss

Cumulative ACK avoids retransmission of the first segment

TCP Retransmission ScenarioTCP Retransmission Scenario

TCP Modifications: Doubling the Timeout Interval

Provides a limited form of congestion control.

Timer expiration is more likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.

On each timeout: TimeoutInterval = 2 * TimeoutIntervalPrevious

TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.

After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
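The doubling rule can be sketched in a few lines of Python. This is only an illustration: the class and method names are hypothetical, and the recomputation on ACK assumes the usual formula TimeoutInterval = EstimatedRTT + 4 * DevRTT, which this slide does not spell out.

```python
class RetransmissionTimer:
    """Tracks TimeoutInterval for one connection (all times in seconds)."""

    def __init__(self, estimated_rtt, dev_rtt):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.timeout_interval = self._from_estimates()

    def _from_estimates(self):
        # Assumed formula: EstimatedRTT plus a safety margin of 4 * DevRTT.
        return self.estimated_rtt + 4 * self.dev_rtt

    def on_timeout(self):
        # Back off: each expiry doubles the previous interval.
        self.timeout_interval = 2 * self.timeout_interval
        return self.timeout_interval

    def on_ack(self):
        # A fresh ACK ends the backoff; recompute from the RTT estimates.
        self.timeout_interval = self._from_estimates()
        return self.timeout_interval


timer = RetransmissionTimer(estimated_rtt=0.100, dev_rtt=0.025)
intervals = [timer.timeout_interval, timer.on_timeout(), timer.on_timeout(), timer.on_ack()]
print(intervals)
```

Two timeouts in a row stretch the interval to four times its starting value, and the next ACK brings it back to the value derived from the RTT estimates.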

TCP Flow Control

flow control: sender won't overrun receiver's buffer by transmitting too much, too fast

receiver buffering:
RcvBuffer = size of TCP receive buffer
RcvWindow = amount of spare room in the buffer

receiver: explicitly informs sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment

sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B

RECEIVER HOST B uses RcvWindow, LastByteRcvd, LastByteRead

[Figure: RcvBuffer at HOST B, filled by data from IP and drained by the application process, with LastByteRead and LastByteRcvd marked]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B

SENDER HOST A uses the RcvWindow of Host B, LastByteSent, LastByteACKed

[Figure: data buffer at HOST A with LastByteACKed and LastByteSent marked; ACKs arrive from Host B]

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] <= RcvWindow
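The receiver-side and sender-side bookkeeping can be sketched as two small Python functions. The function names and the byte counts in the example are hypothetical, and the sketch assumes byte counters that never wrap around.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room in the receiver's buffer, in bytes."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, advertised_window):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)


# Receiver side: 4096-byte buffer, 3000 bytes received, 1000 already read.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)

# Sender side: 1500 bytes are still unACKed against that advertised window.
extra = sendable_bytes(last_byte_sent=3000, last_byte_acked=1500, advertised_window=window)
print(extra)
```

When `sendable_bytes` returns 0, the sender has filled the advertised window and must wait for ACKs or a larger RcvWindow.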

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. If HOST B has nothing to send, HOST A would never learn that buffer space has opened up. Therefore the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client is the connection initiator.

In Java: Socket clientSocket = new Socket("hostname", "port number");

In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server is contacted by the client.

In Java: Socket accept()

In C:
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: Establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: client end system sends TCP SYN control segment to server (executed by TCP itself)
- specifies initial seq # (isn)
- Connect (SYN=1, seq=client_isn)

Step 2: server end system receives SYN, replies with SYNACK control segment
- ACKs received SYN
- allocates buffers
- specifies server's initial seq #
- Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: client ACKs the connection with ACK=server_isn+1
- allocates buffers
- sends SYN=0
- ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
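The sequence and acknowledgement numbers exchanged in the three steps can be written out as a small Python sketch. The `Segment` type and the example ISN values are hypothetical; real ISNs are chosen by the TCP implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    sender: str
    syn: int
    seq: int
    ack: Optional[int]  # None when the segment acknowledges nothing yet


def three_way_handshake(client_isn, server_isn):
    """Return the three control segments in the order they are sent."""
    return [
        Segment("client", syn=1, seq=client_isn, ack=None),
        Segment("server", syn=1, seq=server_isn, ack=client_isn + 1),
        Segment("client", syn=0, seq=client_isn + 1, ack=server_isn + 1),
    ]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```

Each side acknowledges the other's ISN plus one, which is why the final ACK carries seq=client_isn+1 and ack=server_isn+1.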

TCP Connection Management (cont.): Closing a connection

How a TCP connection is established and torn down.

client closes socket: closesocket(s); in Java: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client sends FIN (close); server replies with ACK, then sends its own FIN (close); client replies with ACK and enters timed wait before closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with a small modification, can handle simultaneous FINs.

[Figure: client sends FIN (closing); server replies with ACK, then sends FIN (closing); client replies with ACK and enters timed wait; server is closed on receiving the ACK; client is closed after the timed wait]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with numbered steps 1 to 12]

Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).

Closed: connection formally closes; all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: Congestion Window (CongWin)

At the SENDER:
(Amount of unACKed data) = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
• TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• Congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• Congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

sender limits transmission:
LastByteSent - LastByteAcked <= cwnd

roughly,
rate = cwnd / RTT bytes/sec

cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) versus time, with ticks at 8, 16 and 24 Kbytes; sawtooth behavior: probing for bandwidth]
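The sawtooth can be reproduced with a toy Python loop. This is a sketch only: cwnd is counted in MSS units, and `loss_rounds` is a hypothetical set of RTT rounds in which a loss is detected, not something TCP knows in advance.

```python
def aimd_trace(rounds, loss_rounds, start_cwnd=8, mss=1):
    """Return cwnd (in MSS units) at the start of each RTT round."""
    cwnd = start_cwnd
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            cwnd = max(mss, cwnd // 2)  # multiplicative decrease: halve
        else:
            cwnd += mss                 # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rounds=12, loss_rounds={4, 9})
print(trace)
```

The window climbs by one MSS per round and is cut in half after each loss round, which gives the sawtooth shape described above.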

TCP Slow Start

when connection begins, increase rate exponentially until first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

summary: initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments]
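A short Python sketch shows why adding 1 MSS per ACK doubles the window every RTT. It assumes no loss and one ACK per segment (no delayed ACKs); the function name is hypothetical.

```python
def slow_start_rounds(rounds, mss=1):
    """cwnd (in MSS units) at the start of each RTT, assuming no loss."""
    cwnd = mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        acks = cwnd // mss      # one ACK comes back per segment sent
        for _ in range(acks):
            cwnd += mss         # +1 MSS for every ACK received
    return trace


ramp = slow_start_rounds(5)
print(ramp)
```

Each round sends cwnd segments and receives cwnd ACKs, so the per-ACK increment adds up to a doubling per round.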

Refinement: inferring loss

• after 3 dup ACKs:
  - cwnd is cut in half
  - window then grows linearly
• but after a timeout event:
  - cwnd is instead set to 1 MSS
  - window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicates the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: when should the exponential increase switch to linear?

A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

STATE: Slow Start (SS)
EVENT: ACK receipt for previously unACKed data
TCP SENDER CONGESTION-CONTROL ACTION: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
COMMENTARY: results in a doubling of CongWin every RTT

STATE: Congestion Avoidance (CA)
EVENT: ACK receipt for previously unACKed data
TCP SENDER CONGESTION-CONTROL ACTION: CongWin = CongWin + MSS * (MSS/CongWin)
COMMENTARY: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

STATE: SS or CA
EVENT: loss event detected by triple duplicate ACK
TCP SENDER CONGESTION-CONTROL ACTION: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
COMMENTARY: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

STATE: SS or CA
EVENT: timeout
TCP SENDER CONGESTION-CONTROL ACTION: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
COMMENTARY: enter Slow Start

STATE: SS or CA
EVENT: duplicate ACK
TCP SENDER CONGESTION-CONTROL ACTION: increment duplicate ACK count for segment being ACKed
COMMENTARY: CongWin and Threshold not changed
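The state/event table maps fairly directly onto a small Python class. This is a sketch under stated assumptions: the class and method names are hypothetical, the initial Threshold is an arbitrary example value, and resetting the duplicate-ACK count after a triple duplicate ACK is a choice made here, not something the table specifies.

```python
class CongestionControl:
    """Sender state from the table; CongWin and Threshold are in bytes."""

    def __init__(self, mss=1000, threshold=8000):
        self.mss = mss
        self.cong_win = mss          # start in slow start with 1 MSS
        self.threshold = threshold   # initial value is an assumption
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * self.mss / self.cong_win

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.dup_acks == 3:       # loss detected by triple duplicate ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.mss, self.threshold)  # never below 1 MSS
            self.state = "CA"
            self.dup_acks = 0        # assumption: start counting afresh

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


cc = CongestionControl(mss=1000, threshold=4000)
for _ in range(4):
    cc.on_new_ack()
after_acks = (cc.state, cc.cong_win)
for _ in range(3):
    cc.on_duplicate_ack()
after_dups = (cc.state, cc.cong_win, cc.threshold)
cc.on_timeout()
after_timeout = (cc.state, cc.cong_win, cc.threshold)
print(after_acks, after_dups, after_timeout)
```

The run shows slow start crossing the threshold, a triple duplicate ACK halving the window, and a timeout dropping it back to 1 MSS.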

Summary: TCP Congestion Control

Initial: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

• forces the end systems to decrease the rate at which packets are sent during periods of congestion
• the sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent

Macroscopic Description of TCP throughput

What's the average throughput of TCP as a function of window size and RTT?
• ignore slow start (typically very short phases)
• let W be the window size when loss occurs
• when the window is W, throughput is W/RTT
• just after loss, the window drops to W/2 and throughput to W/(2 RTT)
• throughput increases linearly (by MSS/RTT every RTT)
• average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
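The 0.75 factor is just the mean of a window that ramps linearly from W/2 to W. A quick numeric check in Python, with the window counted in segments and sampled once per RTT (the function name and the value W = 100 are illustrative):

```python
def average_window(w):
    """Mean window over one sawtooth cycle, sampled once per RTT.

    The window climbs from w/2 to w in steps of 1 segment per RTT.
    """
    samples = list(range(w // 2, w + 1))
    return sum(samples) / len(samples)


w = 100
ratio = average_window(w) / w
print(ratio)
```

Dividing the average window by RTT then gives the average throughput of 0.75 W/RTT.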

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
• requires window size W = 83,333 in-flight segments
• throughput in terms of loss rate L:

\[ \text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}} \]

• this gives L = 2 \cdot 10^{-10}, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
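The two numbers in the example can be recomputed from the stated parameters. This Python sketch uses the throughput formula 1.22 * MSS / (RTT * sqrt(L)) solved for L; the variable names are illustrative.

```python
MSS_BITS = 1500 * 8        # 1500-byte segments
RTT = 0.100                # seconds
TARGET_BPS = 10e9          # desired throughput: 10 Gbps

# Window needed to keep the pipe full: throughput * RTT, in segments.
window_segments = TARGET_BPS * RTT / MSS_BITS

# Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET_BPS)) ** 2

print(round(window_segments))
print(f"{loss_rate:.2e}")
```

The result is roughly 83,333 segments in flight and a tolerable loss rate of about 2 in 10 billion segments, matching the figures quoted above.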

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions:
• Link with transmission rate of R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow start phase of TCP
• Operating in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running to R, showing the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
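The convergence argument can be checked with a toy Python simulation. It is a sketch under simplifying assumptions: both flows have the same RTT, both detect loss in the same round whenever their combined rate exceeds the link capacity, and rates are in arbitrary units. The function name and starting rates are hypothetical.

```python
def share_link(rate1, rate2, capacity, rounds):
    """Two AIMD flows with equal RTT; both see loss when the link is full."""
    for _ in range(rounds):
        if rate1 + rate2 > capacity:
            rate1, rate2 = rate1 / 2, rate2 / 2   # both halve on loss
        else:
            rate1, rate2 = rate1 + 1, rate2 + 1   # both add one unit per RTT
    return rate1, rate2


r1, r2 = share_link(rate1=80.0, rate2=10.0, capacity=100.0, rounds=500)
gap = abs(r1 - r2)
print(r1, r2, gap)
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half, so the flows drift toward the equal bandwidth share line.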

The End

The succeeding slides are just for additional reading.

TCP Latency Modeling: Loopholes in TCP

Multiple end systems sharing a link (R bps: link's transmission rate)

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: end systems labelled "1 TCP connection" (three times) and "3 TCP connections"]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency: the time from when the client initiates a TCP connection until when the client receives the requested object in its entirety. It is made up of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• W*S/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• W*S/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There's data transfer delay.

TCP Latency Modeling

Assumptions:
• Network is uncongested, with one link between end systems of rate R
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = integer number of segments of size MSS
• Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• Initial Threshold of TCP congestion mechanism is very big

Notation:
• O: size of object in bits
• S: number of bits of MSS (max segment size)
• R bps: link's transmission rate

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: W*S/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1: latency = 2*RTT + O/R

K = number of windows of data that cover the object
K = O/(W*S), rounded up to the nearest integer

Assume W = 4 segments
e.g. O = 256 bits, S = 32 bits, W = 4, so K = 256/(4*32) = 2

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: W*S/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

Case 2: latency = 2*RTT + O/R + (K-1)*[S/R + RTT - W*S/R]

K = number of windows of data that cover the object = O/(W*S)

STALLED PERIOD: if there are K windows, the sender will be stalled (K-1) times.
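Both static-window cases fit in one small Python function. This is a sketch: the function name and the example link rate and RTT values are hypothetical, and K is rounded up as stated for Case 1.

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency in seconds for an object of O bits, segments of S bits,
    link rate R bits/s, a fixed window of W segments, and round-trip time rtt."""
    K = math.ceil(O / (W * S))            # windows that cover the object
    base = 2 * rtt + O / R
    stall = S / R + rtt - W * S / R       # idle time after each window
    if stall <= 0:                        # Case 1: ACK returns in time
        return base
    return base + (K - 1) * stall         # Case 2: K-1 stalled periods


# O = 256 bits, S = 32 bits, W = 4, with an illustrative 32 bit/s link.
case1 = static_window_latency(O=256, S=32, R=32, W=4, rtt=1)
case2 = static_window_latency(O=256, S=32, R=32, W=4, rtt=5)
print(case1, case2)
```

With the short RTT the window never drains before the first ACK returns, so only 2*RTT + O/R is paid; with the long RTT the sender stalls once between its two windows.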

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram of slow start for an object with O/S = 15 segments, sent in 4 windows, with a STALLED PERIOD between windows]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:

\[ K = \min\{ k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S \} \]

Note:

\[ K = \min\{ k : 2^k - 1 \ge O/S \} = \min\{ k : k \ge \log_2(O/S + 1) \} = \lceil \log_2(O/S + 1) \rceil \]

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

\[ \frac{S}{R} + RTT \]

• Transmission time of the kth window:

\[ 2^{k-1} \frac{S}{R} \]

• Stall time after the kth window, where [x]^+ = max(x, 0):

\[ \left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^{+} \]

• Latency:

\[ \text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^{+} \]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

\[ Q = \left\lfloor \log_2\left( 1 + \frac{RTT}{S/R} \right) \right\rfloor + 1 \]

• The actual number of times that the server stalls is

\[ P = \min\{ Q, K-1 \} \]

• Closed-form expression for the latency:

\[ \text{Latency} = 2\,RTT + \frac{O}{R} + P \left[ RTT + \frac{S}{R} \right] - (2^P - 1) \frac{S}{R} \]

• Compared with the minimum latency, 2 RTT + O/R:

\[ \frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2} \]

Slow start will not significantly increase latency if RTT << O/R.
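The quantities K, Q and P and the closed-form latency can be cross-checked numerically against the per-window stall sum. The following Python sketch uses hypothetical example values (S/R = 1 second, RTT = 2 seconds) with the O/S = 15 object from the earlier figure.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (K, Q, P, closed-form latency, summed latency) in seconds."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    closed_form = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    # Same quantity from the per-window stall times, as a cross-check.
    summed = 2 * rtt + O / R + sum(
        max(0.0, S / R + rtt - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return K, Q, P, closed_form, summed


K, Q, P, closed_form, summed = slow_start_latency(O=15000, S=1000, R=1000, rtt=2)
print(K, Q, P, closed_form, summed)
```

Here the object needs 4 windows, the server stalls twice, and both expressions give the same latency, 3 seconds more than the minimum of 2 RTT + O/R.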

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


An Actual RTT estimation

RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

[Figure: SampleRTT and EstimatedRTT, RTT (milliseconds, 100 to 350) versus time (seconds, 1 to 106)]

FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

wait for event:
• event: data received from application above -> create, send segment
• event: timer timeout for segment with seq # y -> retransmit segment
• event: ACK received, with ACK # y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions:
• sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        if (timer is currently not running)
            start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout
        retransmit not-yet-ACKed segment with smallest seq #
        start timer

    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            sendbase = y
            if (there are currently any not-yet-ACKed segments)
                start timer
        }
} /* end of loop forever */

The timer is associated with the oldest unACKed segment.

TCP with MODIFICATIONS: SENDER (with Fast Retransmit)

Why wait for the timeout to expire when consecutive ACKs can be used to indicate a lost segment?

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y

    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y is 3) {
                /* perform TCP fast retransmission */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
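The ACK-handling branch of the sender with fast retransmit can be sketched in Python. Timers and actual transmission are left out; the class name and the `retransmitted` list (used here only to observe what would be resent) are hypothetical.

```python
class FastRetransmitSender:
    """ACK handling only: tracks sendbase and counts duplicate ACKs."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.dup_acks = {}          # ACK value -> duplicates seen so far
        self.retransmitted = []     # sequence numbers that would be resent

    def on_ack(self, y):
        if y > self.sendbase:       # cumulative ACK of all data up to y
            self.sendbase = y
            self.dup_acks.pop(y, None)
        else:                       # duplicate ACK for already ACKed data
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3:
                self.retransmitted.append(y)   # fast retransmit segment y


sender = FastRetransmitSender(initial_seq=92)
for ack in [100, 100, 100, 100, 120]:
    sender.on_ack(ack)
print(sender.sendbase, sender.retransmitted)
```

The first ACK=100 advances sendbase; the next three are duplicates, so the segment starting at byte 100 is resent without waiting for its timer, and ACK=120 then advances sendbase again.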

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with higher-than-expected seq #; a gap is detected
   TCP receiver action: send a duplicate ACK indicating the seq # of the next expected byte.

4. Event: arrival of segment that partially or completely fills the gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

Receiver does not discard out-of-order segments.

Transport Layer ndash TCP

22

BB

TCP Interesting Scenarios

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

time lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

Host A

Seq=100 20 bytes data

ACK=100

Seq=

92

tim

eout

time premature timeoutcumulative ACKs

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

ACK=120

Retransmission due to lost ACKSegment with Seq=100 not retransmitted

Timer is restarted here for Seq=92

Simplified TCP versionSimplified TCP version

Transport Layer ndash TCP

23

BB

Host A

Seq=100 20 bytes data

ACK=100Seq=

92

tim

eout

time

Host B

ACK=120

Seq=92 8 bytes data

Xloss

Cumulative ACK avoids retransmission of the first segment

TCP Retransmission ScenarioTCP Retransmission Scenario

Transport Layer ndash TCP

24

BB

TCP Modifications Doubling the Timeout IntervalDoubling the Timeout Interval

Provides a limited form of congestion control

Timer expiration is more likely caused by congestion in the network

TimeoutInterval = 2 TimeoutIntervalPrevious

TCP acts more politely by increasing the TimeoutInterval causing the sender to retransmit after longer and longer intervals

Congestion may get worse if sources

continue to retransmit packets

persistently

After ACK is received After ACK is received TimeoutInterval is derived TimeoutInterval is derived from most recent EstimatedRTT from most recent EstimatedRTT and DevRTTand DevRTT

Others check RFC 2018 ndash selective ACK

Transport Layer ndash TCP

25

BB

receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field

in TCP segment

sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow

sender wont overrun

receivers buffer bytransmitting too

much too fast

flow controlflow control

receiver buffering

RcvBufferRcvBuffer = size of TCP Receive Buffer

RcvWindowRcvWindow = amount of spare room in Buffer

TCP Flow ControlTCP Flow Control

Transport Layer ndash TCP

26

BB

FLOW CONTROL FLOW CONTROL ReceiverReceiver

04060 50100

LastByteRcvd

EXAMPLE HOST A sends a large file to HOST B

LastByteRead

Application Process

Data from IP

RcvBuffer

RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]

RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead

Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A

Transport Layer ndash TCP

27

BB

FLOW CONTROL SenderSender

04060 50100

LastByteSent

EXAMPLE HOST A sends a large file to HOST B

LastByteACKed

Data

To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow

SENDER HOST A

ACKs ACKs from from Host Host BB

SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed

Transport Layer ndash TCP

28

BB

FLOW CONTROLFLOW CONTROLSome issue to consider

RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service

TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0

What happens when the receive

buffer of HOST B is full full (that is when (that is when

RcvWindow=0)RcvWindow=0)

TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo

Transport Layer ndash TCP

29

BB

TCP Connection ManagementTCP Connection Management

Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments

Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)

ClientClient is the connection initiator

In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client

In JavaSocket accept()

if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)

ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)

Transport Layer ndash TCP

30

BB

TCP Connection Management

Three way handshakeStep 1 clientclient end system sends TCP

SYNSYN control segment to server (executed by TCP itself)

specifies initial seq number (isn)

Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment

ACKs received SYN allocates buffers specifies serverrsquos initial seq

number

Step 3 clientclient ACKsACKs the connection with

ACK=server_isn +1ACK=server_isn +1

allocates buffers sends SYN=0SYN=0

Connection established

Client

AcceptAccept (SYN=1

seq=server_isnack=client_isn+1)

time

Server

ConnectConnect (SYN=1 seq=client_isn)

ACK (SYN=0

seq=client_isn+1ack=server_isn+1)

Establishing a connection

This is what happens when we create a socket for

connection to a server

After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont.)

[Figures: TCP client lifecycle; TCP server lifecycle]

• Timed wait is used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• When the connection formally closes, all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receive buffer from overflowing

Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
• informally: too many sources sending too much data too fast for the network to handle
• different from flow control
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  • a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  • an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the congestion window (CongWin).

At the SENDER:

(Amount of unACKed data) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin.
• Packet-loss delay and packet-transmission delay are negligible.

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
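As a rough numeric illustration of the window constraint and the approximate sending rate, consider the sketch below. The function names and the byte counts are hypothetical; only the two formulas come from the slide.

```python
def bytes_allowed(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """How many more bytes the sender may put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked  # amount of unACKed data
    limit = min(cong_win, rcv_window)             # the tighter of the two windows
    return max(0, limit - in_flight)


def approx_send_rate(cong_win_bytes, rtt_seconds):
    """Approximate sending rate in bytes/sec: CongWin / RTT."""
    return cong_win_bytes / rtt_seconds


# 3000 bytes are unACKed; CongWin (8000) is tighter than RcvWindow (16000).
print(bytes_allowed(5000, 2000, cong_win=8000, rcv_window=16000))
print(approx_send_rate(8000, 0.5))
```

With a very large RcvWindow, as the slide assumes, `min(...)` always picks CongWin, which is why the rate depends on CongWin alone.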

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

Arrival of ACKs: an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• Roughly: rate = cwnd / RTT bytes/sec
• cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

• Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs
  • additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  • multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) versus time, with axis marks at 8, 16 and 24 Kbytes. Sawtooth behavior: probing for bandwidth.]
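The sawtooth can be reproduced with a few lines of code. This is a minimal sketch that counts cwnd in whole MSS units and takes the loss rounds as a given input (a hypothetical parameter); in a real connection, losses come from the network.

```python
def aimd_trace(cwnd_mss, rounds, loss_rounds):
    """Return cwnd (in MSS) at the start of each RTT round.

    loss_rounds is a set of round indices in which a loss is detected.
    """
    trace = []
    for r in range(rounds):
        trace.append(cwnd_mss)
        if r in loss_rounds:
            cwnd_mss = max(1, cwnd_mss // 2)  # multiplicative decrease
        else:
            cwnd_mss += 1                     # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(cwnd_mss=8, rounds=6, loss_rounds={2}))
# [8, 9, 10, 5, 6, 7]
```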

TCP Slow Start

• When a connection begins, increase the rate exponentially until the first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT)

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.]
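The link between "1 MSS per ACK" and "doubling every RTT" can be checked with a short sketch. It assumes, for illustration only, that every segment sent in a round is ACKed individually and that no loss occurs.

```python
def slow_start_trace(num_rtts, mss=1):
    """Return cwnd at the start of each RTT during slow start (no loss)."""
    cwnd = mss
    sizes = []
    for _ in range(num_rtts):
        sizes.append(cwnd)
        acks_this_round = cwnd // mss   # one ACK per segment sent in this round
        for _ in range(acks_this_round):
            cwnd += mss                 # +1 MSS for every ACK received
    return sizes


print(slow_start_trace(4))  # [1, 2, 4, 8]
```

Each round sends cwnd/MSS segments and receives as many ACKs, so cwnd grows by its own size: one, two, four segments, as in the figure.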

Refinement: inferring loss

• After 3 dup ACKs:
  • cwnd is cut in half
  • the window then grows linearly
• But after a timeout event:
  • cwnd is instead set to 1 MSS
  • the window then grows exponentially up to a threshold, then grows linearly

Philosophy
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
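The table can be turned into a small state machine. The sketch below follows the rows above under some assumptions that are not on the slide: MSS is 1000 bytes, the initial Threshold is 64,000 bytes, and the class and method names are hypothetical. Retransmission and actual sending are left out.

```python
MSS = 1000  # bytes; an assumed value for illustration


class CongestionControl:
    """Sketch of the SS/CA table: only CongWin, Threshold and state are tracked."""

    def __init__(self, threshold=64_000):
        self.state = "SS"
        self.cong_win = MSS
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * MSS / self.cong_win  # about 1 MSS per RTT

    def on_dup_ack(self):
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"
            self.dup_acks = 0

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


cc = CongestionControl(threshold=4000)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)  # CA 5000
```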

Summary: TCP Congestion Control

[Figure: FSM with three states (slow start, congestion avoidance, fast recovery). Its transitions are listed below.]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Congestion control

TCP's Congestion Control Service (client/server)

Problem: gridlock sets in when there is packet loss due to router congestion.

• TCP forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

• What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
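The factor 0.75 is just the mean of a window that ramps linearly from W/2 to W. A quick numeric check, counting the window in whole segments (an assumption made only to keep the arithmetic exact; the function names are hypothetical):

```python
def sawtooth_average(w_segments):
    """Mean window size over one sawtooth cycle: W/2, W/2 + 1, ..., W."""
    sizes = range(w_segments // 2, w_segments + 1)
    return sum(sizes) / len(sizes)


def average_throughput(w, rtt):
    """Average throughput from the idealised model: 0.75 * W / RTT."""
    return 0.75 * w / rtt


print(sawtooth_average(100))          # 75.0, i.e. 0.75 * W
print(average_throughput(100, 1.0))   # 75.0 segments per second for RTT = 1 s
```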

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of the loss rate L:

\[ \text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}} \]

• This gives L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
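The two numbers on the slide follow directly from the example's parameters. The sketch below redoes the arithmetic; the function names are hypothetical, and the loss rate is obtained by solving the throughput formula for L.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight: throughput * RTT / segment size."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(w))   # 83333
print(loss)     # about 2.1e-10, i.e. roughly one loss per 5 billion segments
```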

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Assumptions
• Link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Operating in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running up to R, with the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2". A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
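The convergence argument can be tried out numerically. The sketch below is a deliberately idealised model, assuming both connections see every loss at the same time and increase by one unit per RTT; the function name and the starting rates are hypothetical.

```python
def simulate_fairness(x1, x2, capacity, rounds):
    """Two AIMD flows sharing a link of the given capacity (rate units per RTT)."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve their rate (multiplicative decrease).
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit (additive increase, slope 1).
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = simulate_fairness(10, 70, capacity=100, rounds=1000)
print(abs(x1 - x2))  # very close to 0
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the operating point drifts towards the equal bandwidth share line.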

The End

The succeeding slides are just for additional reading.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps is the link's transmission rate):
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
• Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections share the link.]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link of rate R between the end systems (client and server, with the file on the server)
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• The file to send is an integer number of segments of size MSS
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big

Notation
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object = O/(WS)

The bracketed term is the stalled period. If there are K windows, the sender will be stalled (K-1) times.
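Both static-window cases fit in one short function. The sketch below uses the slide's example sizes (O = 256 bits, S = 32 bits, W = 4); the link rate R = 32 bps and the two RTT values are assumptions chosen only so that each case is triggered once.

```python
import math


def static_window_latency(O, S, W, R, RTT):
    """Latency with a fixed congestion window of W segments.

    O: object size (bits), S: segment size (bits), R: link rate (bps).
    """
    K = math.ceil(O / (W * S))       # windows of data that cover the object
    latency = 2 * RTT + O / R        # Case 1: the sender never stalls
    stall = S / R + RTT - W * S / R  # idle time after each window in Case 2
    if stall > 0:                    # Case 2: WS/R < RTT + S/R
        latency += (K - 1) * stall
    return latency


print(static_window_latency(O=256, S=32, W=4, R=32, RTT=1))  # Case 1: 10.0
print(static_window_latency(O=256, S=32, W=4, R=32, RTT=5))  # Case 2: 20.0
```

In the second call K = 2, so the sender stalls once, for S/R + RTT - WS/R = 1 + 5 - 4 = 2 seconds.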

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object with O/S = 15 segments, sent in 4 windows, with the stalled periods marked.]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

\[
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
\]

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

\[ \frac{S}{R} + RTT \]

• Transmission time of the kth window:

\[ 2^{k-1} \frac{S}{R} \]

• Stall time after the kth window:

\[ \left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^{+} \]

• Latency:

\[ \text{Latency} = 2RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[ \frac{S}{R} + RTT - 2^{k-1} \frac{S}{R} \right]^{+} \]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

\[ Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1 \]

• The actual number of times that the server stalls is

\[ P = \min\{Q, K-1\} \]

• Closed-form expression for the latency:

\[ \text{Latency} = 2RTT + \frac{O}{R} + P\left[ RTT + \frac{S}{R} \right] - (2^{P} - 1)\frac{S}{R} \]

• Compared with the minimum latency, 2RTT + O/R:

\[ \frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2} \]

Slow start will not significantly increase latency if RTT << O/R.
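The closed-form expression can be cross-checked against the per-window sum of stall times. The sketch below does this for an object of O/S = 15 segments; the concrete values S = 1000 bits, R = 1000 bps and RTT = 2 s are assumptions picked to keep the arithmetic exact, and the function names are hypothetical.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Closed-form latency under slow start; returns (K, Q, P, latency)."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


def slow_start_latency_by_sum(O, S, R, RTT):
    """Same latency, computed by summing the stall time after each window."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = sum(max(0.0, S / R + RTT - 2 ** (k - 1) * S / R)
                 for k in range(1, K))
    return 2 * RTT + O / R + stalls


print(slow_start_latency(O=15000, S=1000, R=1000, RTT=2))         # (4, 2, 2, 22.0)
print(slow_start_latency_by_sum(O=15000, S=1000, R=1000, RTT=2))  # 22.0
```

Here the object needs K = 4 windows (1 + 2 + 4 + 8 segments), and the server stalls after the first two windows only, since the third window already takes longer to transmit than S/R + RTT.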

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


FSM of TCP for Reliable Data Transfer

Simplified TCP sender, assuming:
- one-way data transfer
- no flow or congestion control

[Figure: the sender sits in a "wait for event" state and handles three events.]
• event: data received from application above -> create and send segment
• event: timer timeout for segment with seq number y -> retransmit segment
• event: ACK received, with ACK number y -> process ACK

SIMPLIFIED TCP SENDER

Assumptions
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            if (timer is currently not running)
                start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout
            retransmit not-yet-ACKed segment with smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                sendbase = y
                if (there are currently any not-yet-ACKed segments)
                    start timer
            }
    } /* end of loop forever */

The timer is associated with the oldest unACKed segment.

TCP SENDER with MODIFICATIONS (with Fast Retransmit)

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {    /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {    /* a duplicate ACK for an already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y is 3) {
                    /* perform TCP fast retransmission */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    } /* end of loop forever */
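The ACK-handling branch of the pseudocode can be exercised in isolation. The following is a minimal sketch, not a full sender: timers and real segments are omitted, the class and attribute names are hypothetical, and retransmissions are only recorded in a list.

```python
class FastRetransmitSender:
    """Tracks sendbase and duplicate ACKs, as in the ACK event of the pseudocode."""

    def __init__(self, initial_sequence_number):
        self.sendbase = initial_sequence_number
        self.dup_ack_count = 0
        self.retransmitted = []  # sequence numbers resent by fast retransmit

    def on_ack(self, y):
        if y > self.sendbase:
            # Cumulative ACK of all data up to y.
            self.sendbase = y
            self.dup_ack_count = 0
        else:
            # Duplicate ACK for an already ACKed segment.
            self.dup_ack_count += 1
            if self.dup_ack_count == 3:
                # Fast retransmit: resend the segment starting at y.
                self.retransmitted.append(y)


sender = FastRetransmitSender(92)
for ack in (100, 100, 100, 100):  # one new ACK followed by three duplicates
    sender.on_ack(ack)
print(sender.sendbase, sender.retransmitted)  # 100 [100]
```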

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with a higher-than-expected seq number (a gap is detected)
   TCP receiver action: send a duplicate ACK, indicating the seq number of the next expected byte.

4. Event: arrival of a segment that partially or completely fills a gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

TCP Interesting Scenarios (simplified TCP version)

[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); A's timer times out and A resends Seq=92, 8 bytes data; B replies with ACK=100. Retransmission due to lost ACK.]

[Figure: premature timeout, cumulative ACKs. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies with ACK=100 and ACK=120; the Seq=92 timeout expires before the ACKs arrive, so A resends Seq=92, 8 bytes data, and the timer is restarted for Seq=92; B replies with ACK=120. The segment with Seq=100 is not retransmitted.]

TCP Retransmission Scenario

[Figure: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the Seq=92 timeout. The cumulative ACK avoids retransmission of the first segment.]

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control
• Timer expiration is more likely caused by congestion in the network
• Congestion may get worse if sources continue to retransmit packets persistently
• On each timeout: TimeoutInterval = 2 * TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT

Others: check RFC 2018 (selective ACK)
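A short sketch of the two rules: doubling on each timeout, and recomputing from the RTT estimates once an ACK arrives. The function names are hypothetical, and the recomputation uses the standard formula EstimatedRTT + 4 * DevRTT, which is not spelled out on this slide.

```python
def backoff_intervals(initial_timeout, timeouts):
    """TimeoutInterval values seen across consecutive timeouts (doubling each time)."""
    intervals = [initial_timeout]
    for _ in range(timeouts):
        intervals.append(2 * intervals[-1])
    return intervals


def timeout_from_estimates(estimated_rtt, dev_rtt):
    """TimeoutInterval after an ACK is received (standard formula, assumed here)."""
    return estimated_rtt + 4 * dev_rtt


print(backoff_intervals(1.0, 3))          # [1.0, 2.0, 4.0, 8.0]
print(timeout_from_estimates(0.5, 0.25))  # 1.5
```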

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer

• Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment
• Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead.

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

[Figure: data from IP enters RcvBuffer and the application process reads from it; LastByteRead and LastByteRcvd mark positions in the buffer.]

Initially, RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B

SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow

[Figure: HOST A sends data to HOST B and receives ACKs from HOST B; LastByteACKed and LastByteSent mark positions in the sender's byte stream.]
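The receiver-side and sender-side bookkeeping can be put side by side in a few lines. This is a sketch of the two formulas only; the function names and the byte counts are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver (HOST B): spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """Sender (HOST A): would sending nbytes more keep unACKed data <= RcvWindow?"""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_window


# HOST B has a 4096-byte buffer, has received up to byte 3000 and read up to byte 1000.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)  # 2096

# HOST A has 1500 unACKed bytes outstanding.
print(sender_may_send(3000, 1500, window, nbytes=500))   # True
print(sender_may_send(3000, 1500, window, nbytes=1000))  # False
```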

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
• Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator (connect)
• In Java: Socket clientSocket = new Socket("hostname", "port number");
• In C (Winsock):

    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client (accept)
• In Java: Socket accept()
• In C (Winsock):

    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

Transport Layer ndash TCP

30

BB

TCP Connection Management

Three way handshakeStep 1 clientclient end system sends TCP

SYNSYN control segment to server (executed by TCP itself)

specifies initial seq number (isn)

Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment

ACKs received SYN allocates buffers specifies serverrsquos initial seq

number

Step 3 clientclient ACKsACKs the connection with

ACK=server_isn +1ACK=server_isn +1

allocates buffers sends SYN=0SYN=0

Connection established

Client

AcceptAccept (SYN=1

seq=server_isnack=client_isn+1)

time

Server

ConnectConnect (SYN=1 seq=client_isn)

ACK (SYN=0

seq=client_isn+1ack=server_isn+1)

Establishing a connection

This is what happens when we create a socket for

connection to a server

After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Used in case ACK gets lost It is implementation-dependent (eg 30

seconds 1 minute 2 minutes

Connection formally closes ndash all resources (eg port numbers) are

released

1

2

3

4

5

6

7

8

9

10

11

12

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS * (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
Action: increment duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed

Summary: TCP Congestion Control

Initial state: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS / cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
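The three-state machine summarised on this slide can be sketched as a small class. This is an illustration only: the class name `RenoSender`, its method names, and the choice of counting cwnd in units of `mss` are assumptions, and retransmission itself is left out so that only the window bookkeeping is shown.

```python
class RenoSender:
    """Window bookkeeping for slow start, congestion avoidance, fast recovery."""

    def __init__(self, mss=1, ssthresh=64):
        self.mss = mss
        self.cwnd = mss              # start with 1 MSS
        self.ssthresh = ssthresh
        self.dup_ack_count = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss                # exponential growth
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += self.mss * self.mss / self.cwnd  # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss                # inflate per extra duplicate
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:              # loss inferred from 3 dup ACKs
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss                     # back to 1 MSS
        self.dup_ack_count = 0
        self.state = "slow_start"
```

With `ssthresh = 4`, four new ACKs take the sender from 1 MSS to 5 MSS and into congestion avoidance; three duplicate ACKs then set ssthresh to 2.5 and cwnd to 5.5 in fast recovery.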

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when packets are lost because of router congestion.

TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.

When a packet from the sending system is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.

[Figure: CLIENT and SERVER end systems]

Macroscopic Description of TCP throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2 and throughput to W/(2·RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
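The 0.75 factor is simply the mean of a window that ramps linearly from W/2 to W. A small numeric check, assuming W is an even number of segments and the window grows by one segment per RTT (`average_throughput` is an illustrative name):

```python
def average_throughput(w_segments: int, rtt: float) -> float:
    """Mean throughput (segments/sec) over one sawtooth cycle from W/2 up to W."""
    windows = range(w_segments // 2, w_segments + 1)  # W/2, W/2 + 1, ..., W
    return sum(w / rtt for w in windows) / len(windows)
```

For W = 100 segments and RTT = 1 s the mean is 75 segments/sec, that is 0.75 W/RTT.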

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

$\text{Throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• This gives $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
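The two numbers in this example can be recomputed directly. The sketch below assumes throughput in bits/sec, RTT in seconds and MSS in bytes; the function names are illustrative.

```python
def required_window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

With 10 Gbps, 100 ms and 1500-byte segments the window comes out at about 83,333 segments and the loss rate at roughly 2.1e-10, consistent with the rounded figures on the slide.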

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions:
• Link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease reduces throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, each axis running from 0 to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the two connections.
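The convergence argument can be tried out numerically. This sketch makes assumptions beyond the slides: both connections detect loss in the same RTT whenever their combined rate exceeds R, and each adds one rate unit per RTT otherwise. The name `share_after` is illustrative.

```python
def share_after(rounds: int, r: float, x1: float, x2: float):
    """Evolve two AIMD flows sharing a link of capacity r for a number of RTTs."""
    for _ in range(rounds):
        if x1 + x2 > r:
            x1, x2 = x1 / 2, x2 / 2   # joint loss: both halve
        else:
            x1, x2 = x1 + 1, x2 + 1   # both add one unit per RTT
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in two, so very unequal starting rates such as 10 and 80 end up almost identical.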

The End

The slides that follow are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free, so they achieve higher throughput.
• Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.

[Figure: multiple end systems sharing a link with transmission rate R bps; three are labelled "1 TCP connection" and one is labelled "3 TCP connections"]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?
That is, the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components: TCP connection establishment time, data transfer delay, actual data transmission time.

Two cases to consider:
• $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions:
• The network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = an integer number of segments of size MSS
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big

Notation:
• R bps: the link's transmission rate
• O: size of the object in bits
• S: number of bits of MSS (max segment size)

[Figure: SERVER sending a FILE to CLIENT]

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: $WS/R > RTT + S/R$
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

$\text{latency} = 2\,RTT + O/R$

K = number of windows of data that cover the object, $K = O/(WS)$, rounded up to the nearest integer.

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: $WS/R < RTT + S/R$
The sender has to wait for an ACK after a window's worth of data is sent (STALLED PERIOD).

$\text{latency} = 2\,RTT + O/R + (K-1)\left[S/R + RTT - WS/R\right]$

K = number of windows of data that cover the object, $K = O/(WS)$.

If there are K windows, the sender will be stalled (K-1) times.
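Both static-window cases fit in one function, since Case 1 is just the situation where the stall term is not positive. A sketch under the slide's assumptions, with O and S in bits, R in bits/sec and RTT in seconds; the name `static_window_latency` is illustrative.

```python
import math


def static_window_latency(o_bits, s_bits, w, r_bps, rtt):
    """Latency for a fixed window of w segments (Case 1 or Case 2)."""
    k = math.ceil(o_bits / (w * s_bits))                 # windows covering the object
    stall = s_bits / r_bps + rtt - w * s_bits / r_bps    # idle time per window
    base = 2 * rtt + o_bits / r_bps
    if stall <= 0:
        return base                  # Case 1: ACKs return before the window is spent
    return base + (k - 1) * stall    # Case 2: the sender stalls (K-1) times
```

Using the slide's example (O = 256 bits, S = 32 bits, W = 4) with an assumed R = 32 bps, an RTT of 1 s falls under Case 1 and an RTT of 5 s under Case 2.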

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: slow-start timeline for an object of O/S = 15 segments, sent in 4 windows, with the stalled periods marked]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:

$K = \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\}$
$\;\;\; = \min\{k : 2^k - 1 \ge O/S\}$
$\;\;\; = \min\{k : k \ge \log_2(O/S + 1)\}$
$\;\;\; = \lceil \log_2(O/S + 1) \rceil$
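The definition of K and its closed form can be compared directly. A small sketch, assuming the object is a whole number of segments; the function names are illustrative.

```python
import math


def windows_to_cover(o_bits: int, s_bits: int) -> int:
    """Count slow-start windows (1, 2, 4, ... segments) needed to cover the object."""
    segments = math.ceil(o_bits / s_bits)
    k, covered = 0, 0
    while covered < segments:
        covered += 2 ** k   # the next window carries 2^k segments
        k += 1
    return k


def windows_closed_form(o_bits: int, s_bits: int) -> int:
    """Closed form: K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(o_bits / s_bits + 1))
```

For O/S = 15 both give K = 4, since 1 + 2 + 4 + 8 = 15.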

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\dfrac{S}{R} + RTT$
• Transmission time of the kth window: $2^{k-1}\dfrac{S}{R}$
• Stall time after the kth window: $\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
• Latency: $2\,RTT + \dfrac{O}{R} + \sum_{k=1}^{K-1}\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$Q = \left\lfloor \log_2\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$

• The actual number of times that the server stalls is

$P = \min\{Q, K-1\}$

Case analysis: DYNAMIC CONGESTION WINDOW

• Closed-form expression for the latency, where $P = \min\{Q, K-1\}$ is the actual number of times the server stalls:

$\text{Latency} = 2\,RTT + \dfrac{O}{R} + P\left[RTT + \dfrac{S}{R}\right] - (2^{P} - 1)\dfrac{S}{R}$
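The closed form can be checked against the per-window stall sum it was derived from. This is a sketch under the slide's assumptions (O and S in bits, R in bits/sec, RTT in seconds); `slow_start_latency` is an illustrative name and returns both values so they can be compared.

```python
import math


def slow_start_latency(o_bits, s_bits, r_bps, rtt):
    """Return (closed_form, summed) latency for a slow-start transfer."""
    s_over_r = s_bits / r_bps
    k_windows = math.ceil(math.log2(o_bits / s_bits + 1))   # K
    q = math.floor(math.log2(1 + rtt / s_over_r)) + 1       # stalls for an infinite object
    p = min(q, k_windows - 1)                                # actual number of stalls
    base = 2 * rtt + o_bits / r_bps
    closed = base + p * (rtt + s_over_r) - (2 ** p - 1) * s_over_r
    summed = base + sum(
        max(0.0, s_over_r + rtt - 2 ** (k - 1) * s_over_r)  # stall after window k
        for k in range(1, k_windows)
    )
    return closed, summed
```

For O/S = 15, S/R = 1 s and RTT = 2 s, K = 4, Q = 2 and P = 2, and both expressions give 22 s.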

Case analysis: DYNAMIC CONGESTION WINDOW

$\dfrac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \dfrac{P}{(O/R)/RTT + 2}$

Slow start will not significantly increase latency if $RTT \ll O/R$.


http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


SIMPLIFIED TCP SENDER

Assumptions:
• the sender is not constrained by TCP flow or congestion control
• data from above is less than MSS in size
• data transfer is in one direction only

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

            event: data received from application above
                create TCP segment with sequence number nextseqnum
                if (timer is currently not running)
                    start timer for segment nextseqnum
                    /* the timer is associated with the oldest unACKed segment */
                pass segment to IP
                nextseqnum = nextseqnum + length(data)

            event: timer timeout
                retransmit not-yet-ACKed segment with smallest sequence number
                start timer

            event: ACK received, with ACK field value of y
                if (y > sendbase) {   /* cumulative ACK of all data up to y */
                    sendbase = y
                    if (there are currently any not-yet-ACKed segments)
                        start timer
                }
    } /* end of loop forever */

TCP with MODIFICATIONS: SENDER (with Fast Retransmit)

Why wait for the timeout to expire when consecutive ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

            event: data received from application above
                create TCP segment with sequence number nextseqnum
                start timer for segment nextseqnum
                pass segment to IP
                nextseqnum = nextseqnum + length(data)

            event: timer timeout for segment with sequence number y
                retransmit segment with sequence number y
                compute new timeout interval for segment y
                restart timer for sequence number y

            event: ACK received, with ACK field value of y
                if (y > sendbase) {   /* cumulative ACK of all data up to y */
                    cancel all timers for segments with sequence numbers < y
                    sendbase = y
                }
                else {   /* a duplicate ACK for already ACKed segment */
                    increment number of duplicate ACKs received for y
                    if (number of duplicate ACKs received for y is 3) {
                        /* perform TCP fast retransmission */
                        resend segment with sequence number y
                        restart timer for segment y
                    }
                }
    } /* end of loop forever */
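The ACK-handling branch of the sender above can be sketched in a few lines. This is an illustration only: the class `FastRetransmitSender` and its attributes are hypothetical names, timers are omitted, and "resending" is modelled by recording the sequence number in a list.

```python
class FastRetransmitSender:
    """Tracks sendbase and duplicate ACKs; records fast retransmissions."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent via fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:       # cumulative ACK of all data up to y
            self.send_base = y
            self.dup_acks = 0
        else:                        # duplicate ACK for already ACKed data
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)  # resend segment starting at y
```

After an ACK of 100 followed by three more ACKs of 100, the segment with sequence number 100 is resent once without waiting for a timeout.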

TCP ACK generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1)
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with higher-than-expected seq #; a gap is detected
   TCP receiver action: send a duplicate ACK indicating the seq # of the next expected byte.

4. Event: arrival of a segment that partially or completely fills the gap
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.
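The four rules can be written as a single decision function. This is a simplified sketch: the function name, its boolean arguments and the returned labels are all illustrative, and it assumes the arriving segment's sequence number is never below the next expected byte.

```python
def receiver_action(seg_seq: int, expected_seq: int,
                    gap_exists: bool, delayed_ack_pending: bool) -> str:
    """Pick the receiver's ACK behaviour for an arriving segment."""
    if seg_seq > expected_seq:
        return "duplicate ACK"             # rule 3: out of order, gap detected
    if gap_exists:
        return "immediate ACK"             # rule 4: fills the lower end of a gap
    if delayed_ack_pending:
        return "immediate cumulative ACK"  # rule 2: second in-order segment
    return "delayed ACK"                   # rule 1: wait up to 500 ms
```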

TCP Interesting Scenarios (simplified TCP version)

Lost ACK scenario:
• Host A sends Seq=92, 8 bytes data.
• Host B replies with ACK=100, but the ACK is lost (X).
• Host A's timer times out, so it retransmits Seq=92, 8 bytes data.
• Host B replies with ACK=100 again.
• Result: retransmission due to a lost ACK.

Premature timeout, cumulative ACKs:
• Host A sends Seq=92, 8 bytes data, followed by Seq=100, 20 bytes data.
• Host B replies with ACK=100 and ACK=120.
• The timer for Seq=92 expires before the ACKs arrive, so Host A retransmits Seq=92, 8 bytes data. The timer is restarted here for Seq=92.
• Host B replies with ACK=120.
• Result: the segment with Seq=100 is not retransmitted.

TCP Retransmission Scenario

• Host A sends Seq=92, 8 bytes data, followed by Seq=100, 20 bytes data.
• Host B replies with ACK=100, which is lost (X), and with ACK=120, which arrives before the timeout.
• Result: the cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control.
• Timer expiration is more likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.
• After each timeout: TimeoutInterval = 2 * TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
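A minimal sketch of the two timeout rules. The doubling comes from this slide; the formula EstimatedRTT + 4 * DevRTT is the standard TCP computation and is assumed here, since the slide only says the interval is "derived from" those two values. The function names are illustrative.

```python
def timeout_after_expiry(previous_timeout: float) -> float:
    """Exponential backoff: double the interval after each timer expiry."""
    return 2 * previous_timeout


def timeout_after_ack(estimated_rtt: float, dev_rtt: float) -> float:
    """Recompute the interval from the RTT estimators once an ACK arrives."""
    return estimated_rtt + 4 * dev_rtt
```

Starting from 1 s, three consecutive expiries give intervals of 2 s, 4 s and 8 s; a later ACK resets the interval to the value computed from the estimators.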

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffers by transmitting too much, too fast.

• receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment
• sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

Receiver buffering:
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

[Figure: the receive buffer (RcvBuffer) is filled with data from IP up to LastByteRcvd and drained by the application process up to LastByteRead]

Initially RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
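The receiver's computation is a single subtraction. A minimal sketch with illustrative names and byte counts:

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised to the sender."""
    buffered = last_byte_rcvd - last_byte_read  # received but not yet read by the app
    return rcv_buffer - buffered
```

With a 4096-byte buffer, the window starts at 4096 and shrinks to 2096 once 3000 bytes have arrived and only 1000 have been read.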

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER (HOST A) uses the RcvWindow of Host B, LastByteSent and LastByteACKed.

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that
[LastByteSent - LastByteACKed] <= RcvWindow

[Figure: the sender's data with LastByteACKed and LastByteSent marked; ACKs arrive from Host B]

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send, so if HOST B has nothing to send, HOST A would never learn that buffer space has opened up. Therefore the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables:
• sequence numbers
• buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator
• In Java: Socket clientSocket = new Socket("hostname", "port number");
• In C: connect

    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client
• In Java: Socket accept()
• In C: accept

    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: Establishing a connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• specifies the initial seq # (client_isn)
• segment: Connect (SYN=1, seq=client_isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
• ACKs the received SYN
• allocates buffers
• specifies the server's initial seq #
• segment: Accept (SYN=1, seq=server_isn, ack=client_isn+1)

Step 3: the client ACKs the connection with ACK=server_isn+1.
• allocates buffers
• sends SYN=0
• segment: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
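The field values exchanged in the three steps can be generated mechanically from the two initial sequence numbers. A sketch with an illustrative `Segment` record; it models only the SYN flag, seq and ack fields, not a real TCP header.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    syn: int
    seq: int
    ack: Optional[int] = None   # None when the segment carries no acknowledgement


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SYN, SYNACK and ACK segments in the order they are sent."""
    syn = Segment(syn=1, seq=client_isn)                              # step 1
    synack = Segment(syn=1, seq=server_isn, ack=syn.seq + 1)          # step 2
    ack = Segment(syn=0, seq=client_isn + 1, ack=synack.seq + 1)      # step 3
    return [syn, synack, ack]
```

With client_isn = 1000 and server_isn = 5000, the SYNACK acknowledges 1001 and the final ACK acknowledges 5001 with SYN=0.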

TCP Connection Management (cont.): Closing a connection

How a TCP connection is established and torn down.

The client closes the socket:
• C: closesocket(s);
• Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client/server timeline. The client closes and sends FIN; the server replies with ACK, closes and sends FIN; the client replies with ACK, enters timed wait, then is closed]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
• Enters "timed wait": will respond with an ACK to received FINs.

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client/server timeline. Both sides are closing while FIN and ACK are exchanged in each direction; the client passes through timed wait before it is closed, and the server is closed on receiving the final ACK]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle, with steps numbered 1 to 12]

• Timed wait is used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• After the wait, the connection formally closes and all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs. Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receive buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection:

New variable: Congestion Window (CongWin)

(Amount of unACKed data at SENDER) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.): CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• The congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate
• The congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path:

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender


Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR

An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent

Number of segmentsRounded up to the nearest integer

Assume W=4 segmentsAssume W=4 segments

eg O=256bits S=32bits W=4

Transport Layer ndash TCP

59

BB

TCP latency ModelingTCP latency Modeling

Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]

Number of Windows of data that cover the objectK= OWS

Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW

Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR

Sender has to wait for an ACK after a windowrsquos worth of data sent

STALLED STALLED PERIODPERIOD

If there are k windows sender willbe stalled (k-1) times

Transport Layer ndash TCP

60

BB

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

STALLED STALLED PERIODPERIOD

4 windows4 windows

OS=15

Transport Layer ndash TCP

61

BB

bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the

object as followsobject as follows

S

OkK k 110 222min

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

1log

1logmin

12min

2

2

S

OK

S

OkkK

S

OkK k

Note

Transport Layer ndash TCP

62

BB

bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window

bull Transmission of Transmission of kkth window window ==

bull Stall Time Stall Time ==

bull Latency Latency ==

12

k

R

S

12

k

R

SRTT

R

S

RTTR

S

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

1K

1

122k

k

R

SRTT

R

S

R

ORTT

Transport Layer ndash TCP

63

BB

bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

11log2

RS

RTTQ

bull The actual number of times that the server stalls is

P = min Q K-1

Transport Layer ndash TCP

64

BB

bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

11log2

RS

RTTQ

bull The actual number of times that the server stalls is

P = min Q K-1

R

S

R

SRTTP

R

ORTT P )12(2Latency

bull Closed-form expression for the latency

Transport Layer ndash TCP

65

BB

bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

2

1

RTTRO

P

encyMinimumLat

Latency

Slow start will not significantly increase latency if RTT ltlt OR

Transport Layer ndash TCP

66

BB

httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 19: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer – TCP

TCP Sender with Modifications: Fast Retransmit

Why wait for the timeout to expire when consecutive duplicate ACKs can be used to indicate a lost segment?

    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
        switch (event)

        event: data received from application above
            create TCP segment with sequence number nextseqnum
            start timer for segment nextseqnum
            pass segment to IP
            nextseqnum = nextseqnum + length(data)

        event: timer timeout for segment with sequence number y
            retransmit segment with sequence number y
            compute new timeout interval for segment y
            restart timer for sequence number y

        event: ACK received, with ACK field value of y
            if (y > sendbase) {   /* cumulative ACK of all data up to y */
                cancel all timers for segments with sequence numbers < y
                sendbase = y
            }
            else {   /* a duplicate ACK for an already ACKed segment */
                increment number of duplicate ACKs received for y
                if (number of duplicate ACKs received for y == 3) {
                    /* perform TCP fast retransmit */
                    resend segment with sequence number y
                    restart timer for segment y
                }
            }
    } /* end of loop forever */
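The ACK-handling branch of the sender can be sketched in a few lines of Python. This is only an illustration of the duplicate-ACK counting described above, not a full TCP sender: the class name `FastRetransmitSender` and the `retransmitted` list are hypothetical, and timers are left out.

```python
class FastRetransmitSender:
    """Tracks cumulative ACKs and triggers fast retransmit on the third duplicate."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq   # oldest unACKed byte
        self.dup_acks = 0             # duplicate ACKs seen for sendbase
        self.retransmitted = []       # sequence numbers resent (hypothetical log)

    def on_ack(self, y):
        if y > self.sendbase:
            # New cumulative ACK: everything below y is acknowledged.
            self.sendbase = y
            self.dup_acks = 0
        else:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks += 1
            if self.dup_acks == 3:
                # Fast retransmit: resend the segment the receiver keeps asking for.
                self.retransmitted.append(y)


sender = FastRetransmitSender(initial_seq=92)
for ack in [100, 100, 100, 100]:   # one new ACK, then three duplicates
    sender.on_ack(ack)
print(sender.retransmitted)        # [100]
```

The segment starting at byte 100 is resent as soon as the third duplicate arrives, without waiting for its timer.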

TCP ACK Generation [RFC 1122, RFC 2581]

1. Event: in-order segment arrival, no gaps, everything else already ACKed.
   TCP receiver action: delay sending the ACK. Wait up to 500 ms for the next segment. If the next segment does not arrive in this interval, send the ACK.

2. Event: in-order segment arrival, no gaps, one delayed ACK pending (due to action 1).
   TCP receiver action: immediately send a single cumulative ACK.

3. Event: out-of-order segment arrival with a higher-than-expected seq # (a gap is detected).
   TCP receiver action: send a duplicate ACK indicating the seq # of the next expected byte.

4. Event: arrival of a segment that partially or completely fills the gap.
   TCP receiver action: immediately send an ACK if the segment starts at the lower end of the gap.

The receiver does not discard out-of-order segments.

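TCP: Interesting Scenarios (simplified TCP version)

[Figure: two timing diagrams between Host A and Host B.]

Lost ACK scenario: Host A sends Seq=92 with 8 bytes of data. Host B replies with ACK=100, but the ACK is lost. Host A's timer expires, so it resends Seq=92 (8 bytes of data) and Host B replies with ACK=100 again. The retransmission is due to the lost ACK.

Premature timeout, cumulative ACKs: Host A sends Seq=92 (8 bytes of data) and then Seq=100 (20 bytes of data). Host B replies with ACK=100 and ACK=120, but the timer for Seq=92 expires before they arrive. Host A resends Seq=92 (8 bytes of data); the timer is restarted here for Seq=92. Host B replies with ACK=120. The segment with Seq=100 is not retransmitted.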

TCP Retransmission Scenario

[Figure: timing diagram between Host A and Host B.] Host A sends Seq=92 (8 bytes of data) followed by Seq=100 (20 bytes of data). Host B's first acknowledgement, ACK=100, is lost, but ACK=120 reaches Host A before the timeout for Seq=92.

The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

- Provides a limited form of congestion control.
- Timer expiration is most likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.
- After each timeout: TimeoutInterval = 2 * TimeoutIntervalPrevious
- TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
- After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Other modifications: check RFC 2018 (selective ACK).
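The doubling rule can be illustrated with a short Python sketch. It assumes the standard baseline TimeoutInterval = EstimatedRTT + 4 * DevRTT, which these slides do not spell out here; the function names and the millisecond values are made up for the example.

```python
def timeout_interval(estimated_rtt, dev_rtt):
    """Baseline timeout from the RTT estimators (assumed standard formula)."""
    return estimated_rtt + 4 * dev_rtt


def backoff_schedule(estimated_rtt, dev_rtt, timeouts):
    """Timeout values used for `timeouts` consecutive expirations of the timer."""
    interval = timeout_interval(estimated_rtt, dev_rtt)
    schedule = []
    for _ in range(timeouts):
        schedule.append(interval)
        interval *= 2   # double the interval after every timeout
    return schedule


# Values in milliseconds: EstimatedRTT = 100 ms, DevRTT = 25 ms.
print(backoff_schedule(100, 25, 4))   # [200, 400, 800, 1600]
```

Once an ACK arrives, a real sender would go back to `timeout_interval(...)` computed from the latest estimators rather than continue from the doubled value.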

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer

Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.

Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

Flow Control: Receiver

Example: Host A sends a large file to Host B.

Receiver Host B uses RcvWindow, LastByteRcvd and LastByteRead.

[Figure: Host B's receive buffer (RcvBuffer). Data from IP fills the buffer up to LastByteRcvd; the application process has read up to LastByteRead.]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially, RcvWindow = RcvBuffer. The application reads from the buffer. Host B tells Host A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to Host A.

Flow Control: Sender

Example: Host A sends a large file to Host B.

Sender Host A uses the RcvWindow of Host B, LastByteSent and LastByteACKed.

[Figure: Host A's outgoing data, with LastByteACKed and LastByteSent marked; ACKs arrive from Host B.]

To ensure that Host B's buffer does not overflow, Host A maintains throughout the connection's life:

LastByteSent - LastByteACKed <= RcvWindow
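Both sides of the flow-control rule fit in two small Python functions. This is a sketch with hypothetical function names and byte counts; real TCP uses wrapped 32-bit sequence numbers, which are ignored here.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room left in the receive buffer."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """Sender side: would sending nbytes more keep unACKed data within the window?"""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


# Host B has a 100-byte buffer, has received up to byte 60 and read up to byte 40.
window = rcv_window(rcv_buffer=100, last_byte_rcvd=60, last_byte_read=40)
print(window)                                # 80
# Host A has 10 bytes in flight, so it may send at most 70 more.
print(sender_may_send(50, 40, window, 70))   # True
print(sender_may_send(50, 40, window, 71))   # False
```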

Flow Control: Some Issues to Consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of Host B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection "alive".

TCP requires that Host A continue to send segments with one data byte when Host B's receive window is 0. Such segments will be ACKed by Host B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables:
- sequence numbers
- buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator.

    In Java:
    Socket clientSocket = new Socket(hostname, port number);

    In C (Winsock):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client.

    In Java:
    Socket accept()

    In C:
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: Establishing a Connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
- specifies the initial seq # (isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
- ACKs the received SYN
- allocates buffers
- specifies the server's initial seq #

Step 3: the client ACKs the connection with ACK = server_isn + 1.
- allocates buffers
- sends SYN = 0

The connection is established.

[Figure: timing diagram between client and server.]
1. Connect, client to server: (SYN=1, seq=client_isn)
2. Accept, server to client: (SYN=1, seq=server_isn, ack=client_isn+1)
3. ACK, client to server: (SYN=0, seq=client_isn+1, ack=server_isn+1)

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
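The sequence and acknowledgement numbers in the three segments can be checked with a tiny Python sketch. It only builds the header fields named on the slide as dictionaries; the function name and the ISN values are hypothetical, and no real sockets are involved.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as dictionaries of header fields."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is why the final ACK carries ack = server_isn + 1.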

TCP Connection Management (cont.): Closing a Connection

How a TCP connection is established and torn down.

The client closes the socket: closesocket(s); in Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

Step 3: the client receives the FIN and replies with an ACK.
- Enters "timed wait": will respond with an ACK to received FINs.

Step 4: the server receives the ACK. The connection is closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: timing diagram. The client sends FIN (closing); the server replies with ACK, then sends its own FIN (closing); the client replies with ACK and stays in timed wait before it is closed; the server is closed once it receives the ACK.]

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams, with steps numbered 1 to 12.]

- Timed wait: used in case the ACK gets lost. It is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
- Closed: the connection formally closes; all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow control:
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receive buffer from overflowing

Congestion happens when too many sources attempt to send data at too high a rate for the routers along the path.

Congestion control:
- service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
- a top-10 problem

Approaches towards Congestion Control

Two broad approaches towards congestion control:

1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP

2. Network-assisted congestion control
- routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - the explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection.

New variable: the congestion window (CongWin).

Amount of unACKed data at the sender:

LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is limited solely by CongWin.
- Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
- the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
- the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
- no ACK is received after segment loss

2. Receipt of three duplicate ACKs
- segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: Details

- The sender limits transmission: LastByteSent - LastByteAcked <= cwnd
- Roughly: rate = cwnd / RTT bytes/sec
- cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP Congestion Avoidance: Additive Increase, Multiplicative Decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) versus time, with the window axis marked at 8, 16 and 24 Kbytes. Sawtooth behavior: probing for bandwidth.]
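The sawtooth can be reproduced with a few lines of Python. This is a sketch in whole-MSS units with a hypothetical `aimd` function; the rounds in which loss occurs are passed in by hand rather than derived from a network model.

```python
def aimd(cwnd, rounds, loss_rounds, mss=1):
    """Return cwnd (in MSS units) at the start of each RTT round."""
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(mss, cwnd // 2)   # multiplicative decrease: halve cwnd
        else:
            cwnd += mss                  # additive increase: +1 MSS per RTT
    return trace


print(aimd(cwnd=8, rounds=8, loss_rounds={4}))   # [8, 9, 10, 11, 12, 6, 7, 8]
```

The window climbs by one MSS per round, drops to half after the loss in round 4, and then starts climbing again.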

TCP Slow Start

- When the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).

[Figure: timing diagram between Host A and Host B. Host A sends one segment in the first RTT, two segments in the second and four segments in the third.]
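The link between "1 MSS per ACK" and "doubling per RTT" is easy to verify in Python. The sketch below assumes that every segment sent in a round is ACKed within that round and that no loss occurs; `slow_start` is a hypothetical name.

```python
def slow_start(rtts, mss=1):
    """Return cwnd at the start of each RTT while in slow start (no loss)."""
    cwnd = mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        acks = cwnd // mss    # one ACK per segment sent in this RTT
        cwnd += acks * mss    # +1 MSS per ACK, so cwnd doubles
    return trace


print(slow_start(4))   # [1, 2, 4, 8]
```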

Refinement: Inferring Loss

- After 3 dup ACKs:
  - cwnd is cut in half
  - the window then grows linearly
- But after a timeout event:
  - cwnd is instead set to 1 MSS
  - the window then grows exponentially up to a threshold, then grows linearly

Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS. If (CongWin > Threshold), set state to "Congestion Avoidance".
Commentary: results in a doubling of CongWin every RTT.

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin / 2. CongWin = Threshold. Set state to "Congestion Avoidance".
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin / 2. CongWin = 1 MSS. Set state to "Slow Start".
Commentary: enter slow start.

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed.
Commentary: CongWin and Threshold not changed.
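The table translates almost line by line into a small state machine. The Python below is a sketch of the table only, under simplifying assumptions: windows are floats in MSS units, duplicate-ACK counting is done by the caller, and the class and method names are hypothetical.

```python
class TcpCongestionControl:
    """Slow start / congestion avoidance transitions from the table above."""

    def __init__(self, mss=1.0, threshold=64.0):
        self.mss = mss
        self.cong_win = mss          # start with one segment
        self.threshold = threshold
        self.state = "SS"

    def on_new_ack(self):
        if self.state == "SS":
            self.cong_win += self.mss                    # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)   # +1 MSS per RTT

    def on_triple_dup_ack(self):
        self.threshold = self.cong_win / 2
        self.cong_win = max(self.threshold, self.mss)    # never below 1 MSS
        self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"


cc = TcpCongestionControl(mss=1.0, threshold=4.0)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cong_win)   # CA 5.0
```

With a threshold of 4 MSS, the fourth new ACK pushes CongWin to 5 MSS and moves the sender into congestion avoidance.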

Summary: TCP Congestion Control

[Figure: finite state machine with three states: slow start, congestion avoidance and fast recovery.]

Initial transition into slow start:
- cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0

Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Congestion Control: TCP's Congestion Control Service

[Figure: client and server end systems.]

Problem: gridlock sets in when there is packet loss due to router congestion.

TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.

The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

(Based on an idealised model for the steady-state dynamics of TCP.)

- What is the average throughput of TCP as a function of window size and RTT?
  - ignore slow start (typically very short phases)
- Let W be the window size when loss occurs.
  - when the window is W, throughput is W/RTT
  - just after a loss, the window drops to W/2 and throughput to W/(2 RTT)
  - throughput increases linearly (by MSS/RTT every RTT)
- Average throughput: 0.75 W/RTT
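The 0.75 factor is simply the mean of a window that ramps linearly from W/2 to W. The Python sketch below checks this numerically; both function names and the sample values are hypothetical.

```python
def sawtooth_average(w, steps=1000):
    """Average window over one sawtooth period, ramping linearly from w/2 to w."""
    samples = [w / 2 + (w / 2) * i / steps for i in range(steps + 1)]
    return sum(samples) / len(samples)


def average_throughput(w_bytes, rtt_seconds):
    """Average throughput in bytes/sec under the idealised model."""
    return 0.75 * w_bytes / rtt_seconds


print(abs(sawtooth_average(100.0) - 75.0) < 1e-9)   # True
print(average_throughput(100, 0.5))
```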

TCP Futures: TCP over "Long, Fat Pipes"

- Example: a GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
- requires a window size of W = 83,333 in-flight segments
- Throughput in terms of the loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \, \sqrt{L}}$$

- this gives L = 2 x 10^-10, a very small loss rate (1 loss event every 5 billion segments)
- new versions of TCP are needed for high-speed environments
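Both numbers on this slide follow from the stated parameters, as the short Python check below shows. The function names are hypothetical; the second function just inverts the throughput formula for L.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


# 1500-byte segments, 100 ms RTT, 10 Gbps target.
print(int(window_segments(10e9, 0.1, 1500)))   # 83333
print(required_loss_rate(10e9, 0.1, 1500))     # about 2.1e-10
```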

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R.]

Analysis of 2 Connections Sharing a Link

Assumptions:
- link with a transmission rate of R
- each connection has the same MSS and RTT
- no other TCP connections or UDP datagrams traverse the shared link
- ignore the slow start phase of TCP
- both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why Is TCP Fair?

Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2). A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.]
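The convergence argument can be tried out with a small simulation. This Python sketch makes the same simplifications as the analysis (two flows, same RTT, both see loss whenever their combined rate exceeds the capacity); `share_link` and the starting rates are hypothetical.

```python
def share_link(x1, x2, capacity, rounds):
    """Evolve two AIMD flows that always detect loss at the same time."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve their rate
        else:
            x1, x2 = x1 + 1, x2 + 1   # no loss: both add one unit per RTT
    return x1, x2


a, b = share_link(10.0, 80.0, capacity=100.0, rounds=2000)
print(abs(a - b) < 1e-6)   # True: the two rates have converged
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the flows drift towards the equal-share line.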

The End

The slides that follow are for additional reading only.

Loopholes in TCP

[Figure: multiple end systems sharing a link of transmission rate R bps. Three end systems use 1 TCP connection each; a fourth uses 3 TCP connections (multithreading implementation).]

- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling: Assumptions

[Figure: CLIENT and SERVER end systems and the FILE to be transferred.]

Notation:
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits of MSS (max segment size)

Assumptions:
- The network is uncongested, with one link of rate R between the end systems.
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
- The initial threshold of the TCP congestion mechanism is very big.

TCP Latency Modeling

Case analysis: static congestion window

Assume W = 4 segments.

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer
e.g. O = 256 bits, S = 32 bits, W = 4 gives K = 2

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

TCP Latency Modeling

Case analysis: static congestion window

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

K = O/(WS), the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K - 1) times.

latency = 2 RTT + O/R + (K - 1)[S/R + RTT - WS/R]
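The two static-window cases can be combined in one Python function. This is a sketch with a hypothetical function name; the units (bits, bits per second, seconds) and the sample link rate of 32 bps are chosen only to keep the arithmetic readable.

```python
import math


def static_window_latency(O, S, W, R, RTT):
    """Latency for a fixed window of W segments (O, S in bits; R in bps; RTT in s)."""
    K = math.ceil(O / (W * S))             # windows needed to cover the object
    base = 2 * RTT + O / R                 # connection setup plus transmission time
    stall = S / R + RTT - W * S / R        # idle time after each window
    if stall <= 0:
        return base                        # case 1: ACKs return before the window ends
    return base + (K - 1) * stall          # case 2: sender stalls K - 1 times


# O = 256 bits, S = 32 bits, W = 4, over a 32 bps link.
print(static_window_latency(256, 32, 4, 32, RTT=1))   # 10.0 (case 1)
print(static_window_latency(256, 32, 4, 32, RTT=5))   # 20.0 (case 2)
```

With RTT = 5 s, the single stall between the two windows adds S/R + RTT - WS/R = 2 s on top of 2 RTT + O/R = 18 s.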

Case analysis: dynamic congestion window

[Figure: timing diagram of slow start with stalled periods between windows. Example with O/S = 15 segments, covered by 4 windows.]

Case analysis: dynamic congestion window

- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\!\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$

Case analysis: dynamic congestion window

- Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

- Transmission time of the kth window:

$$2^{k-1}\,\frac{S}{R}$$

- Stall time after the kth window, where $[x]^+ = \max(x, 0)$:

$$\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+$$

- Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+$$

Case analysis: dynamic congestion window

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

- The actual number of times that the server stalls is

$$P = \min\{Q,\, K - 1\}$$

Case analysis: dynamic congestion window

- Closed-form expression for the latency, where Q is the number of times the server would stall if the object contained an infinite number of segments and P = min{Q, K - 1} is the actual number of times that the server stalls:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^P - 1\right)\frac{S}{R}$$

Case analysis: dynamic congestion window

Comparing the latency with the minimum latency, 2 RTT + O/R (no stalls):

$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
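The quantities K, Q and P and the closed-form latency can be evaluated together. The Python sketch below uses a hypothetical function name and toy units (S = 1 bit, R = 1 bps) so that the O/S = 15 example with 4 windows can be checked by hand.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, latency) for the dynamic-window (slow start) model."""
    K = math.ceil(math.log2(O / S + 1))                # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                  # actual number of stalls
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


# Object of 15 segments, S/R = 1 s per segment, RTT = 2 s.
print(slow_start_latency(O=15, S=1, R=1, RTT=2))   # (4, 2, 2, 22.0)
```

The result agrees with the stall-by-stall sum: the server idles for 2 s after the first window and 1 s after the second, so the latency is 2 RTT + O/R + 3 = 22 s.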

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 20: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

21

BB

TCP ACK generation [RFC 1122 RFC 2581]

Event

in-order segment arrival no gapseverything else already ACKed

in-order segment arrival no gaps one delayed ACK pending (due to action 1)

out-of-order segment arrivalwith higher than expect seq - a gap is detected

arrival of segment that partially or completely fills gap

TCP Receiver action

Delay sending the ACK Wait up to 500msfor next segment If next segment does not arrive in this interval send ACK

immediately send a singlecumulative ACK

send duplicate ACK indicating seq of next expected byte

Immediately send an ACK if segment startsat lower end of gap

1

2

3

4

Receiver does not discard out-of-order segmentsReceiver does not discard out-of-order segments

Transport Layer ndash TCP

22

BB

TCP Interesting Scenarios

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

time lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

Host A

Seq=100 20 bytes data

ACK=100

Seq=

92

tim

eout

time premature timeoutcumulative ACKs

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

ACK=120

Retransmission due to lost ACKSegment with Seq=100 not retransmitted

Timer is restarted here for Seq=92

Simplified TCP versionSimplified TCP version

TCP Retransmission Scenario

[Figure: Host A / Host B timeline.]

Cumulative ACK scenario: Host A sends Seq=92 (8 bytes of data) and then Seq=100 (20 bytes of data). Host B's ACK=100 is lost (X), but ACK=120 reaches Host A before the timeout for Seq=92 expires. The cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

Provides a limited form of congestion control.

Timer expiration is most likely caused by congestion in the network, and congestion may get worse if sources continue to retransmit packets persistently.

After each timeout: TimeoutInterval = 2 * TimeoutIntervalPrevious

TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.

After an ACK is received, TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT.

Other modifications: see RFC 2018 (selective ACK).
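A minimal sketch of the doubling rule, working in integer milliseconds. The function names are hypothetical. The base formula EstimatedRTT + 4 * DevRTT is the usual TCP rule and is an assumption here, since this slide only says the interval is "derived from" the two estimates.

```python
def timeout_from_estimates(estimated_rtt_ms, dev_rtt_ms):
    # Assumed base rule: TimeoutInterval = EstimatedRTT + 4 * DevRTT.
    return estimated_rtt_ms + 4 * dev_rtt_ms


def timeout_after_expiry(previous_timeout_ms):
    # Rule from the slide: double the previous interval on each timeout.
    return 2 * previous_timeout_ms


def backoff_sequence(estimated_rtt_ms, dev_rtt_ms, consecutive_timeouts):
    """Timeout values used for the first transmission and each retransmission."""
    timeout = timeout_from_estimates(estimated_rtt_ms, dev_rtt_ms)
    sequence = [timeout]
    for _ in range(consecutive_timeouts):
        timeout = timeout_after_expiry(timeout)
        sequence.append(timeout)
    return sequence
```

With EstimatedRTT = 100 ms and DevRTT = 25 ms, three consecutive timeouts give intervals of 200, 400, 800 and 1600 ms.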

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
RcvBuffer = size of the TCP receive buffer
RcvWindow = amount of spare room in the buffer

Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.

Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

Flow Control: Receiver

Example: HOST A sends a large file to HOST B.

[Figure: the receive buffer (RcvBuffer) at HOST B. Data from IP fills it up to LastByteRcvd; the application process has read up to LastByteRead.]

Receiver HOST B uses RcvWindow, LastByteRcvd and LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

Flow Control: Sender

Example: HOST A sends a large file to HOST B.

[Figure: data sent by HOST A up to LastByteSent; ACKs from HOST B advance LastByteACKed.]

Sender HOST A uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
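The receiver and sender bookkeeping can be sketched in a few lines. The function names and byte counts below are illustrative assumptions; only the two formulas come from the slides.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    # Receiver side: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent, last_byte_acked, advertised_window):
    # Sender side: keep LastByteSent - LastByteACKed <= RcvWindow,
    # so only the remaining headroom may be sent (never a negative amount).
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)
```

For instance, a 4096-byte buffer holding 2000 unread bytes advertises RcvWindow = 2096; a sender with 1000 bytes in flight may then send 1096 more.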

Flow Control: Some Issues to Consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send, so a HOST B with nothing to send could never announce that buffer space has opened up. Therefore the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).

Client: the connection initiator.
In Java: Socket clientSocket = new Socket("hostname", port number);
In C (Winsock):
    /* connect() returns 0 on success */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client.
In Java: Socket accept()
In C (Winsock):
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: Establishing a Connection

This is what happens when we create a socket for connection to a server.

Three-way handshake:

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• specifies the initial sequence number (isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
• ACKs the received SYN
• allocates buffers
• specifies the server's initial sequence number

Step 3: the client ACKs the connection with ACK = server_isn + 1.
• allocates buffers
• sends SYN=0

Connection established.

[Figure: client/server timeline]
Client -> Server: Connect (SYN=1, seq=client_isn)
Server -> Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
Client -> Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

After the connection is established, the client can receive segments with app-generated data (SYN=0).
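The sequence and acknowledgement numbers of the three handshake segments can be traced with a short sketch. The dictionary layout and the function name are assumptions made for illustration; no real sockets are involved.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three control segments of the handshake, in order."""
    # Step 1: client -> server, SYN carrying the client's initial seq number.
    syn = {"SYN": 1, "seq": client_isn}
    # Step 2: server -> client, SYNACK acknowledging client_isn + 1.
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    # Step 3: client -> server, ACK with SYN=0 acknowledging server_isn + 1.
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]
```

Calling `three_way_handshake(1000, 5000)` yields a final segment with seq=1001 and ack=5001.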

TCP Connection Management (cont.): Closing a Connection

How a TCP connection is established and torn down.

The client closes the socket:
In C (Winsock): closesocket(s);
In Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client/server timeline. Client close -> FIN; server -> ACK; server close -> FIN; client -> ACK; client enters timed wait, then closed.]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
• Enters "timed wait": will respond with an ACK to received FINs

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client/server timeline. Client closing -> FIN; server -> ACK; server closing -> FIN; client -> ACK; client in timed wait, then closed; server closed.]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with transitions numbered 1 to 12.]

Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).

Closed: the connection formally closes, and all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs. Congestion Control

Similar actions are taken, but for very different reasons.

Flow control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receive buffer from overflowing

Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.

Congestion control
• service that makes sure the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control
• manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queuing in router buffers)
• a top-10 problem

Two broad approaches towards congestion control

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  - an explicit transmission rate the sender should send at

TCP Congestion Control

How a TCP sender limits the rate at which it sends traffic into its connection.

New variable: the congestion window (CongWin).

(Amount of unACKed data at the sender) = LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin
• Packet loss delay and packet transmission delay are negligible

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control: Self-Clocking

TCP uses ACKs to trigger ("clock") its increase in congestion window size; it is "self-clocking".

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• the congestion window will be increased more quickly

TCP Congestion Control: Perceiving Congestion

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

The sender limits transmission:
LastByteSent - LastByteAcked <= cwnd

Roughly,
rate = cwnd / RTT bytes/sec

cwnd is dynamic, a function of perceived network congestion.

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events
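As a rough numerical illustration of the window limit and the rate approximation, the sketch below uses hypothetical helper names and example values that are not taken from the slides.

```python
def allowed_in_flight(cwnd_bytes, rcv_window_bytes):
    # Unacknowledged data is bounded by the smaller of the two windows.
    return min(cwnd_bytes, rcv_window_bytes)


def approx_send_rate(cwnd_bytes, rtt_seconds):
    # rate ~= cwnd / RTT, in bytes per second.
    return cwnd_bytes / rtt_seconds
```

A 16,000-byte congestion window over a 0.25 s RTT gives roughly 64,000 bytes/sec, as long as the receive window is not the tighter limit.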

TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd, axis ticks at 8, 16 and 24 Kbytes) versus time, showing sawtooth behavior: probing for bandwidth.]

TCP Slow Start

When a connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).

[Figure: Host A / Host B timeline. One segment is sent in the first RTT, two segments in the second, four segments in the third.]
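The link between "1 MSS per ACK" and "doubling every RTT" can be checked with a tiny simulation. It assumes, for illustration only, that every segment sent in a round is ACKed within that round and that no loss occurs; the function name is hypothetical.

```python
def slow_start_rounds(rounds, mss=1):
    """Return cwnd (in units of mss) at the start of each RTT round."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss       # one ACK per segment sent in this round
        for _ in range(acks):
            cwnd += mss          # slow start: +1 MSS for every ACK
        history.append(cwnd)
    return history
```

Three rounds give windows of 1, 2, 4 and 8 segments, matching the one / two / four segment pattern in the timeline.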

Refinement: inferring loss

After 3 dup ACKs:
• cwnd is cut in half
• window then grows linearly

But after a timeout event:
• cwnd is instead set to 1 MSS
• window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS * (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
Action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
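The state/event table translates almost line for line into a small class. This is a teaching sketch, not a real TCP implementation: the class name, the MSS value of 1460 bytes, the 64 KB initial threshold and the reset of the duplicate-ACK counter on a new ACK are assumptions chosen for illustration.

```python
MSS = 1460  # bytes; illustrative value only


class TcpSender:
    def __init__(self, threshold=64 * 1024):
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.cong_win = MSS
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * MSS / self.cong_win

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Multiplicative decrease, never below 1 MSS.
            self.threshold = max(self.cong_win / 2, MSS)
            self.cong_win = self.threshold
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

Feeding the object a stream of events (new ACKs, duplicate ACKs, timeouts) reproduces the sawtooth: exponential growth up to the threshold, linear growth after it, halving on a triple duplicate ACK and a restart from 1 MSS on timeout.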

Summary: TCP Congestion Control

[Figure: finite state machine with three states: slow start, congestion avoidance and fast recovery.]

Initial transition into slow start: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0

Slow start:
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance:
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Congestion control: TCP's Congestion Control Service

[Figure: CLIENT and SERVER connected through congested routers.]

Problem: gridlock sets in when there is packet loss due to router congestion.

Congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.

When the sending system's packet is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).
• let W be the window size when loss occurs
• when the window is W, throughput is W/RTT
• just after loss, the window drops to W/2 and throughput to W/(2 RTT)
• throughput increases linearly (by MSS/RTT every RTT)
• average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP.)
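The 0.75 factor is simply the mean of a window that ramps linearly from W/2 to W. A quick check, with the window measured in whole segments and W assumed even (the function name is hypothetical):

```python
def average_window(w_segments):
    """Mean window over one sawtooth cycle: W/2, W/2 + 1, ..., W segments."""
    cycle = list(range(w_segments // 2, w_segments + 1))
    return sum(cycle) / len(cycle)
```

For W = 8 the cycle is 4, 5, 6, 7, 8 segments, whose mean is 6 = 0.75 * 8; dividing by RTT gives the average throughput.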

TCP Futures: TCP over "long, fat pipes"

• Example: a GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• this requires a window size of W = 83,333 in-flight segments
• throughput in terms of the loss rate L:

$\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• so L = 2 x 10^-10, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
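Both numbers in this example can be recomputed from the stated inputs (1500-byte segments, 100 ms RTT, 10 Gbps). The function names are hypothetical; the second function just inverts throughput = 1.22 * MSS / (RTT * sqrt(L)).

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    # Bits in flight during one RTT, divided by the bits per segment.
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    # Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L (MSS in bits).
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2
```

With these inputs the window comes out at about 83,333 segments and the loss rate at roughly 2.1 x 10^-10, in line with the rounded figures on the slide.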

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Assumptions:
• the link has a transmission rate of R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running up to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
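The convergence argument can be reproduced with a toy simulation under the stated assumptions (same RTT, synchronized losses, congestion avoidance only). The function name, the step size of 1 and the starting rates are illustrative assumptions.

```python
def aimd_two_flows(x1, x2, capacity, steps):
    """Evolve two flows' throughputs under synchronized AIMD."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both connections cut their window by a factor of 2,
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, so the gap is unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2
```

Starting from an unfair split such as 10 and 70 on a link of capacity 100, the two rates end up nearly equal after a few hundred steps, because each loss halves their difference while additive increase preserves it.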

The End

The slides that follow are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

[Figure: multiple end systems sharing a link with transmission rate R bps. Three hosts use 1 TCP connection each; one host uses 3 TCP connections (multithreading implementation).]

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling: Assumptions

[Figure: CLIENT and SERVER connected by one link; the server sends FILE to the client.]

• The network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• The file to send is an integer number of segments of size MSS
• Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big

Notation:
R = the link's transmission rate, in bps
O = size of the object, in bits
S = number of bits in an MSS (maximum segment size)

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object:

$K = \lceil O/(WS) \rceil$

where O/S is the number of segments and K is rounded up to the nearest integer.

[Figure: timeline assuming W = 4 segments.]

e.g. O = 256 bits, S = 32 bits, W = 4

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

latency = 2 RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$

[Figure: timeline with the stalled period marked between windows.]

If there are K windows, the sender will be stalled (K-1) times.
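Both static-window cases fit in one short function. The function name is hypothetical and the inputs below (a 32 bps link, so that S/R = 1 second) are unrealistic values chosen only to keep the arithmetic easy to follow.

```python
import math


def static_window_latency(O, S, R, rtt, W):
    """Latency for a fixed window of W segments.

    O: object size in bits, S: segment size in bits,
    R: link rate in bps, rtt: round-trip time in seconds.
    """
    K = math.ceil(O / (W * S))          # windows needed to cover the object
    base = 2 * rtt + O / R
    if W * S / R > rtt + S / R:
        return base                     # case 1: the sender never stalls
    stall = S / R + rtt - W * S / R     # idle time after each window
    return base + (K - 1) * stall       # case 2: stalled (K - 1) times
```

With O = 256 bits, S = 32 bits, W = 4 and R = 32 bps, an RTT of 1 s falls in case 1 (latency 10 s), while an RTT of 5 s falls in case 2 with K = 2 and one 2-second stall (latency 20 s).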

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: slow-start timeline with the stalled periods marked; an object of O/S = 15 segments is covered by 4 windows.]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object
• We can express K in terms of the number of segments in the object, O/S, as follows:

$K = \min\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge O/S\}$

$\;\; = \min\{k : 2^k - 1 \ge O/S\}$

$\;\; = \min\{k : k \ge \log_2(O/S + 1)\}$

$\;\; = \lceil \log_2(O/S + 1) \rceil$

Note: the second step uses $2^0 + 2^1 + \dots + 2^{k-1} = 2^k - 1$.

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

$S/R + RTT$

• Transmission time of the k-th window:

$2^{k-1} \, S/R$

• Stall time after the k-th window, where $[x]^+ = \max(x, 0)$:

$\left[ S/R + RTT - 2^{k-1} \, S/R \right]^+$

• Latency:

$\text{Latency} = 2\,RTT + O/R + \sum_{k=1}^{K-1} \left[ S/R + RTT - 2^{k-1} \, S/R \right]^+$

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$Q = \left\lfloor \log_2\!\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$

• The actual number of times that the server stalls is

$P = \min\{Q, K-1\}$

• Closed-form expression for the latency:

$\text{Latency} = 2\,RTT + O/R + P\left[RTT + S/R\right] - (2^P - 1)\, S/R$

• Compared with the minimum latency, $O/R + 2\,RTT$:

$\dfrac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \dfrac{P}{(O/R)/RTT + 2}$

Slow start will not significantly increase latency if RTT << O/R.
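The closed-form latency can be cross-checked against the stall-by-stall sum. Both functions below are sketches with hypothetical names. They follow the textbook slow-start model, in which the k-th window holds 2^(k-1) segments and K, Q and P are defined as K = ceil(log2(O/S + 1)), Q = floor(log2(1 + RTT/(S/R))) + 1 and P = min(Q, K - 1); the example inputs are chosen so that S/R = 1 second.

```python
import math


def dynamic_latency_closed_form(O, S, R, rtt):
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    return 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R


def dynamic_latency_by_sum(O, S, R, rtt):
    K = math.ceil(math.log2(O / S + 1))
    # Stall after window k is [S/R + RTT - 2^(k-1) * S/R]^+ for k = 1..K-1.
    stalls = sum(max(0.0, S / R + rtt - 2 ** (k - 1) * S / R) for k in range(1, K))
    return 2 * rtt + O / R + stalls
```

For an object of 15 segments (O = 480 bits, S = 32 bits, R = 32 bps) and RTT = 1 s, both functions give 18 s: K = 4, Q = 2, P = 2, and the only non-zero stall is the 1-second wait after the first window.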

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 21: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

22

BB

TCP Interesting Scenarios

Host A

Seq=92 8 bytes data

ACK=100

loss

tim

eout

time lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

Host A

Seq=100 20 bytes data

ACK=100

Seq=

92

tim

eout

time premature timeoutcumulative ACKs

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

ACK=120

Retransmission due to lost ACKSegment with Seq=100 not retransmitted

Timer is restarted here for Seq=92

Simplified TCP versionSimplified TCP version

Transport Layer ndash TCP

23

BB

Host A

Seq=100 20 bytes data

ACK=100Seq=

92

tim

eout

time

Host B

ACK=120

Seq=92 8 bytes data

Xloss

Cumulative ACK avoids retransmission of the first segment

TCP Retransmission ScenarioTCP Retransmission Scenario

Transport Layer ndash TCP

24

BB

TCP Modifications Doubling the Timeout IntervalDoubling the Timeout Interval

Provides a limited form of congestion control

Timer expiration is more likely caused by congestion in the network

TimeoutInterval = 2 TimeoutIntervalPrevious

TCP acts more politely by increasing the TimeoutInterval causing the sender to retransmit after longer and longer intervals

Congestion may get worse if sources

continue to retransmit packets

persistently

After ACK is received After ACK is received TimeoutInterval is derived TimeoutInterval is derived from most recent EstimatedRTT from most recent EstimatedRTT and DevRTTand DevRTT

Others check RFC 2018 ndash selective ACK

Transport Layer ndash TCP

25

BB

receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field

in TCP segment

sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow

sender wont overrun

receivers buffer bytransmitting too

much too fast

flow controlflow control

receiver buffering

RcvBufferRcvBuffer = size of TCP Receive Buffer

RcvWindowRcvWindow = amount of spare room in Buffer

TCP Flow ControlTCP Flow Control

Transport Layer ndash TCP

26

BB

FLOW CONTROL FLOW CONTROL ReceiverReceiver

04060 50100

LastByteRcvd

EXAMPLE HOST A sends a large file to HOST B

LastByteRead

Application Process

Data from IP

RcvBuffer

RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]

RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead

Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A

Transport Layer ndash TCP

27

BB

FLOW CONTROL SenderSender

04060 50100

LastByteSent

EXAMPLE HOST A sends a large file to HOST B

LastByteACKed

Data

To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow

SENDER HOST A

ACKs ACKs from from Host Host BB

SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed

Transport Layer ndash TCP

28

BB

FLOW CONTROLFLOW CONTROLSome issue to consider

RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service

TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0

What happens when the receive

buffer of HOST B is full full (that is when (that is when

RcvWindow=0)RcvWindow=0)

TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo

Transport Layer ndash TCP

29

BB

TCP Connection ManagementTCP Connection Management

Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments

Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)

ClientClient is the connection initiator

In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client

In JavaSocket accept()

if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)

ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)

Transport Layer ndash TCP

30

BB

TCP Connection Management

Three way handshakeStep 1 clientclient end system sends TCP

SYNSYN control segment to server (executed by TCP itself)

specifies initial seq number (isn)

Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment

ACKs received SYN allocates buffers specifies serverrsquos initial seq

number

Step 3 clientclient ACKsACKs the connection with

ACK=server_isn +1ACK=server_isn +1

allocates buffers sends SYN=0SYN=0

Connection established

Client

AcceptAccept (SYN=1

seq=server_isnack=client_isn+1)

time

Server

ConnectConnect (SYN=1 seq=client_isn)

ACK (SYN=0

seq=client_isn+1ack=server_isn+1)

Establishing a connection

This is what happens when we create a socket for

connection to a server

After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Used in case ACK gets lost It is implementation-dependent (eg 30

seconds 1 minute 2 minutes

Connection formally closes ndash all resources (eg port numbers) are

released

1

2

3

4

5

6

7

8

9

10

11

12

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path:

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender. A loss event is either:
1. Timeout: no ACK is received after a segment is lost.
2. Receipt of three duplicate ACKs: the segment loss is followed by three duplicate ACKs received at the sender.

TCP Congestion Control: details

• sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• roughly, rate = cwnd / RTT bytes/sec
• cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)

• approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) versus time, with the vertical axis marked at 8, 16 and 24 Kbytes; sawtooth behavior, probing for bandwidth.]
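The sawtooth can be reproduced with a toy model. This is only a sketch under a simplifying assumption that is not in the slides: a loss is declared whenever cwnd reaches a fixed `loss_threshold_mss`, and cwnd is measured in whole MSS units per RTT.

```python
def aimd_trace(cwnd_mss, rounds, loss_threshold_mss):
    """Return cwnd (in MSS) at the start of each RTT under AIMD.

    Assumption for illustration: a loss occurs in any RTT in which cwnd
    has reached loss_threshold_mss.
    """
    trace = []
    for _ in range(rounds):
        trace.append(cwnd_mss)
        if cwnd_mss >= loss_threshold_mss:
            cwnd_mss = max(1, cwnd_mss / 2)   # multiplicative decrease
        else:
            cwnd_mss += 1                     # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(cwnd_mss=8, rounds=10, loss_threshold_mss=12))
```

The trace climbs by one MSS per RTT, drops to half when the threshold is hit, then climbs again, which is the sawtooth shape described above.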

TCP Slow Start

• when the connection begins, increase the rate exponentially until the first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
• summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT)

[Figure: timeline between Host A and Host B; Host A sends one segment, then two segments, then four segments in successive RTTs.]
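A short sketch shows why "+1 MSS per ACK" amounts to doubling every RTT. It assumes, for illustration only, that every segment sent in an RTT is ACKed individually (no delayed ACKs) and that no loss occurs.

```python
def slow_start_trace(rtts, initial_cwnd_mss=1):
    """Return cwnd (in MSS) at the start of each RTT during slow start."""
    cwnd = initial_cwnd_mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        acks_this_rtt = cwnd          # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1                 # +1 MSS for every ACK received
    return trace


print(slow_start_trace(4))
```

The trace matches the one, two, four segments of the figure, followed by eight in the next RTT.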

Refinement: inferring loss

• after 3 dup ACKs:
  - cwnd is cut in half
  - window then grows linearly
• but after a timeout event:
  - cwnd is set to 1 MSS
  - window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
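The state/event/action table can be read as a small event handler. The sketch below follows the table's simplified rules (on a triple duplicate ACK, CongWin is set to Threshold). The class name, the method names and the MSS value of 1460 bytes are assumptions made for illustration, not part of the slides.

```python
MSS = 1460  # bytes; an assumed segment size for illustration


class CongestionControlSender:
    """Toy TCP sender that tracks CongWin and Threshold per the table."""

    def __init__(self, threshold=64 * 1024):
        self.cong_win = MSS
        self.threshold = threshold
        self.state = "SS"          # "SS" = slow start, "CA" = congestion avoidance
        self.dup_ack_count = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_ack_count = 0
        if self.state == "SS":
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        """Timeout: halve the threshold and restart from 1 MSS in slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_ack_count = 0


sender = CongestionControlSender(threshold=4 * MSS)
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cong_win / MSS)   # window in MSS units after 4 ACKs
```

Keeping the window in bytes rather than segments makes the congestion-avoidance increment MSS * (MSS/CongWin) a direct transcription of the table.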

Summary: TCP Congestion Control

Initial state: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: move to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; stay in slow start

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start

Congestion control

TCP's Congestion Control Service (between CLIENT and SERVER)

• Problem: gridlock sets in when there is packet loss due to router congestion.
• The service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• When a packet is lost due to congestion, the sending system is alerted because it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

• What is the average throughput of TCP as a function of window size and RTT?
  - ignore slow start (typically very short phases)
• Let W be the window size when loss occurs.
  - when the window is W, throughput is W/RTT
  - just after loss, the window drops to W/2 and throughput to (W/2)/RTT
  - throughput increases linearly (by MSS/RTT every RTT)
• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP.)
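The average-throughput figure follows from averaging over one sawtooth cycle. A brief derivation, assuming (as the idealised model does) that throughput rises linearly from its value just after a loss to its value just before the next loss:

```latex
\[
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
\]
```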

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• requires a window size of W = 83,333 in-flight segments
• throughput in terms of the loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

• to reach 10 Gbps, L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
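The two numbers in this example can be checked by plugging the stated values into the window and throughput relations. The helper names below are hypothetical; the formulas are the ones on the slide.

```python
def required_window_segments(rate_bps, rtt_seconds, mss_bytes):
    """Segments that must be in flight to sustain rate_bps: rate * RTT / MSS."""
    return rate_bps * rtt_seconds / (mss_bytes * 8)


def required_loss_rate(rate_bps, rtt_seconds, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_seconds * rate_bps)) ** 2


w = required_window_segments(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(round(w))          # in-flight segments needed
print(f"{loss:.2e}")     # tolerable loss rate, on the order of 2e-10
```

The loss rate comes out slightly above 2·10^-10, which the slide rounds to one loss event every 5 billion segments.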

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Assumptions:
• link with transmission rate of R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running up to R. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The plot shows the "equal bandwidth share" line and the "full bandwidth utilisation" line; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
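The convergence towards the equal-share line can be illustrated with a toy simulation. It rests on assumptions that go beyond the slides: both connections see every loss at the same time, a loss occurs whenever their combined rate exceeds the capacity, and rates change in unit steps per round.

```python
def simulate_two_flows(x1, x2, capacity, rounds):
    """Synchronous AIMD for two flows sharing one bottleneck link.

    Each round: if the combined rate exceeds capacity, both flows halve
    (multiplicative decrease); otherwise both add 1 (additive increase).
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2
        else:
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


a, b = simulate_two_flows(10, 70, capacity=100, rounds=200)
print(a, b, abs(a - b))
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts the gap in half, so an initial gap of 60 shrinks to well below 1 after a few loss events.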

The End

The slides that follow are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link with transmission rate R bps:
• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections, all sharing the link.]

TCP latency modeling

Q: How long does it take to receive an object from a Web server? That is, the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components of the latency:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions:
• the network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• no packet loss, no packet corruption, no retransmissions required
• header overheads are negligible
• file to send = integer number of segments of size MSS
• connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• the initial Threshold of the TCP congestion mechanism is very big

Notation:
• O: size of the object in bits
• S: number of bits of MSS (max segment size)
• R: link's transmission rate in bps

[Figure: CLIENT receiving a FILE from SERVER over a link of rate R bps.]

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

Example: assume W = 4 segments. With O = 256 bits, S = 32 bits and W = 4, K = 2.

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

latency = 2 RTT + O/R + (K - 1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

If there are K windows, the sender will be stalled (K - 1) times.
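Both static-window cases fit in one small function. This is a sketch with a hypothetical function name; the symbols O, S, R, W and RTT are those of the slides, and the sample link rate and RTT values are made up to give round numbers.

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency to fetch an object of O bits with a fixed window of W segments.

    O: object size in bits, S: segment size in bits, R: link rate in bps,
    W: window size in segments, rtt: round-trip time in seconds.
    """
    K = math.ceil(O / (W * S))            # windows needed to cover the object
    base = 2 * rtt + O / R                # connection setup + request + transmission
    stall = S / R + rtt - W * S / R       # idle time after each window (case 2)
    if stall <= 0:                        # case 1: ACKs return before the window drains
        return base
    return base + (K - 1) * stall         # case 2: (K - 1) stalled periods


# O = 256 bits, S = 32 bits, W = 4 as in the example; R = 32 bps is assumed.
print(static_window_latency(256, 32, 32, 4, rtt=1))   # case 1
print(static_window_latency(256, 32, 32, 4, rtt=5))   # case 2
```

With the short RTT the window takes 4 s to transmit while the first ACK returns after 2 s, so the sender never stalls; with the long RTT it stalls once between the two windows.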

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments, delivered in 4 windows, with the stalled periods marked.]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

• Transmission time of the kth window:

$$2^{k-1}\frac{S}{R}$$

• Stall time after the kth window, where $[x]^+ = \max(x, 0)$:

$$\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

• Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is P = min{Q, K - 1}.

Case analysis: DYNAMIC CONGESTION WINDOW

• Closed-form expression for the latency, where P = min{Q, K - 1} is the actual number of times that the server stalls:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
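The closed-form latency for the dynamic (slow-start) window can be evaluated directly. The sketch below uses the definitions of K, Q and P from these slides; the function name and the sample numbers (1000-bit segments on a 1000 bps link, chosen so that S/R = 1 s) are assumptions for illustration.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (latency, K, P) for an object of O bits under slow start.

    K: number of windows covering the object,
    Q: stalls if the object were infinitely long,
    P: actual number of stalls, min(Q, K - 1).
    """
    K = math.ceil(math.log2(O / S + 1))
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1
    P = min(Q, K - 1)
    latency = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return latency, K, P


# An object of O/S = 15 segments, as in the 4-window example.
print(slow_start_latency(O=15000, S=1000, R=1000, rtt=3))
```

Summing the per-window stall times by hand for this input (3 s, 2 s and 0 s after the first three windows) on top of 2 RTT + O/R = 21 s gives the same total as the closed form.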

Case analysis: DYNAMIC CONGESTION WINDOW

• Latency relative to the minimum latency, where P is the actual number of times that the server stalls:

$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP Retransmission Scenario

[Figure: timeline between Host A and Host B. Host A sends Seq=92 with 8 bytes of data, then Seq=100 with 20 bytes of data. Host B replies with ACK=100, which is lost (X), and with ACK=120, which reaches Host A before the timeout for Seq=92 expires.]

Cumulative ACK avoids retransmission of the first segment.

TCP Modifications: Doubling the Timeout Interval

• Provides a limited form of congestion control.
• Timer expiration is more likely caused by congestion in the network.
• Congestion may get worse if sources continue to retransmit packets persistently.
• On each timeout: TimeoutInterval = 2 * TimeoutInterval_previous
• TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.
• After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
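The doubling rule yields an exponentially growing sequence of retransmission timers. A minimal sketch, with a hypothetical function name and an assumed starting interval of 1 second:

```python
def backoff_intervals(initial_timeout, timeouts):
    """Timeout interval in effect for each successive retransmission.

    Every expiry doubles the previous interval:
    TimeoutInterval = 2 * TimeoutInterval_previous.
    """
    intervals = []
    timeout = initial_timeout
    for _ in range(timeouts):
        intervals.append(timeout)
        timeout *= 2
    return intervals


print(backoff_intervals(1.0, 4))
```

Once an ACK arrives, the interval would be recomputed from EstimatedRTT and DevRTT rather than continuing from the doubled value; that reset is not modelled here.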

TCP Flow Control

flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast

• receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, in the RcvWindow field of the TCP segment
• sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow

Receiver buffering:
• RcvBuffer = size of the TCP receive buffer
• RcvWindow = amount of spare room in the buffer

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

• Initially, RcvWindow = RcvBuffer.
• The application reads from the buffer.
• HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

[Figure: RcvBuffer at HOST B; data from IP arrives up to LastByteRcvd, and the application process has read up to LastByteRead.]

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life that

LastByteSent - LastByteACKed <= RcvWindow

[Figure: data sent by HOST A up to LastByteSent, with ACKs from HOST B advancing LastByteACKed.]
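The receiver's and sender's bookkeeping can be sketched as two small functions. The function names and the byte counts in the example are hypothetical; the two relations are the ones given for HOST B and HOST A.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes, last_byte_sent, last_byte_acked, advertised_window):
    """True if sending nbytes keeps LastByteSent - LastByteACKed <= RcvWindow."""
    return (last_byte_sent + nbytes) - last_byte_acked <= advertised_window


# Receiver: a 4096-byte buffer holding 2000 unread bytes.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
# Sender: 1000 bytes already in flight.
print(can_send(1000, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window))
print(can_send(1200, last_byte_sent=5000, last_byte_acked=4000, advertised_window=window))
```

As the application reads from the buffer, LastByteRead advances, the advertised window grows, and the sender is allowed to send more.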

FLOW CONTROL: some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

• TCP sends a segment only when there is data or an ACK to send. Therefore the sender must keep the connection "alive".
• TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
• initialize TCP variables: sequence numbers; buffers and flow control info (e.g. RcvWindow)

Client: the connection initiator (connect)
  In Java:
    Socket clientSocket = new Socket("hostname", portNumber);
  In C (Winsock):
    /* connect() returns 0 on success */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

Server: contacted by the client (accept)
  In Java (welcomeSocket is a ServerSocket; the variable names are illustrative):
    Socket connectionSocket = welcomeSocket.accept();
  In C (Winsock):
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: establishing a connection

Three-way handshake:

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
  - specifies the initial seq number (client_isn)
Step 2: the server end system receives the SYN and replies with a SYNACK control segment
  - ACKs the received SYN
  - allocates buffers
  - specifies the server's initial seq number (server_isn)
Step 3: the client ACKs the connection with ACK = server_isn + 1
  - allocates buffers
  - sends SYN = 0
Connection established.

Segments exchanged, in time order:
  Client to Server: Connect (SYN=1, seq=client_isn)
  Server to Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
  Client to Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

This is what happens when we create a socket for connection to a server. After establishing the connection, the client can receive segments with app-generated data (SYN=0).
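The sequence and acknowledgement numbers of the three handshake segments can be generated mechanically from the two initial sequence numbers. This is only a bookkeeping sketch: the dictionary layout, the function name and the sample ISN values are made up for illustration, and no real packets are sent.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as simple dictionaries."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn}
    synack = {"from": "server", "SYN": 1, "seq": server_isn, "ack": client_isn + 1}
    ack = {"from": "client", "SYN": 0, "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, because the SYN itself consumes one sequence number.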

TCP Connection Management (cont.): closing a connection

How a TCP connection is established and torn down.

The client closes the socket:
  closesocket(s);
  In Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: timeline between client and server. The client closes and sends FIN; the server replies with ACK, closes and sends its own FIN; the client replies with ACK and enters timed wait before the connection is closed.]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
  - enters "timed wait": will respond with an ACK to received FINs
Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: timeline between client and server. Both sides are closing while FIN, ACK, FIN, ACK are exchanged; the client goes through timed wait before it is closed, and the server is closed once it receives the final ACK.]

TCP Connection Management (cont.)

[Figure: state diagrams of the TCP client lifecycle and the TCP server lifecycle, with steps numbered 1 to 12.]

• Timed wait: used in case an ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• When the connection formally closes, all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.


4 windows4 windows

OS=15

Transport Layer ndash TCP

61

BB

bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the

object as followsobject as follows

S

OkK k 110 222min

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

1log

1logmin

12min

2

2

S

OK

S

OkkK

S

OkK k

Note

Transport Layer ndash TCP

62

BB

bull From the time the server begins to transmit the From the time the server begins to transmit the kkth window window until the time the server receives an until the time the server receives an ACKACK for the for the first segment in the windowin the window

bull Transmission of Transmission of kkth window window ==

bull Stall Time Stall Time ==

bull Latency Latency ==

12

k

R

S

12

k

R

SRTT

R

S

RTTR

S

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

1K

1

122k

k

R

SRTT

R

S

R

ORTT

Transport Layer ndash TCP

63

BB

bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

11log2

RS

RTTQ

bull The actual number of times that the server stalls is

P = min Q K-1

Transport Layer ndash TCP

64

BB

bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

11log2

RS

RTTQ

bull The actual number of times that the server stalls is

P = min Q K-1

R

S

R

SRTTP

R

ORTT P )12(2Latency

bull Closed-form expression for the latency

Transport Layer ndash TCP

65

BB

bull Let QQ be the number of times the server would stall if the object contained an infinite number of segments

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

2

1

RTTRO

P

encyMinimumLat

Latency

Slow start will not significantly increase latency if RTT ltlt OR

Transport Layer ndash TCP

66

BB

httpwww1csewustledu~jaincis788-97ftptcp_over_atmindexhtmatm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo

TCP Modifications: Doubling the Timeout Interval

Provides a limited form of congestion control.

Timer expiration is more likely caused by congestion in the network. Congestion may get worse if sources continue to retransmit packets persistently.

TimeoutInterval = 2 × TimeoutIntervalPrevious

TCP acts more politely by increasing the TimeoutInterval, causing the sender to retransmit after longer and longer intervals.

After an ACK is received, TimeoutInterval is derived from the most recent EstimatedRTT and DevRTT.

Others: check RFC 2018 (selective ACK).
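The doubling rule can be illustrated with a small sketch. It assumes the usual baseline TimeoutInterval = EstimatedRTT + 4 × DevRTT, which is not stated on this slide; the function names and the numbers are hypothetical.

```python
def timeout_interval(estimated_rtt, dev_rtt):
    """Baseline timeout; assumes the common EstimatedRTT + 4*DevRTT rule."""
    return estimated_rtt + 4 * dev_rtt


def backoff_schedule(estimated_rtt, dev_rtt, consecutive_timeouts):
    """Timeout used for the first transmission and for each retransmission."""
    interval = timeout_interval(estimated_rtt, dev_rtt)
    schedule = [interval]
    for _ in range(consecutive_timeouts):
        interval *= 2  # TimeoutInterval = 2 x TimeoutIntervalPrevious
        schedule.append(interval)
    return schedule


# Illustrative values in seconds: each entry is twice the previous one.
schedule = backoff_schedule(estimated_rtt=0.100, dev_rtt=0.025, consecutive_timeouts=3)
print(schedule)
```

Once an ACK arrives, a real sender would drop back to the baseline value instead of continuing the doubled sequence.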

TCP Flow Control

Flow control: the sender won't overrun the receiver's buffer by transmitting too much, too fast.

Receiver buffering:
- RcvBuffer = size of the TCP receive buffer
- RcvWindow = amount of spare room in the buffer

Receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, using the RcvWindow field in the TCP segment.

Sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow.

FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B.

[Figure: the receive buffer (RcvBuffer) at HOST B. Data arrives from IP and is read by the application process; LastByteRead, LastByteRcvd and RcvWindow are marked.]

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd and LastByteRead:

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

Initially RcvWindow = RcvBuffer. The application reads from the buffer. HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

[Figure: data held at HOST A, with LastByteACKed and LastByteSent marked; ACKs arrive from HOST B.]

SENDER (HOST A) uses the RcvWindow of HOST B, LastByteSent and LastByteACKed.

To ensure that HOST B's receive buffer does not overflow, HOST A maintains throughout the connection's life that

[LastByteSent - LastByteACKed] <= RcvWindow
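A minimal sketch of the two flow-control formulas, one evaluated at the receiver and one at the sender. The function names and the byte counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room advertised by the receiver: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised_window, nbytes):
    """True if sending nbytes more keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


# Hypothetical receiver: 100-byte buffer, 60 bytes received, 40 already read.
window = rcv_window(rcv_buffer=100, last_byte_rcvd=60, last_byte_read=40)
print(window)                               # 80 bytes of spare room

# Hypothetical sender: 10 bytes in flight (sent up to 60, ACKed up to 50).
print(sender_may_send(60, 50, window, 70))  # True: 10 + 70 fits in 80
print(sender_may_send(60, 50, window, 71))  # False: 10 + 71 exceeds 80
```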

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. If HOST A simply stopped sending, HOST B would have nothing to acknowledge, and HOST A would never learn that buffer space had opened up. Therefore the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).

Client: the connection initiator.

In Java: Socket clientSocket = new Socket(hostname, portNumber);

In C (Winsock): if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }

Server: contacted by the client.

In Java: ServerSocket.accept(), which returns a Socket for the new connection.

In C (Winsock): ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management: establishing a connection

Three-way handshake. This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
- specifies the initial sequence number (isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment
- ACKs the received SYN
- allocates buffers
- specifies the server's initial sequence number

Step 3: the client ACKs the connection with ACK = server_isn + 1
- allocates buffers
- sends SYN = 0

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN = 0).

Segments exchanged, in time order:
- Client to server, Connect: SYN=1, seq=client_isn
- Server to client, Accept: SYN=1, seq=server_isn, ack=client_isn+1
- Client to server, ACK: SYN=0, seq=client_isn+1, ack=server_isn+1
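The field values in the three handshake segments can be written out as a short sketch. This only models the header fields named on the slide; it opens no real connection, and the function name and ISN values are illustrative.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three control segments as (sender, header fields) pairs."""
    syn = ("client", {"SYN": 1, "seq": client_isn})
    synack = ("server", {"SYN": 1, "seq": server_isn, "ack": client_isn + 1})
    ack = ("client", {"SYN": 0, "seq": client_isn + 1, "ack": server_isn + 1})
    return [syn, synack, ack]


# Arbitrary initial sequence numbers, chosen only for illustration.
for sender, fields in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sender, fields)
```

Each side acknowledges the other's initial sequence number plus one, which is why the SYN is said to consume one sequence number.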

TCP Connection Management (cont.): closing a connection

The client closes the socket: closesocket(s); in Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: how a TCP connection is established and torn down. The client sends FIN; the server replies with ACK and then sends FIN; the client replies with ACK and enters timed wait before the connection is closed.]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.
- Enters "timed wait": will respond with an ACK to received FINs

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: the same FIN/ACK exchange between client and server, with the states closing, timed wait and closed marked on each side.]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle (state diagrams).]

Timed wait: used in case the ACK gets lost. Its length is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).

After the timed wait, the connection formally closes and all resources (e.g. port numbers) are released.
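Since the lifecycle diagrams did not survive in this transcript, the client side can be sketched as a table of transitions. The state names below follow the usual TCP state names and are an assumption about what the figure showed; the event strings are illustrative labels, not protocol fields.

```python
# (current state, event) -> next state, for a client that opens and then closes.
CLIENT_TRANSITIONS = {
    ("CLOSED", "app opens connection, send SYN"): "SYN_SENT",
    ("SYN_SENT", "receive SYNACK, send ACK"): "ESTABLISHED",
    ("ESTABLISHED", "app closes, send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "receive ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "receive FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timed wait expires"): "CLOSED",
}


def run_client(events):
    """Follow the transition table from CLOSED and return the visited states."""
    state = "CLOSED"
    trace = [state]
    for event in events:
        state = CLIENT_TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


# Replay every event in table order (dicts preserve insertion order).
events = [event for (_, event) in CLIENT_TRANSITIONS]
print(" -> ".join(run_client(events)))
```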


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
- point-to-point traffic between sender and receiver
- speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
- prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
- service that makes sure that the routers between end systems are able to carry the offered traffic
- prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control
- manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers)
- a top-10 problem

Two broad approaches towards congestion control

1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP

2. Network-assisted congestion control
- routers provide feedback to end systems, in the form of either a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection.

New variable: the congestion window, CongWin.

At the sender, the amount of unACKed data satisfies

LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin
- Packet loss and packet transmission delay are negligible

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
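As a quick numeric illustration of the window limit and the approximate sending rate, here is a small sketch. The function names and byte values are hypothetical.

```python
def max_unacked_bytes(cong_win, rcv_window):
    """Upper limit on LastByteSent - LastByteACKed."""
    return min(cong_win, rcv_window)


def approx_send_rate(cong_win, rtt):
    """Approximate sending rate in bytes/sec when only CongWin limits the sender."""
    return cong_win / rtt


# Illustrative values: CongWin is the binding limit here.
limit = max_unacked_bytes(cong_win=10_000, rcv_window=65_535)
rate = approx_send_rate(cong_win=10_000, rtt=0.25)
print(limit)  # 10000
print(rate)   # 40000.0
```

Halving CongWin at the same RTT halves the approximate rate, which is the lever the sender uses to throttle itself.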

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
- the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
- the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path.

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
- no ACK is received after a segment loss

2. Receipt of three duplicate ACKs
- a segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

The sender limits transmission:

LastByteSent - LastByteAcked <= cwnd

Roughly, rate = cwnd / RTT bytes/sec.

cwnd is a dynamic function of perceived network congestion.

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) against time, with levels of 8, 16 and 24 Kbytes marked. The plot shows sawtooth behavior: probing for bandwidth.]
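The sawtooth can be reproduced with a toy model. This sketch assumes, purely for illustration, that a loss is detected whenever cwnd exceeds a fixed capacity; real loss detection relies on timeouts and duplicate ACKs. The function and parameter names are hypothetical.

```python
def aimd_trace(start_mss, capacity_mss, rounds):
    """cwnd (in MSS) at the start of each RTT under AIMD."""
    cwnd = start_mss
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_mss:       # stand-in for "loss detected" in this toy model
            cwnd = max(1, cwnd // 2)  # multiplicative decrease: cut cwnd in half
        else:
            cwnd += 1                 # additive increase: 1 MSS per RTT
    return trace


trace = aimd_trace(start_mss=8, capacity_mss=16, rounds=24)
print(trace)
```

The printed list climbs by one each round and then drops to half, which is the sawtooth shape the figure described.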

TCP Slow Start

When the connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).

[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments.]
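The link between "1 MSS per ACK" and "doubling every RTT" can be checked with a few lines. This is a minimal sketch that assumes every segment in the window is ACKed within one RTT; the function name is hypothetical.

```python
def slow_start_round(cwnd_mss):
    """One RTT of slow start: each ACKed segment adds 1 MSS to cwnd."""
    acks = cwnd_mss  # one ACK per segment sent in this round
    for _ in range(acks):
        cwnd_mss += 1
    return cwnd_mss


cwnd = 1
history = [cwnd]
for _ in range(4):
    cwnd = slow_start_round(cwnd)
    history.append(cwnd)
print(history)  # [1, 2, 4, 8, 16]
```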

Refinement: inferring loss

- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly
- But after a timeout event: cwnd is instead set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly

Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS × (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed

Summary: TCP Congestion Control

[Figure: finite state machine with three states: slow start, congestion avoidance and fast recovery. The transitions are listed below.]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance:
- new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
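The three-state machine summarised on this slide can be sketched as a small class. This is an illustrative model only: it works in units of MSS, uses 64 as a stand-in for the 64 KB initial threshold, and omits retransmission and segment transmission entirely. The class and method names are hypothetical.

```python
MSS = 1  # work in units of one MSS for readability


class RenoSender:
    """Toy model of the slow start / congestion avoidance / fast recovery states."""

    def __init__(self):
        self.state = "slow start"
        self.cwnd = 1 * MSS
        self.ssthresh = 64  # stand-in for the 64 KB initial threshold
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += MSS                     # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about 1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow start"


sender = RenoSender()
for _ in range(7):
    sender.on_new_ack()   # slow start: cwnd grows from 1 to 8
for _ in range(3):
    sender.on_dup_ack()   # third duplicate ACK triggers fast recovery
sender.on_new_ack()       # new ACK ends fast recovery
print(sender.state, sender.cwnd, sender.ssthresh)
```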

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

The congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.

The sending system is alerted that its packets are being lost to congestion when it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

- Let W be the window size when loss occurs
- When the window is W, throughput is W/RTT
- Just after loss, the window drops to W/2 and throughput to W/(2 RTT)
- Throughput increases linearly (by MSS/RTT every RTT)
- Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
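The 0.75 factor follows from averaging a window that ramps linearly from W/2 to W. A short numerical check, under that idealised assumption, with hypothetical names and values:

```python
def average_throughput(w_bytes, rtt, steps=1000):
    """Average of window/RTT over one sawtooth cycle ramping linearly from W/2 to W."""
    windows = [w_bytes / 2 + (w_bytes / 2) * i / steps for i in range(steps + 1)]
    return sum(w / rtt for w in windows) / len(windows)


W, RTT = 80_000, 0.25           # illustrative window (bytes) and RTT (seconds)
measured = average_throughput(W, RTT)
predicted = 0.75 * W / RTT
print(measured, predicted)
```

The two printed values agree up to floating-point rounding, because the mean of a linear ramp is its midpoint, 0.75 W.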

TCP Futures: TCP over "long, fat pipes"

- Example: a GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
- Requires a window size of W = 83,333 in-flight segments
- Throughput in terms of loss rate L:

$\text{Throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

- This gives $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
- New versions of TCP are needed for high-speed environments
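Both numbers in this example can be recomputed directly from the stated segment size, RTT and target throughput. The variable names below are illustrative.

```python
MSS_BITS = 1500 * 8   # 1500-byte segments
RTT = 0.100           # 100 ms
TARGET_BPS = 10e9     # 10 Gbps

# Window needed to keep the pipe full: throughput * RTT, counted in segments.
window_segments = TARGET_BPS * RTT / MSS_BITS

# Solve Throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET_BPS)) ** 2

print(round(window_segments))  # 83333
print(loss_rate)               # roughly 2e-10
```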

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Assumptions:
- Link with transmission rate R
- Each connection has the same MSS and RTT
- No other TCP connections or UDP datagrams traverse the shared link
- Ignore the slow start phase of TCP
- Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, each axis running up to R. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The plot shows the equal bandwidth share line and the full bandwidth utilisation line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]
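The convergence towards an equal share can be simulated under the simplifying assumptions listed for the two-connection analysis. In this sketch both flows see a loss at the same moment, whenever their combined rate exceeds the link capacity; the function name and starting rates are hypothetical.

```python
def share_link(x1, x2, capacity, rounds):
    """Two AIMD flows: both add 1 per round and both halve when the link is overloaded."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2  # multiplicative decrease halves the gap too
        else:
            x1, x2 = x1 + 1, x2 + 1  # additive increase keeps the gap unchanged
    return x1, x2


# Start far from the fair share: 70 versus 10 on a link of capacity 100.
x1, x2 = share_link(x1=70.0, x2=10.0, capacity=100.0, rounds=500)
print(x1, x2, abs(x1 - x2))
```

The additive steps leave the difference between the two rates untouched, while every halving cuts it in two, so the rates drift towards the equal-share line.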

The End

The slides that follow are for additional reading only.

Loopholes in TCP

Multiple end systems sharing a link (R bps: the link's transmission rate).

- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
- Multiple parallel TCP connections (for example, a multithreaded implementation) allow one application to get a bigger share of the bandwidth.

[Figure: end systems sharing the link; three hold 1 TCP connection each and one holds 3 TCP connections.]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of the TCP connection establishment time, the data transfer delay and the actual data transmission time.

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling: Assumptions

[Figure: a CLIENT and a SERVER joined by one link; the FILE is sent from server to client.]

- The network is uncongested, with one link of rate R between the end systems
- CongWin (fixed) determines the amount of data that can be sent
- No packet loss, no packet corruption, no retransmissions required
- Header overheads are negligible
- File to send = an integer number of segments of size MSS
- Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
- The initial Threshold of the TCP congestion mechanism is very big

Notation:
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in an MSS (maximum segment size)

TCP latency modeling. Case analysis: STATIC CONGESTION WINDOW

Assume W = 4 segments (e.g. O = 256 bits, S = 32 bits, W = 4).

K = number of windows of data that cover the object. O/S is the number of segments, and K is rounded up to the nearest integer:

$K = \lceil O / (WS) \rceil$

Case 1: $WS/R > RTT + S/R$

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

$\text{latency} = 2\,RTT + O/R$

TCP latency modeling. Case analysis: STATIC CONGESTION WINDOW

Case 2: $WS/R < RTT + S/R$

The sender has to wait for an ACK after a window's worth of data is sent. Each wait is a stalled period.

K = number of windows of data that cover the object: $K = \lceil O / (WS) \rceil$

If there are K windows, the sender will be stalled (K - 1) times.

$\text{latency} = 2\,RTT + O/R + (K-1)\,[\,S/R + RTT - WS/R\,]$
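Both static-window cases can be folded into one function, since Case 1 is simply the situation in which the per-window stall is not positive. The sketch reuses the slide's example sizes (O = 256 bits, S = 32 bits, W = 4); the link rate and RTT values are assumptions made up for the illustration.

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency with a fixed window of W segments; returns (K, latency in seconds)."""
    K = math.ceil(O / (W * S))            # windows needed to cover the object
    stall = S / R + rtt - W * S / R       # idle time after each window, if positive
    latency = 2 * rtt + O / R
    if stall > 0:                         # Case 2: the sender stalls K - 1 times
        latency += (K - 1) * stall
    return K, latency


# Case 2 (long RTT): the window is sent before the first ACK comes back.
K2, latency_case2 = static_window_latency(O=256, S=32, R=1000, W=4, rtt=0.5)
# Case 1 (short RTT): ACKs return before the window is exhausted.
K1, latency_case1 = static_window_latency(O=256, S=32, R=1000, W=4, rtt=0.05)
print(K2, latency_case2)
print(K1, latency_case1)
```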

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments sent in 4 windows under slow start, with the stalled periods marked.]

Case analysis: DYNAMIC CONGESTION WINDOW

- Let K be the number of windows that cover the object
- We can express K in terms of the number of segments in the object, O/S, as follows:

$K = \min\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge O/S\}$

$\;\;\; = \min\{k : 2^k - 1 \ge O/S\}$

$\;\;\; = \min\{k : k \ge \log_2(O/S + 1)\}$

$\;\;\; = \lceil \log_2(O/S + 1) \rceil$

Case analysis: DYNAMIC CONGESTION WINDOW

- Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window $= \dfrac{S}{R} + RTT$

- Transmission time of the kth window $= 2^{k-1}\dfrac{S}{R}$

- Stall time after the kth window $= \left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$

- Latency $= 2\,RTT + \dfrac{O}{R} + \sum_{k=1}^{K-1}\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$


Case analysis: DYNAMIC CONGESTION WINDOW

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$Q = \left\lfloor \log_2\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$

- The actual number of times that the server stalls is

$P = \min\{Q,\, K-1\}$

- Closed-form expression for the latency:

$\text{Latency} = 2\,RTT + \dfrac{O}{R} + P\left[RTT + \dfrac{S}{R}\right] - (2^{P} - 1)\dfrac{S}{R}$

Case analysis: DYNAMIC CONGESTION WINDOW

Comparing the latency with the minimum latency $2\,RTT + O/R$ (no stalls), where P is the actual number of times the server stalls:

$\dfrac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \dfrac{P}{(O/R)/RTT + 2}$

Slow start will not significantly increase latency if RTT << O/R.
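The dynamic-window formulas can be cross-checked numerically: the closed-form latency should match the explicit sum of stall times, and the ratio to the minimum latency should stay under the bound. This is a sketch under the model's assumptions; the object size, segment size, link rate and RTT are illustrative values chosen so that O/S = 15.

```python
import math


def dynamic_window_latency(O, S, R, rtt):
    """Return (K, P, closed-form latency, latency from the explicit stall sum)."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    closed_form = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    stalls = sum(max(0.0, S / R + rtt - 2 ** (k - 1) * S / R) for k in range(1, K))
    by_sum = 2 * rtt + O / R + stalls
    return K, P, closed_form, by_sum


O, S, R, RTT = 15_000, 1_000, 1_000_000, 0.1   # bits, bits, bps, seconds
K, P, closed_form, by_sum = dynamic_window_latency(O, S, R, RTT)
minimum = 2 * RTT + O / R
ratio = closed_form / minimum
bound = 1 + P / ((O / R) / RTT + 2)
print(K, P, closed_form, by_sum)
print(ratio, bound)
```

With these numbers the object needs 4 windows, matching the O/S = 15 example, and slow start more than doubles the latency because RTT is much larger than O/R.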

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 24: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

25

BB

receiverreceiver explicitly informs sender of (dynamically changing) amount of free buffer space RcvWindowRcvWindow field

in TCP segment

sendersender keeps the amount of transmitted unACKed data less thanless than most recently received RcvWindowRcvWindow

sender wont overrun

receivers buffer bytransmitting too

much too fast

flow controlflow control

receiver buffering

RcvBufferRcvBuffer = size of TCP Receive Buffer

RcvWindowRcvWindow = amount of spare room in Buffer

TCP Flow ControlTCP Flow Control

Transport Layer ndash TCP

26

BB

FLOW CONTROL FLOW CONTROL ReceiverReceiver

04060 50100

LastByteRcvd

EXAMPLE HOST A sends a large file to HOST B

LastByteRead

Application Process

Data from IP

RcvBuffer

RcvWindow=RcvBuffer-[LastByteRcvd-LastByteRead]

RECEIVER HOST B ndash uses RcvWindow LastByteRcvd LastByteReadRcvWindow LastByteRcvd LastByteRead

Initially RcvWindow = RcvBuffer Application reads from the bufferHOST BHOST B tells HOST AHOST A how much spare roomhow much spare room it has in the connection buffer by placing its current value of RcvWindowRcvWindow in the receive window field of every segment it sends to HOST AHOST A

Transport Layer ndash TCP

27

BB

FLOW CONTROL SenderSender

04060 50100

LastByteSent

EXAMPLE HOST A sends a large file to HOST B

LastByteACKed

Data

To ensure that HOST B does not overflow HOST A maintains throughout the connectionrsquos life that [LastByteSent-LastByteACKed] lt= RcvWindow

SENDER HOST A

ACKs ACKs from from Host Host BB

SENDER HOST A ndash uses RcvWindow RcvWindow of HostBof HostB LastByteSent LastByteACKed LastByteSent LastByteACKed

Transport Layer ndash TCP

28

BB

FLOW CONTROLFLOW CONTROLSome issue to consider

RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service

TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0

What happens when the receive

buffer of HOST B is full full (that is when (that is when

RcvWindow=0)RcvWindow=0)

TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo

Transport Layer ndash TCP

29

BB

TCP Connection ManagementTCP Connection Management

Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments

Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)

ClientClient is the connection initiator

In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client

In JavaSocket accept()

if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)

ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)

Transport Layer ndash TCP

30

BB

TCP Connection Management

Three way handshakeStep 1 clientclient end system sends TCP

SYNSYN control segment to server (executed by TCP itself)

specifies initial seq number (isn)

Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment

ACKs received SYN allocates buffers specifies serverrsquos initial seq

number

Step 3 clientclient ACKsACKs the connection with

ACK=server_isn +1ACK=server_isn +1

allocates buffers sends SYN=0SYN=0

Connection established

Client

AcceptAccept (SYN=1

seq=server_isnack=client_isn+1)

time

Server

ConnectConnect (SYN=1 seq=client_isn)

ACK (SYN=0

seq=client_isn+1ack=server_isn+1)

Establishing a connection

This is what happens when we create a socket for

connection to a server

After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle (state diagrams; steps numbered 1 to 12)]

Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).

Closed: the connection formally closes; all resources (e.g. port numbers) are released.
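The client lifecycle can be written down as a small transition table. The sketch below is an assumption-laden illustration: the state names are the standard TCP ones (CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT), which the slide text itself does not list, and the event labels are hypothetical names chosen for readability.

```python
# (current state, event) -> next state, for a client that opens and later
# closes the connection. Event labels are illustrative, not from the slides.
CLIENT_TRANSITIONS = {
    ("CLOSED", "app_open_send_SYN"): "SYN_SENT",
    ("SYN_SENT", "recv_SYNACK_send_ACK"): "ESTABLISHED",
    ("ESTABLISHED", "app_close_send_FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_FIN_send_ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timed_wait_expires"): "CLOSED",
}


def run_client(events, state="CLOSED"):
    """Apply events in order; a KeyError signals an illegal transition."""
    for event in events:
        state = CLIENT_TRANSITIONS[(state, event)]
    return state


if __name__ == "__main__":
    print(run_client(["app_open_send_SYN", "recv_SYNACK_send_ACK"]))
```

TIME_WAIT is the "timed wait" of the slides: the client lingers there so it can re-send the final ACK if the server's FIN is retransmitted.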


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receive buffer from overflowing

Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
• informally: too many sources sending too much data too fast for the network to handle
• different from flow control
• manifestations:
    lost packets (buffer overflow at routers)
    long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems, in the form of either:
    a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
    an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the congestion window (CongWin)

Amount of unACKed data at the SENDER:
LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin.
• Packet-loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
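A minimal sketch of the sender-side window check and the approximate sending rate follows. The function names and the numbers in the usage lines are illustrative assumptions; the variable names mirror those on the slide.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if sending nbytes more keeps unACKed data within the window.

    All quantities are in bytes. The limit is min(CongWin, RcvWindow).
    """
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def approx_rate_bytes_per_sec(cong_win, rtt):
    """Approximate sending rate CongWin / RTT (RTT in seconds)."""
    return cong_win / rtt


if __name__ == "__main__":
    # 2000 bytes in flight, CongWin = 4000, effectively unlimited RcvWindow.
    print(can_send(5000, 3000, 4000, 10**9, 1000))
    print(approx_rate_bytes_per_sec(14600, 0.5))
```

With a very large RcvWindow, as the slide assumes, the `min` is decided by CongWin alone, so changing CongWin directly changes the rate.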

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

Arrival of ACKs: an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• sender limits transmission: LastByteSent - LastByteAcked <= cwnd
• roughly, rate = cwnd / RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative behavior after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd; axis marks at 8, 16 and 24 Kbytes) versus time, showing sawtooth behavior: probing for bandwidth]
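The sawtooth can be reproduced with a very small simulation. This is a sketch under simplifying assumptions: cwnd is counted in MSS units, time advances one RTT per step, and the rounds in which a loss occurs are supplied by the caller (`loss_rounds` is a hypothetical parameter, not something TCP knows in advance).

```python
def aimd(cwnd_mss, rounds, loss_rounds):
    """Return cwnd (in MSS) at the start of each RTT round.

    Additive increase: +1 MSS per RTT without loss.
    Multiplicative decrease: halve cwnd (never below 1 MSS) on loss.
    """
    trace = []
    for r in range(rounds):
        trace.append(cwnd_mss)
        if r in loss_rounds:
            cwnd_mss = max(1.0, cwnd_mss / 2)
        else:
            cwnd_mss = cwnd_mss + 1.0
    return trace


if __name__ == "__main__":
    print(aimd(8, 12, loss_rounds={3, 9}))
```

Plotting the returned list against the round number gives the sawtooth shape described on the slide.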

TCP Slow Start

When the connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).

[Figure: Host A sends one segment to Host B, then two segments, then four segments, one RTT apart]
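Why "+1 MSS per ACK" amounts to doubling per RTT is easy to see in code. The sketch below assumes an idealised connection (no loss, one ACK per segment, a whole window sent and acknowledged within each RTT); the function name is illustrative.

```python
def slow_start_rounds(num_rtts, mss=1):
    """Return cwnd at the start of each RTT, growing by 1 MSS per ACK."""
    cwnd = mss
    sizes = []
    for _ in range(num_rtts):
        sizes.append(cwnd)
        acks_this_round = cwnd // mss  # one ACK for every segment sent
        for _ in range(acks_this_round):
            cwnd += mss                # +1 MSS for every ACK received
    return sizes


if __name__ == "__main__":
    print(slow_start_rounds(5))
```

A window of n segments produces n ACKs, each adding one MSS, so the next window holds 2n segments: one, two, four segments and so on, as in the figure.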

Refinement: inferring loss

• After 3 dup ACKs:
    cwnd is cut in half
    the window then grows linearly
• But after a timeout event:
    cwnd is instead set to 1 MSS
    the window then grows exponentially up to a threshold, then grows linearly

Philosophy
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

[Table columns: State / Event / TCP sender congestion-control action / Commentary]

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS * (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
Action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed

Summary: TCP Congestion Control

[Figure: finite-state machine with three states: slow start, congestion avoidance, fast recovery]

Initial transition (into slow start):
    cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0

Slow start
• duplicate ACK: dupACKcount++
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment (stay in slow start)
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
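The three-state machine summarised on this slide can be sketched as a small Python class. This is an illustration, not a real TCP implementation: the class and method names are hypothetical, the MSS value of 1460 bytes is an assumed example, and retransmission and actual segment transmission are left out so that only the cwnd / ssthresh bookkeeping remains.

```python
MSS = 1460  # bytes; assumed example value, not taken from the slides


class RenoSender:
    """Tracks state, cwnd, ssthresh and dupACKcount only."""

    def __init__(self):
        self.state = "slow_start"
        self.cwnd = float(MSS)
        self.ssthresh = 64 * 1024.0
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                     # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                   # loss inferred from 3 dup ACKs
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(MSS)
        self.dup_acks = 0
        self.state = "slow_start"


if __name__ == "__main__":
    s = RenoSender()
    for event in (s.on_new_ack, s.on_dup_ack, s.on_dup_ack, s.on_dup_ack, s.on_new_ack):
        event()
        print(s.state, s.cwnd, s.ssthresh)
```

The two loss signals are handled differently on purpose: three duplicate ACKs halve the window and continue from there, while a timeout resets cwnd to 1 MSS and restarts slow start.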

Congestion control

TCP's Congestion Control Service

[Figure: client and server]

Problem: gridlock sets in when there is packet loss due to router congestion.

• TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• When the sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it has sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT?
• ignore slow start (typically very short phases)
• let W be the window size when loss occurs
• when the window is W, throughput is W/RTT
• just after loss, the window drops to W/2 and throughput to W/(2 RTT)
• throughput increases linearly (by MSS/RTT every RTT)
• average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
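The 0.75 factor follows in one step from the idealised sawtooth: throughput climbs linearly from its value just after a loss to its value just before the next one, so its time average is the midpoint of the two. Written out (a worked step under the same idealised model, not an additional result):

```latex
\[
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
\]
```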

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• requires a window size of W = 83,333 in-flight segments
• throughput in terms of the loss rate L:

\[
\text{throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}
\]

• this requires L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
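The two numbers in this example can be checked directly. The sketch below assumes the slide's inputs (1500-byte segments, 100 ms RTT, 10 Gbps) and inverts the throughput formula 1.22·MSS/(RTT·√L) for L; the function names are illustrative.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    print(required_window_segments(10e9, 0.1, 1500))
    print(required_loss_rate(10e9, 0.1, 1500))
```

The window comes out at roughly 83,333 segments and the loss rate at roughly 2·10^-10, matching the figures quoted on the slide.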

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions
• Link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, each axis extending to R, with the "equal bandwidth share" line and the "full bandwidth utilisation" line. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
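The convergence towards the equal-share line can be simulated in a few lines. This sketch adopts the idealised assumptions of the analysis (same MSS and RTT, both flows in congestion avoidance, both see a loss whenever their combined rate exceeds the capacity); rates are in arbitrary units per RTT and the function name is illustrative.

```python
def share_link(x1, x2, capacity, rounds):
    """Evolve two AIMD flows for a number of RTT rounds."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2   # loss: both halve their rate
        else:
            x1, x2 = x1 + 1, x2 + 1   # no loss: both add one unit
    return x1, x2


if __name__ == "__main__":
    print(share_link(10, 80, capacity=100, rounds=500))
```

Additive increase leaves the gap between the two rates unchanged, while every multiplicative decrease halves it, so the gap shrinks towards zero no matter where the flows start.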

The End

The following slides are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps: the link's transmission rate)

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure labels: 1 TCP connection (three end systems); 3 TCP connections (one end system)]

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server?

Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components: TCP connection establishment time, data transfer delay, actual data transmission time.

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link of rate R between the end systems
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = an integer number of segments of size MSS
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big

Notation
• R bps: the link's transmission rate
• O: size of the object in bits
• S: number of bits in the MSS (max segment size)

[Figure: the SERVER sends the FILE to the CLIENT]

TCP Latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

[Figure: timing diagram, assuming W = 4 segments]
E.g. O = 256 bits, S = 32 bits, W = 4, so O/S = 8 segments and K = 2.

TCP Latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

latency = 2 RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = O/(WS) = number of windows of data that cover the object

[Figure: timing diagram with the STALLED PERIOD marked]
If there are K windows, the sender will be stalled (K-1) times.
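Both static-window cases fit in one small function. The sketch below uses the symbols of the slides (O and S in bits, R in bits per second, RTT in seconds, W in segments); the function name is illustrative, and the boundary case WS/R = RTT + S/R, which the slides do not discuss, is grouped with Case 1 because the stall term is zero there.

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """Latency for a fixed congestion window of W segments."""
    K = math.ceil(O / (W * S))           # number of windows covering the object
    base = 2 * RTT + O / R               # connection setup + transmission time
    if W * S / R >= RTT + S / R:
        return base                      # Case 1: ACKs return in time, no stall
    stall = S / R + RTT - W * S / R      # idle time after each window
    return base + (K - 1) * stall        # Case 2: stalled (K - 1) times


if __name__ == "__main__":
    # Slide example: O = 256 bits, S = 32 bits, W = 4; R and RTT are assumed.
    print(static_window_latency(256, 32, 32, 1, 4))
    print(static_window_latency(256, 32, 32, 5, 4))
```

With the assumed R = 32 bps, an RTT of 1 s falls in Case 1 and an RTT of 5 s falls in Case 2, where the single stall between the two windows adds 2 s.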

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object with O/S = 15 segments, sent in 4 windows, with the STALLED PERIODs marked]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

\[
\begin{aligned}
K &= \min\left\{k : 2^0 + 2^1 + \dots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
\]

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

\[
\frac{S}{R} + RTT
\]

• Transmission time of the k-th window:

\[
2^{k-1}\,\frac{S}{R}
\]

• Stall time after the k-th window:

\[
\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+}
\]

• Latency:

\[
\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+}
\]

where [x]^+ = max(x, 0).

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

\[
Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1
\]

• The actual number of times that the server stalls is

\[
P = \min\{Q,\; K-1\}
\]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

\[
Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1
\]

• The actual number of times that the server stalls is

\[
P = \min\{Q,\; K-1\}
\]

• Closed-form expression for the latency:

\[
\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}
\]
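The dynamic-window quantities can be computed and cross-checked numerically. This sketch uses the slide symbols (O, S in bits, R in bits per second, RTT in seconds) with K = ceil(log2(O/S + 1)), Q = floor(log2(1 + RTT/(S/R))) + 1 and P = min(Q, K - 1); it evaluates the closed form and, for comparison, the sum of the per-window stall times. The function name and the values of R and RTT in the usage line are assumptions.

```python
import math


def dynamic_window_latency(O, S, R, RTT):
    """Return (K, P, closed-form latency, latency summed over stalls)."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    closed = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    summed = 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return K, P, closed, summed


if __name__ == "__main__":
    # O/S = 15 segments as in the figure; S/R = 1 s and RTT = 2 s are assumed.
    print(dynamic_window_latency(O=15000, S=1000, R=1000, RTT=2))
```

For O/S = 15 this gives K = 4 windows, and with the assumed S/R and RTT the server stalls P = 2 times; the closed form and the explicit sum agree.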

Case analysis: DYNAMIC CONGESTION WINDOW

With P the actual number of times the server stalls, and the minimum latency equal to 2 RTT + O/R:

\[
\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}
\]

Slow start will not significantly increase latency if RTT << O/R.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


FLOW CONTROL: Receiver

EXAMPLE: HOST A sends a large file to HOST B

RECEIVER (HOST B) uses RcvWindow, LastByteRcvd, LastByteRead

[Figure: the receive buffer (RcvBuffer) is filled by data from IP and drained by the application process; the LastByteRead and LastByteRcvd positions are marked]

RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]

• Initially, RcvWindow = RcvBuffer.
• The application reads from the buffer.
• HOST B tells HOST A how much spare room it has in the connection buffer by placing its current value of RcvWindow in the receive window field of every segment it sends to HOST A.
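The receiver-side bookkeeping can be sketched as a tiny class. The class and method names below are hypothetical; only the three variables and the RcvWindow formula come from the slide, and the buffer contents themselves are not modelled.

```python
class ReceiveBuffer:
    """Tracks LastByteRcvd and LastByteRead for a buffer of RcvBuffer bytes."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer
        self.last_byte_rcvd = 0
        self.last_byte_read = 0

    def rcv_window(self):
        # RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
        return self.rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)

    def on_data_from_ip(self, nbytes):
        """Accept at most as many bytes as there is spare room."""
        accepted = min(nbytes, self.rcv_window())
        self.last_byte_rcvd += accepted
        return accepted

    def on_app_read(self, nbytes):
        """The application drains the buffer, freeing space."""
        read = min(nbytes, self.last_byte_rcvd - self.last_byte_read)
        self.last_byte_read += read
        return read


if __name__ == "__main__":
    buf = ReceiveBuffer(100)
    buf.on_data_from_ip(60)
    print(buf.rcv_window())   # value HOST B would advertise to HOST A
```

The value returned by `rcv_window()` is what HOST B would place in the receive window field of each segment it sends back.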

FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B

SENDER (HOST A) uses the RcvWindow of Host B, LastByteSent, LastByteACKed

[Figure: the sender's data with the LastByteACKed and LastByteSent positions marked; ACKs arrive from Host B]

To ensure that HOST B's buffer does not overflow, HOST A maintains throughout the connection's life:
LastByteSent - LastByteACKed <= RcvWindow

FLOW CONTROL: some issues to consider

RcvWindow: used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

• TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".
• TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space, and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.
• Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow)

Client: the connection initiator
    In Java: Socket clientSocket = new Socket(hostname, portNumber);
    In C (connect):
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }

Server: contacted by the client
    In Java: Socket accept()
    In C (accept):
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

Transport Layer ndash TCP

30

BB

TCP Connection Management

Three way handshakeStep 1 clientclient end system sends TCP

SYNSYN control segment to server (executed by TCP itself)

specifies initial seq number (isn)

Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment

ACKs received SYN allocates buffers specifies serverrsquos initial seq

number

Step 3 clientclient ACKsACKs the connection with

ACK=server_isn +1ACK=server_isn +1

allocates buffers sends SYN=0SYN=0

Connection established

Client

AcceptAccept (SYN=1

seq=server_isnack=client_isn+1)

time

Server

ConnectConnect (SYN=1 seq=client_isn)

ACK (SYN=0

seq=client_isn+1ack=server_isn+1)

Establishing a connection

This is what happens when we create a socket for

connection to a server

After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Used in case ACK gets lost It is implementation-dependent (eg 30

seconds 1 minute 2 minutes

Connection formally closes ndash all resources (eg port numbers) are

released

1

2

3

4

5

6

7

8

9

10

11

12

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

TCP Latency Modeling

Loopholes in TCP

• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.

[Figure: multiple end systems sharing a link of transmission rate R bps; three are labelled "1 TCP connection" and one is labelled "3 TCP connections" (multithreading implementation).]

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server?

Latency here is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. The components are:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions:
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.

Notation:
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)

[Figure: CLIENT and SERVER connected by a single link, with the FILE being transferred.]

TCP Latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

$\text{latency} = 2\,RTT + O/R$

K is the number of windows of data that cover the object, rounded up to the nearest integer:

$K = \lceil O / (W S) \rceil$

Example: assume W = 4 segments. With O = 256 bits and S = 32 bits, the object is O/S = 8 segments and K = 2.

TCP Latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent (a stalled period).

$\text{latency} = 2\,RTT + O/R + (K-1)\,[\,S/R + RTT - WS/R\,]$

where $K = \lceil O / (W S) \rceil$ is the number of windows of data that cover the object.

If there are K windows, the sender will be stalled (K-1) times.
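The two static-window cases can be folded into one small calculation. This is a minimal sketch under the model's assumptions (no loss, negligible headers, fixed window); the function name `static_window_latency` is hypothetical, and O, S and W are taken as integers so the ceiling can be computed exactly.

```python
def static_window_latency(O, S, W, R, RTT):
    """Latency in seconds to fetch an O-bit object with a fixed window of W segments.

    O and S are in bits, R in bits per second, RTT in seconds.
    """
    K = -(-O // (W * S))              # ceil(O / (W*S)) with integer arithmetic
    stall = S / R + RTT - W * S / R   # idle time after each window; <= 0 in Case 1
    return 2 * RTT + O / R + (K - 1) * max(0.0, stall)


if __name__ == "__main__":
    # O = 256 bits, S = 32 bits, W = 4, on a 32 bps link (so S/R = 1 s).
    print(static_window_latency(256, 32, 4, 32, 1.0))  # Case 1: WS/R > RTT + S/R
    print(static_window_latency(256, 32, 4, 32, 5.0))  # Case 2: WS/R < RTT + S/R
```

Clamping the stall term at zero is what lets one expression cover both cases: in Case 1 the ACK arrives before the window is exhausted, so no idle time is added.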

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments, covered by 4 windows, with a stalled period between windows.]

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$K = \min\{\,k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\,\}$

$\;\;\; = \min\{\,k : 2^k - 1 \ge O/S\,\}$

$\;\;\; = \min\{\,k : k \ge \log_2(O/S + 1)\,\}$

$\;\;\; = \lceil \log_2(O/S + 1) \rceil$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

$S/R + RTT$

• Transmission time of the k-th window:

$2^{k-1}\,\dfrac{S}{R}$

• Stall time after the k-th window:

$\left[\dfrac{S}{R} + RTT - 2^{k-1}\,\dfrac{S}{R}\right]^{+}$

• Latency:

$\text{Latency} = 2\,RTT + \dfrac{O}{R} + \sum_{k=1}^{K-1} \left[\dfrac{S}{R} + RTT - 2^{k-1}\,\dfrac{S}{R}\right]^{+}$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$Q = \left\lfloor \log_2\!\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$

• The actual number of times that the server stalls is:

$P = \min\{Q,\; K-1\}$

• Closed-form expression for the latency:

$\text{Latency} = 2\,RTT + \dfrac{O}{R} + P\left[RTT + \dfrac{S}{R}\right] - (2^P - 1)\,\dfrac{S}{R}$

• Comparison with the minimum latency:

$\dfrac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \dfrac{P}{(O/R)/RTT + 2}$

Slow start will not significantly increase latency if RTT << O/R.
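The quantities K, Q and P, and the agreement between the closed-form latency and the explicit sum of stall times, can be checked with a short calculation. This is a sketch under the same idealised assumptions as the model; the function name `slow_start_latency` and the sample numbers are illustrative only, and K and Q are found by simple loops rather than floating-point logarithms.

```python
def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, closed_form_latency, summed_latency) for the slow-start model.

    O and S are integers in bits, R is in bits per second, RTT in seconds.
    """
    segments = -(-O // S)  # ceil(O / S)

    # K: smallest k with 2^0 + ... + 2^(k-1) = 2^k - 1 >= O/S
    K = 1
    while 2 ** K - 1 < segments:
        K += 1

    # Q: number of windows k >= 1 with 2^(k-1) <= 1 + RTT/(S/R),
    # which equals floor(log2(1 + RTT/(S/R))) + 1
    x = 1 + RTT / (S / R)
    Q = 0
    while 2 ** Q <= x:
        Q += 1

    P = min(Q, K - 1)

    closed = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    summed = 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return K, Q, P, closed, summed


if __name__ == "__main__":
    # 15 segments of 1000 bits on a 1000 bps link, RTT = 2.5 s
    print(slow_start_latency(15000, 1000, 1000, 2.5))
```

With these sample values the object needs four windows, the server stalls after the first two, and both expressions give the same latency.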

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


FLOW CONTROL: Sender

EXAMPLE: HOST A sends a large file to HOST B.

The sender, HOST A, uses the RcvWindow advertised in the ACKs from HOST B, together with its own LastByteSent and LastByteACKed variables.

To ensure that HOST B does not overflow, HOST A maintains throughout the connection's life that:

LastByteSent - LastByteACKed <= RcvWindow

[Figure: the sender's data, marked with LastByteACKed and LastByteSent, and ACKs returning from HOST B.]
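The sender-side rule can be written as a one-line check. The sketch below is illustrative only: the function name `bytes_sender_may_send` is made up, and real TCP stacks track these values inside the kernel rather than exposing them like this.

```python
def bytes_sender_may_send(last_byte_sent, last_byte_acked, rcv_window):
    """How many more bytes HOST A may send without overflowing HOST B's buffer."""
    in_flight = last_byte_sent - last_byte_acked  # sent but not yet ACKed
    return max(0, rcv_window - in_flight)


if __name__ == "__main__":
    print(bytes_sender_may_send(last_byte_sent=100, last_byte_acked=60, rcv_window=50))
    print(bytes_sender_may_send(last_byte_sent=100, last_byte_acked=40, rcv_window=50))
```

The first call leaves room for more data; the second has already filled the advertised window, so the sender must wait for further ACKs.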

FLOW CONTROL: Some issues to consider

RcvWindow is used by the connection to provide the flow control service.

What happens when the receive buffer of HOST B is full (that is, when RcvWindow = 0)?

TCP sends a segment only when there is data or an ACK to send. Therefore, the sender must keep the connection "alive".

TCP requires that HOST A continue to send segments with one data byte when HOST B's receive window is 0. Such segments will be ACKed by HOST B. Eventually the buffer will have some space and the ACKs will contain RcvWindow > 0.

TCP Connection Management

Recall: the TCP sender and receiver establish a "connection" before exchanging data segments.

• Initialize TCP variables: sequence numbers, buffers, flow control info (e.g. RcvWindow).

• Client: the connection initiator.

  In Java: Socket clientSocket = new Socket(hostname, port number);

  In C (Winsock):

    /* connect() returns 0 on success */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
        printf("connect failed\n");
        WSACleanup();
        exit(1);
    }

• Server: contacted by the client.

  In Java: ServerSocket accept()

  In C (Winsock):

    /* accept() returns a new socket for the incoming connection */
    ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management

Establishing a connection: the three-way handshake. This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself).
• specifies the initial sequence number (isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment.
• ACKs the received SYN
• allocates buffers
• specifies the server's initial sequence number

Step 3: the client ACKs the connection with ACK = server_isn + 1.
• allocates buffers
• sends SYN = 0

Connection established. After establishing the connection, the client can receive segments with app-generated data (SYN = 0).

[Figure: timeline between Client and Server.
Client to Server: Connect (SYN = 1, seq = client_isn)
Server to Client: Accept (SYN = 1, seq = server_isn, ack = client_isn + 1)
Client to Server: ACK (SYN = 0, seq = client_isn + 1, ack = server_isn + 1)]
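The sequence and acknowledgment numbers exchanged in the three steps can be traced with a toy model. This sketch only builds dictionaries describing the three segments; it does not open a real socket, and the function name `three_way_handshake` and the dictionary keys are chosen here for illustration.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries, in order."""
    # Step 1: client sends SYN carrying its initial sequence number.
    syn = {"from": "client", "SYN": 1, "seq": client_isn, "ack": None}
    # Step 2: server replies with SYNACK, acknowledging client_isn + 1.
    synack = {"from": "server", "SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    # Step 3: client ACKs the server's SYN; SYN flag is now 0.
    ack = {"from": "client", "SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is why the SYN is said to consume one sequence number.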

TCP Connection Management (cont.)

Closing a connection: how a TCP connection is torn down.

The client closes the socket: closesocket(s); in Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server.

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

Step 3: the client receives the FIN and replies with an ACK. It enters "timed wait", during which it will respond with an ACK to received FINs.

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: timeline between client and server. The client sends FIN (closing); the server replies with ACK and then sends its own FIN (closing); the client replies with ACK and goes through timed wait before it is closed; the server is closed on receiving the ACK.]

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle diagrams, with steps numbered 1 to 12.]

• Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• Closed: the connection formally closes and all resources (e.g. port numbers) are released.

End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion:
• Informally: "too many sources sending too much data too fast for the network to handle".
• Different from flow control.
• Manifestations:
  lost packets (buffer overflow at routers)
  long delays (queuing in router buffers)
• A top-10 problem.

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
  a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
  an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection: a new variable, the congestion window (CongWin).

At the sender, the amount of unACKed data is bounded:

LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
• The TCP receive buffer is very large, so there is no RcvWindow constraint: the amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss and packet transmission delay are negligible.

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• The congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate
• The congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• No ACK is received after the segment loss.

2. Receipt of three duplicate ACKs
• The segment loss is followed by three duplicate ACKs received at the sender.

TCP Congestion Control: details

• The sender limits transmission:

  LastByteSent - LastByteAcked <= cwnd

• Roughly:

  rate = cwnd / RTT bytes/sec

• cwnd is dynamic, a function of perceived network congestion.

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
• Multiplicative decrease: cut cwnd in half after loss.

[Figure: congestion window size (cwnd) against time, with ticks at 8, 16 and 24 Kbytes, showing sawtooth behavior: probing for bandwidth.]

TCP Slow Start

When the connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).

[Figure: timeline between Host A and Host B. Host A sends one segment in the first RTT, two segments in the next, then four segments.]
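The link between "one MSS per ACK" and "doubling every RTT" can be seen by counting ACKs round by round. This is a simplified sketch that assumes every segment sent in a round is ACKed within that round and that no loss occurs; the function name `slow_start_rounds` is made up for illustration.

```python
def slow_start_rounds(mss, rtts):
    """Return cwnd (in bytes) at the start of each RTT during loss-free slow start."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss      # one ACK per segment sent in this RTT
        for _ in range(acks):
            cwnd += mss         # cwnd grows by 1 MSS for every ACK received
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(slow_start_rounds(mss=1460, rtts=3))
```

Because a window of n segments produces n ACKs, and each ACK adds one MSS, the window after the round is 2n segments.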

Refinement: inferring loss

• After 3 dup ACKs:
  cwnd is cut in half
  the window then grows linearly
• But after a timeout event:
  cwnd is instead set to 1 MSS
  the window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.

Refinement

Q: When should the exponential increase switch to linear?

A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS / CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin / 2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
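The state/event/action table can be turned into a small event-driven model. This is a sketch of the table only, not of a real TCP stack: the class name `CongestionSender` and its method names are hypothetical, retransmission is omitted, and resetting the duplicate-ACK counter after a triple duplicate ACK or a new ACK is an assumption added here.

```python
class CongestionSender:
    """Toy model of the Slow Start / Congestion Avoidance table (sizes in bytes)."""

    def __init__(self, mss, threshold):
        self.mss = mss
        self.cong_win = float(mss)       # start with 1 MSS
        self.threshold = float(threshold)
        self.state = "SS"                # "SS" or "CA"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0                # assumption: a new ACK resets the counter
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Additive increase: about 1 MSS per RTT in total.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicate ACKs; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # Multiplicative decrease, never below 1 MSS.
            self.cong_win = max(self.threshold, float(self.mss))
            self.state = "CA"
            self.dup_acks = 0            # assumption: counter restarts after the loss event

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = float(self.mss)
        self.state = "SS"
        self.dup_acks = 0
```

For example, with `mss=1` and `threshold=4`, four new ACKs take the window from 1 to 5 and switch the sender to congestion avoidance; three duplicate ACKs then halve the window, and a timeout drops it back to 1 MSS in slow start.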

Summary: TCP Congestion Control

[Figure: state machine with three states: slow start, congestion avoidance and fast recovery. The transitions recoverable from the diagram are listed below.]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

TCP's congestion control service forces the end systems to decrease the rate at which packets are sent during periods of congestion.

A packet from the sending system is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.

[Figure: CLIENT and SERVER exchanging packets across a congested router.]

Macroscopic Description of TCP throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).

• Let W be the window size when loss occurs.
• When the window is W, the throughput is W/RTT.
• Just after loss, the window drops to W/2 and the throughput to W/(2 RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
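The 0.75 factor is simply the mean of a window that climbs linearly from W/2 to W. The sketch below checks this for one sawtooth cycle, with the window measured in whole segments; the function name `average_window` is illustrative and W is assumed to be even.

```python
def average_window(W):
    """Mean window size (in segments) over one sawtooth cycle from W/2 up to W."""
    half = W // 2
    samples = list(range(half, W + 1))   # window grows by 1 segment per RTT
    return sum(samples) / len(samples)


if __name__ == "__main__":
    W = 100
    print(average_window(W), 0.75 * W)
```

Dividing the mean window by the RTT gives the average throughput of 0.75 W/RTT.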

TCP Futures: TCP over "long fat pipes"

Example: a GRID computing application
• 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
• requires a window size of W = 83,333 in-flight segments

Throughput in terms of the loss rate L:

$\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• This gives L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
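The two numbers in this example follow from the formulas W = throughput × RTT / MSS and throughput = 1.22·MSS / (RTT·√L). The sketch below redoes the arithmetic; the function names are chosen here for illustration, and the MSS is converted from bytes to bits.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L solving throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    W = window_segments(10e9, 0.1, 1500)
    L = required_loss_rate(10e9, 0.1, 1500)
    print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The window comes out at roughly 83,333 segments and the loss rate at about 2·10^-10, matching the figures quoted for the example.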

Page 27: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

28

BB

FLOW CONTROLFLOW CONTROLSome issue to consider

RcvWindowRcvWindow ndash used by the connection to provide the flow control serviceflow control service

TCPTCP requires that requires that HOST AHOST A continue to sendcontinue to send segments with one segments with one data byte when HOST Brsquos receive data byte when HOST Brsquos receive window is window is 00 Such segments will be Such segments will be ACKedACKed by HOST B by HOST B Eventually the buffer will have Eventually the buffer will have some space and the some space and the ACKsACKs will will contain contain RcvWindow gt 0RcvWindow gt 0

What happens when the receive

buffer of HOST B is full full (that is when (that is when

RcvWindow=0)RcvWindow=0)

TCP sends a segment only when there is datadata or ACKACK to send Therefore the sender must maintain the connection lsquoalivealiversquo

Transport Layer ndash TCP

29

BB

TCP Connection ManagementTCP Connection Management

Recall TCP sender receiver establish ldquoconnectionconnectionrdquo before exchanging data segments

Initialize TCP variables sequence numbers buffers flow control info (eg RcvWindow)

ClientClient is the connection initiator

In Java Socket clientSocket = new Socket(hostnameport number) connect ServerServer is contacted by client

In JavaSocket accept()

if (connectconnect(s (struct sockaddr )ampsin sizeof(sin)) = 0) printf(connect failedn) WSACleanup() exit(1)

ns = acceptaccept(s(struct sockaddr )(ampremoteaddr)ampaddrlen)

Transport Layer ndash TCP

30

BB

TCP Connection Management

Three way handshakeStep 1 clientclient end system sends TCP

SYNSYN control segment to server (executed by TCP itself)

specifies initial seq number (isn)

Step 2 serverserver end system receives SYNSYN replies with SYNACKSYNACK control segment

ACKs received SYN allocates buffers specifies serverrsquos initial seq

number

Step 3 clientclient ACKsACKs the connection with

ACK=server_isn +1ACK=server_isn +1

allocates buffers sends SYN=0SYN=0

Connection established

Client

AcceptAccept (SYN=1

seq=server_isnack=client_isn+1)

time

Server

ConnectConnect (SYN=1 seq=client_isn)

ACK (SYN=0

seq=client_isn+1ack=server_isn+1)

Establishing a connection

This is what happens when we create a socket for

connection to a server

After establishing the connection the client can receive segments with app-generated data (SYN=0)(SYN=0)

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Used in case ACK gets lost It is implementation-dependent (eg 30

seconds 1 minute 2 minutes

Connection formally closes ndash all resources (eg port numbers) are

released

1

2

3

4

5

6

7

8

9

10

11

12

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the congestion window (CongWin)

(Amount of unACKed data at the SENDER) = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin
• Packet loss delay and packet transmission delay are negligible

Sending rate (approx.) = CongWin / RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
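The window limit and the approximate sending rate above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample byte counts are assumptions made for the example, not values from the slides.

```python
def usable_window(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put in flight under min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked   # amount of unACKed data
    limit = min(cong_win, rcv_window)              # the tighter window wins
    return max(0, limit - in_flight)


def approx_rate(cong_win, rtt):
    """Approximate sending rate in bytes/sec: one CongWin per RTT."""
    return cong_win / rtt


if __name__ == "__main__":
    # 8000 bytes are in flight; CongWin (16000) is tighter than RcvWindow.
    print(usable_window(12_000, 4_000, 16_000, 64_000))
    print(approx_rate(16_000, 0.5))
```

With a very large receive buffer, `min(...)` always picks CongWin, which is the simplifying assumption the slide makes.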

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking"

Arrival of ACKs: an indication to the sender that all is well

1. ACKs arrive at a slow rate
• Congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• Congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender
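A minimal sketch of how a sender might classify the two kinds of loss event. The class and method names are hypothetical, and a real TCP stack tracks far more state (retransmission timers, SACK information, and so on).

```python
class LossDetector:
    """Classifies loss events as 'timeout' or 'triple_dup_ack' (illustrative)."""

    def __init__(self):
        self.last_ack = None   # highest cumulative ACK number seen so far
        self.dup_count = 0     # duplicates of last_ack received

    def on_ack(self, ack_no):
        """Return 'triple_dup_ack' when the third duplicate arrives, else None."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                return "triple_dup_ack"
        else:
            # A new ACK: remember it and restart the duplicate count.
            self.last_ack = ack_no
            self.dup_count = 0
        return None

    def on_timer_expired(self):
        """The retransmission timer fired before any ACK arrived."""
        self.dup_count = 0
        return "timeout"


if __name__ == "__main__":
    d = LossDetector()
    for ack in [100, 200, 200, 200, 200, 300]:
        print(ack, d.on_ack(ack))
```

Note that the original ACK does not count as a duplicate: the event fires on the fourth ACK carrying the same number.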

TCP Congestion Control: details

• sender limits transmission:
    LastByteSent - LastByteAcked <= cwnd
• roughly,
    rate = cwnd / RTT  bytes/sec
• cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

• approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss is detected
  - multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) versus time, with axis marks at 8, 16 and 24 Kbytes; sawtooth behavior: probing for bandwidth]
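The sawtooth can be reproduced with a toy trace of cwnd per RTT. This is a sketch under simplifying assumptions: cwnd is counted in whole MSS units, and the RTT rounds in which a loss is detected are passed in by hand rather than produced by a network model.

```python
def aimd_trace(rounds, loss_at, start_mss=8):
    """cwnd (in MSS) at the start of each RTT; halved after rounds in loss_at."""
    cwnd = start_mss
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_at:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase: +1 MSS per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(8, {3}))   # [8, 9, 10, 11, 5, 6, 7, 8]
```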

TCP Slow Start

• when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd by 1 MSS for every ACK received
• summary: initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: timeline between Host A and Host B; Host A sends one segment, then two segments, then four segments, one RTT apart]
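Why "+1 MSS per ACK" amounts to doubling per RTT can be seen in a short sketch. It assumes every segment sent in a round is ACKed individually and that no loss occurs; the function name is made up for the example.

```python
def slow_start_rounds(rounds, mss=1):
    """cwnd at the start of each RTT when every ACK adds one MSS."""
    cwnd = mss
    per_round = []
    for _ in range(rounds):
        per_round.append(cwnd)
        acks = cwnd // mss          # one ACK per segment sent this round
        for _ in range(acks):
            cwnd += mss             # per-ACK increment
    return per_round


if __name__ == "__main__":
    print(slow_start_rounds(4))     # [1, 2, 4, 8]
```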

Refinement: inferring loss

• after 3 dup ACKs:
  - cwnd is cut in half
  - window then grows linearly
• but after a timeout event:
  - cwnd is set to 1 MSS
  - window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
Action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
Action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
Action: increment duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold not changed
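The table above maps almost line for line onto a small state machine. The sketch below is an illustration only: the class name, the MSS value of 1460 bytes and the default threshold are assumptions, and retransmission itself is left out.

```python
MSS = 1460  # bytes; an assumed value for illustration


class TcpSender:
    """Congestion-control state of a sender, following the table's rows."""

    def __init__(self, threshold=64 * 1024):
        self.state = "SS"            # "SS" = slow start, "CA" = congestion avoidance
        self.cong_win = MSS
        self.threshold = threshold
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                      # doubles CongWin per RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * MSS / self.cong_win  # about +1 MSS per RTT

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


if __name__ == "__main__":
    s = TcpSender(threshold=4 * MSS)
    for _ in range(4):
        s.on_new_ack()
    print(s.state, s.cong_win)
```

Keeping one method per event makes it easy to check each row of the table against the code.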

Summary: TCP Congestion Control

[Figure: finite-state machine of the TCP sender with three states: slow start, congestion avoidance and fast recovery]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

Slow start
• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; stay in slow start

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

Congestion control

TCP's Congestion Control Service (between CLIENT and SERVER)

Problem: gridlock sets in when there is packet loss due to router congestion

• forces the end systems to decrease the rate at which packets are sent during periods of congestion
• the sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent

Macroscopic Description of TCP Throughput

• what's the average throughput of TCP as a function of window size and RTT?
  - ignore slow start (typically very short phases)
• let W be the window size when loss occurs
  - when the window is W, throughput is W/RTT
  - just after loss, the window drops to W/2, throughput to W/(2 RTT)
  - throughput increases linearly (by MSS/RTT every RTT)
• average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
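The 0.75 factor follows from the sawtooth shape. Because throughput climbs linearly from W/(2 RTT) to W/RTT between two loss events, its time average over one cycle is simply the mean of the two endpoints (a worked step under the same idealised model):

```latex
\[
\text{average throughput}
  = \frac{1}{2}\left(\frac{W}{2\,\mathrm{RTT}} + \frac{W}{\mathrm{RTT}}\right)
  = \frac{3}{4}\cdot\frac{W}{\mathrm{RTT}}
  = 0.75\,\frac{W}{\mathrm{RTT}}
\]
```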

TCP Futures: TCP over "long, fat pipes"

• example: GRID computing application with 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps
• requires window size W = 83,333 in-flight segments
• throughput in terms of loss rate L:

    throughput = 1.22 * MSS / (RTT * sqrt(L))

• for the throughput above: L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)
• new versions of TCP are needed for high-speed environments
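The two numbers on this slide can be checked with a few lines of arithmetic. The helper names below are invented for the example; the formula is the one quoted above, solved for L.

```python
def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to get L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    # 1500-byte segments, 100 ms RTT, 10 Gbps target
    print(int(window_segments(10e9, 0.1, 1500)))   # about 83,333 segments
    print(loss_rate_for(10e9, 0.1, 1500))          # roughly 2e-10
```

A loss rate near 2·10^-10 corresponds to about one loss in every 5 billion segments, which is why the slide calls for new TCP versions.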

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth

[Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions
• link with transmission rate of R
• each connection has the same MSS and RTT
• no other TCP connections or UDP datagrams traverse the shared link
• ignore the slow start phase of TCP
• both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing

Why is TCP fair?

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes running up to R, showing the equal bandwidth share line and the full bandwidth utilisation line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
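The convergence argument can be tried out numerically. The sketch below assumes, as the slides do, two connections with the same MSS and RTT; it further assumes that both see a loss in the same round whenever their combined rate exceeds the link capacity. Names and units are illustrative.

```python
def share_link(x1, x2, capacity, steps):
    """Rates of two AIMD flows after a number of RTT rounds."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Loss: both halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # Additive increase: both grow equally, the gap is unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    a, b = share_link(90.0, 10.0, 100.0, 2000)
    print(round(a, 3), round(b, 3))
```

Because only the decrease step changes the difference between the two rates, and it always shrinks it, the pair drifts towards the equal bandwidth share line.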

The End

The slides that follow are just for additional reading.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps: the link's transmission rate)

• In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections sharing the link]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?
(the time from when the client initiates a TCP connection until the client receives the requested object in its entirety)

Components: TCP connection establishment time, data transfer delay, actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is data transfer delay)

TCP Latency Modeling

Assumptions
• network is uncongested, with one link of rate R between the end systems (CLIENT and SERVER)
• CongWin (fixed) determines the amount of data that can be sent
• no packet loss, no packet corruption, no retransmissions required
• header overheads are negligible
• file to send = integer number of segments of size MSS
• connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• initial Threshold of the TCP congestion mechanism is very big

Notation
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits of MSS (max segment size)

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

    latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

    latency = 2 RTT + O/R + (K-1) [S/R + RTT - WS/R]

K = O/(WS): number of windows of data that cover the object

If there are K windows, the sender will be stalled (K-1) times.
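Both static-window cases can be folded into one small function. This is a sketch using the slides' symbols (O, S, R, W, RTT); the function name and the sample numbers are assumptions chosen so the arithmetic is easy to follow.

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency in seconds for an object of O bits with a fixed window of W segments."""
    base = 2 * rtt + O / R
    if W * S / R >= rtt + S / R:
        # Case 1: the first ACK is back before the window is exhausted.
        # (At equality the stall term below would be zero anyway.)
        return base
    # Case 2: the sender stalls after each window except the last.
    K = math.ceil(O / (W * S))
    stall = S / R + rtt - W * S / R
    return base + (K - 1) * stall


if __name__ == "__main__":
    # O = 256 bits, S = 32 bits, W = 4, and an assumed R = 32 bps (S/R = 1 s)
    print(static_window_latency(256, 32, 32, 4, rtt=2))   # case 1: 12.0
    print(static_window_latency(256, 32, 32, 4, rtt=5))   # case 2: 20.0
```

In the second call K = 2, so the sender stalls once, for S/R + RTT - WS/R = 1 + 5 - 4 = 2 seconds.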

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram with stalled periods; an object of O/S = 15 segments is sent in 4 windows]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object
• We can express K in terms of the number of segments in the object as follows:

\[
\begin{aligned}
K &= \min\{\,k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S \,\} \\
  &= \min\{\,k : 2^k - 1 \ge O/S \,\} \\
  &= \min\{\,k : k \ge \log_2(O/S + 1) \,\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
\]
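The definition of K can be evaluated directly by adding up window sizes 1, 2, 4, ... until the object is covered. A small sketch (the function name is made up for the example):

```python
def windows_to_cover(num_segments):
    """Smallest K with 2**0 + 2**1 + ... + 2**(K-1) >= num_segments."""
    k, sent = 0, 0
    while sent < num_segments:
        sent += 2 ** k      # the next window holds 2**k segments
        k += 1
    return k


if __name__ == "__main__":
    print(windows_to_cover(15))   # 4, matching the O/S = 15 example
```

For O/S = 15 the windows hold 1, 2, 4 and 8 segments, which is the 4-window case from the earlier figure.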

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

\[ \frac{S}{R} + RTT \]

• Transmission time of the kth window:

\[ 2^{k-1}\,\frac{S}{R} \]

• Stall time after the kth window:

\[ \left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+} \]

• Latency:

\[ \text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+} \]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

\[ Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1 \]

• The actual number of times that the server stalls is

\[ P = \min\{Q,\; K-1\} \]

• Closed-form expression for the latency:

\[ \text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^{P} - 1)\,\frac{S}{R} \]

• Compared with the minimum latency:

\[ \frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2} \]

Slow start will not significantly increase latency if RTT << O/R.
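The closed-form expression can be cross-checked against the stall-by-stall sum it was derived from. The sketch below uses the slides' symbols; the function names and the sample values (S/R = 1 s, RTT = 3 s, 15 segments) are assumptions picked to keep the numbers exact.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (K, Q, P, latency) from the closed-form expression."""
    segments = math.ceil(O / S)
    K = math.ceil(math.log2(segments + 1))            # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    latency = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


def summed_latency(O, S, R, rtt):
    """Same latency, adding the stall time after each window explicitly."""
    K = math.ceil(math.log2(math.ceil(O / S) + 1))
    stalls = sum(max(0.0, S / R + rtt - 2 ** (k - 1) * S / R)
                 for k in range(1, K))
    return 2 * rtt + O / R + stalls


if __name__ == "__main__":
    print(slow_start_latency(15000, 1000, 1000, 3))
    print(summed_latency(15000, 1000, 1000, 3))
```

With these values the stalls are 3 s, 2 s and 0 s, so both functions give 2·3 + 15 + 5 = 26 seconds.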

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP Connection Management

Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables:
  - sequence numbers
  - buffers, flow control info (e.g. RcvWindow)
• Client: the connection initiator
    In Java: Socket clientSocket = new Socket("hostname", port number);
    In C (connect): if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0) { printf("connect failed\n"); WSACleanup(); exit(1); }
• Server: contacted by the client
    In Java: accept() (called on the server's ServerSocket; returns a Socket)
    In C (accept): ns = accept(s, (struct sockaddr *)(&remoteaddr), &addrlen);

TCP Connection Management

Establishing a connection: three-way handshake
(This is what happens when we create a socket for connection to a server.)

Step 1: client end system sends TCP SYN control segment to server (executed by TCP itself)
  - specifies initial seq number (isn)

Step 2: server end system receives SYN, replies with SYNACK control segment
  - ACKs received SYN
  - allocates buffers
  - specifies server's initial seq number

Step 3: client ACKs the connection with ACK = server_isn + 1
  - allocates buffers
  - sends SYN=0

Connection established

Segments exchanged (in time order):
  Client -> Server: Connect (SYN=1, seq=client_isn)
  Server -> Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
  Client -> Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
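The sequence and acknowledgement numbers in the three segments can be traced with a toy model. This is not how a real stack is written (the operating system performs the handshake inside connect and accept); the Segment class and the sample initial sequence numbers are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    syn: int                    # SYN flag: 1 during setup, 0 afterwards
    seq: int                    # sequence number
    ack: Optional[int] = None   # acknowledgement number, if any


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments in the order they are sent."""
    syn = Segment(syn=1, seq=client_isn)                            # step 1
    synack = Segment(syn=1, seq=server_isn, ack=syn.seq + 1)        # step 2
    ack = Segment(syn=0, seq=client_isn + 1, ack=synack.seq + 1)    # step 3
    return [syn, synack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is why the SYN is said to consume one sequence number.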

TCP Connection Management (cont.)

How a TCP connection is established and torn down

Closing a connection: client closes socket
    In C: closesocket(s);
    In Java: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: timeline between client and server; the client closes and sends FIN, the server replies with ACK, then closes and sends its own FIN, the client replies with ACK and enters timed wait before the connection is closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  - enters "timed wait": will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: timeline between client and server; both sides pass through "closing", the client goes through timed wait, and both end in "closed"]

TCP Connection Management (cont.)

[Figures: TCP client lifecycle and TCP server lifecycle state diagrams, with steps numbered 1 to 12]

• Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• Connection formally closes: all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

Transport Layer ndash TCP

55

BB

TCP Latency ModelingTCP Latency Modeling

In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs

Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth

Multiple End Systems sharing a link

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

Loop holes in TCPLoop holes in TCP

1 TCP connection

1 TCP connection

1 TCP connection

3 TCP connections

Multithreading implementationMultithreading implementation

Transport Layer ndash TCP

56

BB

TCP latency modelingTCP latency modeling

TCP connection establishment time data transfer delay Actual data transmission time

Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR

An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent

WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent

QQ How long How long does it take to does it take to

receive an object receive an object from a Web from a Web

serverserver

No data transfer delayNo data transfer delay

Therersquos data transfer delayTherersquos data transfer delay

the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety

Transport Layer ndash TCP

57

BB

TCP Latency ModelingTCP Latency Modeling

Network is uncongested with one link between end systems of rate RR

CongWinCongWin (fixed) determines the amount of data that can be sent

No packet loss no packet corruption no retransmissions required

Header overheads are negligible

File to sendFile to send = integer number of segments of size MSS

Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times

CLIENT SERVER

FILEFILE

Initial ThresholdThreshold of TCP congestion mechanism is very big

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits

SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)

Transport Layer ndash TCP

58

BB

TCP latency ModelingTCP latency Modeling

Case 1 latency = 2RTT + OR

K = Number of Windows of data that cover the object

K = OWS

Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW

Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR

An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent

Number of segmentsRounded up to the nearest integer

Assume W=4 segmentsAssume W=4 segments

eg O=256bits S=32bits W=4

Transport Layer ndash TCP

59

BB

TCP latency ModelingTCP latency Modeling

Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]

Number of Windows of data that cover the objectK= OWS

Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW

Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR

Sender has to wait for an ACK after a windowrsquos worth of data sent

STALLED STALLED PERIODPERIOD

If there are k windows sender willbe stalled (k-1) times

Transport Layer ndash TCP

60

BB

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

STALLED STALLED PERIODPERIOD

4 windows4 windows

OS=15

Transport Layer ndash TCP

61

BB

bull LetLet K K be the be the number of windows that cover the objectnumber of windows that cover the objectbull We can express We can express KK in terms of the number of segments in the in terms of the number of segments in the

object as followsobject as follows

S

OkK k 110 222min

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

1log

1logmin

12min

2

2

S

OK

S

OkkK

S

OkK k

Note

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:
  $\dfrac{S}{R} + RTT$
• Transmission time of the kth window:
  $2^{k-1}\dfrac{S}{R}$
• Stall time after the kth window (where $[x]^{+} = \max(x, 0)$):
  $\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$
• Latency:
  $\text{Latency} = 2\,RTT + \dfrac{O}{R} + \sum_{k=1}^{K-1}\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:
  $Q = \left\lfloor \log_2\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$
• The actual number of times that the server stalls is
  $P = \min\{Q,\ K-1\}$

Case analysis: DYNAMIC CONGESTION WINDOW

• With Q (the number of stalls for an infinitely long object) and P = min{Q, K-1} (the actual number of stalls) as defined above, the closed-form expression for the latency is
  $\text{Latency} = 2\,RTT + \dfrac{O}{R} + P\left[RTT + \dfrac{S}{R}\right] - (2^{P} - 1)\dfrac{S}{R}$
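As a rough check of the slow-start model, the sketch below computes K, Q and P and evaluates the latency twice, once with the closed form and once by summing the per-window stall times. The function name and the numeric inputs are illustrative assumptions; O/S = 15 matches the four-window example on the earlier slide.

```python
import math

def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, closed_form_latency, summed_latency) for the slow-start model."""
    K = math.ceil(math.log2(O / S + 1))                 # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1    # stalls for an infinite object
    P = min(Q, K - 1)                                   # actual number of stalls
    closed = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    summed = 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R)    # stall after the kth window
        for k in range(1, K)
    )
    return K, Q, P, closed, summed

# 15 segments of 1000 bits over a 1000 bit/s link with RTT = 2 s (made-up values).
print(slow_start_latency(O=15000, S=1000, R=1000, RTT=2.0))
```

Both evaluations agree for these inputs, which is a quick way to catch a sign or index slip when transcribing the formulas.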

Case analysis: DYNAMIC CONGESTION WINDOW

• Comparing the latency with the minimum latency, $2\,RTT + O/R$ (no stalls):
  $\dfrac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \dfrac{P}{(O/R)/RTT + 2}$
• Slow start will not significantly increase latency if $RTT \ll O/R$.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP Connection Management

Establishing a connection: the three-way handshake. This is what happens when we create a socket for connection to a server.

Step 1: the client end system sends a TCP SYN control segment to the server (executed by TCP itself)
• specifies the initial seq number (isn)

Step 2: the server end system receives the SYN and replies with a SYNACK control segment
• ACKs the received SYN
• allocates buffers
• specifies the server's initial seq number

Step 3: the client ACKs the connection with ACK = server_isn + 1
• allocates buffers
• sends SYN = 0

Connection established.

[Figure: client/server timeline]
Client -> Server: Connect (SYN=1, seq=client_isn)
Server -> Client: Accept (SYN=1, seq=server_isn, ack=client_isn+1)
Client -> Server: ACK (SYN=0, seq=client_isn+1, ack=server_isn+1)

After establishing the connection, the client can receive segments with app-generated data (SYN=0).
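The sequence and acknowledgement numbers exchanged in the three-way handshake can be traced with a small simulation. This is a sketch, not real socket code: the function name and the dictionary layout are made up for illustration, and the wrap-around modulo 2^32 reflects the 32-bit sequence-number field rather than anything stated on the slide.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def three_way_handshake(client_isn, server_isn):
    """Return the three control segments as dictionaries, in the order they are sent."""
    syn = {"from": "client", "SYN": 1, "seq": client_isn}
    synack = {
        "from": "server", "SYN": 1, "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,          # acknowledges the client's SYN
    }
    ack = {
        "from": "client", "SYN": 0,
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (synack["seq"] + 1) % SEQ_SPACE,       # ACK = server_isn + 1
    }
    return [syn, synack, ack]

for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```

In a real program all of this happens inside the operating system when the client calls connect on a TCP socket; the application never builds these segments itself.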

TCP Connection Management (cont.)

How a TCP connection is established and torn down

Closing a connection

The client closes the socket: closesocket(s);
Java: clientSocket.close();

Step 1: the client end system sends a TCP FIN control segment to the server

Step 2: the server receives the FIN and replies with an ACK. It closes the connection and sends a FIN.

[Figure: client/server timeline: the client closes and sends FIN; the server replies with ACK, closes and sends FIN; the client replies with ACK and enters timed wait; closed]

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK
• enters "timed wait": will respond with ACK to received FINs

Step 4: the server receives the ACK. Connection closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: client/server timeline: client (closing) sends FIN; server replies with ACK, then (closing) sends FIN; client replies with ACK and enters timed wait; server closed; client closed after the timed wait]
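The four teardown steps can be summarised as a table of segments and the states each side is in once the segment has been delivered. The state names below (FIN_WAIT_1, CLOSE_WAIT and so on) are the conventional names from the TCP specification, not names taken from these slides, and the table layout and function name are illustrative assumptions.

```python
# (segment, sender, client state after delivery, server state after delivery)
TEARDOWN = [
    ("FIN", "client", "FIN_WAIT_1", "CLOSE_WAIT"),  # step 1: client sends FIN
    ("ACK", "server", "FIN_WAIT_2", "CLOSE_WAIT"),  # step 2: server ACKs the FIN
    ("FIN", "server", "TIME_WAIT", "LAST_ACK"),     # step 2: server sends its own FIN
    ("ACK", "client", "TIME_WAIT", "CLOSED"),       # steps 3-4: client ACKs, server closes
]

def run_teardown():
    """Return the (client, server) state pairs seen during an orderly close."""
    states = [("ESTABLISHED", "ESTABLISHED")]
    for _segment, _sender, client_state, server_state in TEARDOWN:
        states.append((client_state, server_state))
    states.append(("CLOSED", "CLOSED"))             # client closes after the timed wait
    return states

for pair in run_teardown():
    print(pair)
```

The client lingers in the timed-wait state so that it can re-send its final ACK if the server's FIN is retransmitted.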

TCP Connection Management (cont.)

[Figure: TCP client lifecycle and TCP server lifecycle state diagrams]

• Timed wait: used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).
• Closed: the connection formally closes; all resources (e.g. port numbers) are released.


End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the receiver buffer from overflowing

Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure that the routers between end systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
• informally: too many sources sending too much data too fast for the network to handle
• different from flow control
• manifestations:
    - lost packets (buffer overflow at routers)
    - long delays (queuing in router buffers)
• a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to end systems in the form of:
    - a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR)
    - an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the congestion window (CongWin)

(Amount of unACKed data at the sender) = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
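The window constraint and the resulting approximate send rate can be expressed directly. This is a minimal sketch: the function names and the byte counts in the example are illustrative assumptions, and the check uses "at most" the window, which is the usual reading of the limit.

```python
def can_send(last_byte_sent, last_byte_acked, cong_win, rcv_window, nbytes):
    """True if nbytes more can be sent without exceeding min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed data at the sender
    return in_flight + nbytes <= min(cong_win, rcv_window)

def approx_send_rate(cong_win_bytes, rtt_seconds):
    """Approximate sending rate in bytes/sec: one window per round trip."""
    return cong_win_bytes / rtt_seconds

print(can_send(5000, 3000, cong_win=4000, rcv_window=65535, nbytes=1000))
print(can_send(5000, 3000, cong_win=4000, rcv_window=2500, nbytes=1000))
print(approx_send_rate(8000, 0.5))
```

In the second call the receive window, not the congestion window, is the binding limit, which is exactly the situation the "very large receive buffer" assumption rules out.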

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

Arrival of ACKs: an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• the congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after the segment loss

2. Receipt of three duplicate ACKs
• the segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• Roughly: rate = cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease (AIMD)

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd; axis marks at 8, 16 and 24 Kbytes) versus time, showing saw-tooth behaviour: probing for bandwidth]
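The saw-tooth shape follows directly from the two AIMD rules. The sketch below works in units of MSS and assumes, purely for illustration, that a loss is detected whenever the window reaches a fixed size; the function name and that loss model are not from the slides.

```python
def aimd(cwnd, rounds, loss_at):
    """Trace cwnd (in MSS) over a number of RTTs under additive increase, multiplicative decrease."""
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd >= loss_at:
            cwnd = cwnd / 2      # multiplicative decrease: halve after a loss
        else:
            cwnd = cwnd + 1      # additive increase: 1 MSS per RTT
    return history

print(aimd(cwnd=8, rounds=12, loss_at=16))
```

The trace climbs one MSS per round trip from 8 to 16, drops back to 8, and starts climbing again.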

TCP Slow Start

When a connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).

[Figure: Host A and Host B timeline: one segment in the first RTT, then two segments, then four segments]
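To see why adding 1 MSS per ACK doubles the window each round trip, the sketch below counts ACKs explicitly. It assumes one ACK per segment and no loss; the function name is made up for illustration.

```python
def slow_start(rtts, mss=1):
    """Return cwnd at the start of each RTT, growing by one MSS per ACK received."""
    cwnd = mss
    per_rtt = [cwnd]
    for _ in range(rtts):
        acks = cwnd // mss        # one ACK comes back for every segment sent this RTT
        for _ in range(acks):
            cwnd += mss           # each ACK increases cwnd by 1 MSS
        per_rtt.append(cwnd)
    return per_rtt

print(slow_start(3))
```

The result matches the one, two, four segments pattern in the timeline, continuing to eight in the next round trip.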

Refinement: inferring loss

• After 3 dup ACKs:
    - cwnd is cut in half
    - the window then grows linearly
• But after a timeout event:
    - cwnd is instead set to 1 MSS
    - the window then grows exponentially up to a threshold, then grows linearly

Philosophy:
• 3 dup ACKs indicate that the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold are not changed

Summary: TCP Congestion Control

[Figure: finite state machine with three states: slow start, congestion avoidance, fast recovery]

Initially: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• cwnd > ssthresh: go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance
• new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
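The three-state machine summarised on this slide can be sketched as a small Python class. It tracks only cwnd, ssthresh, the duplicate-ACK count and the state; retransmission and actual segment sending are left out. The class and method names are made up for illustration, and the window is kept in units of MSS (so the 64 KB initial threshold becomes a plain number passed by the caller).

```python
MSS = 1.0  # work in units of one MSS (a simplification for readability)

class CongestionControl:
    def __init__(self, ssthresh=64.0):
        self.state = "slow start"
        self.cwnd = 1.0 * MSS
        self.ssthresh = ssthresh
        self.dup_acks = 0

    def on_new_ack(self):
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh                 # deflate the window
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += MSS                          # exponential growth per RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)      # about 1 MSS per RTT
        self.dup_acks = 0

    def on_duplicate_ack(self):
        if self.state == "fast recovery":
            self.cwnd += MSS                          # window inflation
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                        # loss inferred from 3 dup ACKs
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0 * MSS
        self.dup_acks = 0
        self.state = "slow start"

cc = CongestionControl(ssthresh=4.0)
for _ in range(4):
    cc.on_new_ack()
print(cc.state, cc.cwnd)
```

Feeding the object a sequence of events is a convenient way to replay the arrows of the diagram one at a time.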

Congestion control

TCP's Congestion Control Service

[Figure: CLIENT and SERVER]

Problem: gridlock sets in when there is packet loss due to router congestion.

• The service forces the end systems to decrease the rate at which packets are sent during periods of congestion.
• When a packet from the sending system is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
• Throughput increases linearly (by MSS/RTT every RTT).
• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
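The 0.75 factor can be recovered in one line. Under the idealised model, throughput climbs linearly from its value just after a loss to its value just before the next loss, so its time average is the midpoint of the two extremes (the symbol for the average is introduced here only for this step):

```latex
\[
\overline{T}
  = \frac{1}{2}\left(\frac{W}{2\,RTT} + \frac{W}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W}{RTT}
  = 0.75\,\frac{W}{RTT}
\]
```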

TCP Futures: TCP over "long fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps
• This requires a window size of W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$

• This gives $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
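The two numbers in the long-fat-pipe example can be reproduced from the figures given (1500-byte segments, 100 ms RTT, 10 Gbps). The sketch below inverts the loss-rate throughput formula to solve for L; the function names are made up for illustration.

```python
def required_window(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput (one window per RTT)."""
    return throughput_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L solving throughput = 1.22 * MSS / (RTT * sqrt(L)), with MSS in bits."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2

print(required_window(10e9, 0.1, 1500))      # segments in flight
print(required_loss_rate(10e9, 0.1, 1500))   # tolerable loss rate
```

The loss rate comes out at roughly two in ten billion, which is why the slide concludes that standard TCP struggles in such environments.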

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions
• Link with transmission rate of R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput versus Connection 1 throughput, both axes from 0 to R, with the equal bandwidth share line and the full bandwidth utilisation line. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
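The convergence argument can be tried out numerically. In the sketch below both connections add one unit of throughput per step and both halve whenever their combined throughput exceeds the link capacity. The synchronised-loss assumption, the starting values and the function name are all illustrative, not taken from the slides.

```python
def share_link(x1, x2, capacity, steps):
    """Evolve two AIMD flows on one link and return their final throughputs."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # loss: both windows are cut in half
        else:
            x1, x2 = x1 + 1, x2 + 1      # congestion avoidance: equal additive increase
    return x1, x2

x1, x2 = share_link(10, 70, capacity=100, steps=1000)
print(x1, x2, abs(x1 - x2))
```

Additive increase leaves the gap between the two flows unchanged, while every halving cuts the gap in half, so the flows drift towards the equal-share line regardless of where they start.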

The End

The following slides are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps: the link's transmission rate)

• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections, all sharing the link]
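The effect of opening parallel connections can be estimated with a back-of-the-envelope calculation, assuming the idealised outcome in which every TCP connection on the link ends up with an equal share. The function name and the link rate used in the example are illustrative assumptions.

```python
def bandwidth_share(app_connections, other_connections, link_rate_bps):
    """Bandwidth an application gets if every connection receives an equal share of the link."""
    total = app_connections + other_connections
    return link_rate_bps * app_connections / total

R = 6_000_000  # link rate in bps (made-up value)
print(bandwidth_share(1, 3, R))  # one connection competing with three others
print(bandwidth_share(3, 3, R))  # three parallel connections against the same three
```

With three parallel connections against three single-connection hosts, the application takes half the link instead of a quarter.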

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server?

Latency: the time from when the client initiates a TCP connection until the client receives the requested object in its entirety.

Components: TCP connection establishment time, data transfer delay, actual data transmission time

Two cases to consider:
1. $WS/R > RTT + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
2. $WS/R < RTT + S/R$: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link between the end systems of rate R (R bps: the link's transmission rate).
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• File to send = an integer number of segments of size MSS.
• Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.

[Figure: CLIENT and SERVER connected by one link, with the FILE at the server]

O - size of the object in bits

S - number of bits of MSS (max segment size)

Page 30: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

31

BB

TCP Connection Management (cont)

Closing a connection

client closes socket

closesocket(s)closesocket(s)

Java clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

How TCP connection is established and torn down

Transport Layer ndash TCP

32

BB

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

Enters timed wait - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

Note with small modification can handle simultaneous FINs

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Used in case ACK gets lost It is implementation-dependent (eg 30

seconds 1 minute 2 minutes

Connection formally closes ndash all resources (eg port numbers) are

released

1

2

3

4

5

6

7

8

9

10

11

12

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

Transport Layer – TCP

55

TCP Latency Modeling

Loopholes in TCP

In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore, they have higher throughput.

Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.

[Figure: multiple end systems sharing a link with transmission rate R bps. Three end systems use 1 TCP connection each; one end system uses 3 TCP connections (multithreading implementation).]
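The effect of parallel connections can be illustrated with a little arithmetic. This is a rough Python sketch under the idealised assumption that the link rate is split equally per TCP connection, which is the fairness goal and not a guarantee. The helper name `bandwidth_share` and the 6 Mbps link rate are hypothetical.

```python
def bandwidth_share(app_connections, other_connections, capacity_bps):
    """Share of the link one application gets if every TCP connection
    receives an equal slice of the capacity (idealised fairness)."""
    total = app_connections + other_connections
    return capacity_bps * app_connections / total


R = 6_000_000  # hypothetical link rate in bps

# Scenario from the figure: three hosts with 1 connection each,
# one host with 3 parallel connections.
print(bandwidth_share(1, 5, R))  # a single-connection host
print(bandwidth_share(3, 3, R))  # the host with 3 connections
```

Under these assumptions the host with three connections receives half of the link, while each single-connection host receives one sixth.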

Transport Layer – TCP

56

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency here is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of the TCP connection establishment time, the data transfer delay, and the actual data transmission time.

Two cases to consider:

• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.

• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

Transport Layer – TCP

57

TCP Latency Modeling

Assumptions

• The network is uncongested, with one link of rate R between the end systems.

• CongWin (fixed) determines the amount of data that can be sent.

• No packet loss, no packet corruption, no retransmissions required.

• Header overheads are negligible.

• The file to send is an integer number of segments of size MSS.

• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.

• The initial Threshold of the TCP congestion mechanism is very big.

Notation

• R: the link's transmission rate, in bps

• O: size of the object, in bits

• S: number of bits in an MSS (maximum segment size)

[Figure: the SERVER sends the FILE to the CLIENT.]

Transport Layer – TCP

58

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1 latency = 2RTT + O/R

K = number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$, where O/S is the number of segments and the quotient is rounded up to the nearest integer.

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

Transport Layer – TCP

59

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent.

Case 2 latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$

[Figure: timing diagram with the stalled periods marked.]

If there are K windows, the sender will be stalled (K-1) times.
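The two static-window cases can be folded into one small function. The following Python sketch is an illustration only, under the assumptions of the model (no loss, negligible headers and ACK transmission times). The function name `static_window_latency` and the link rate of 32 bps used in the example are made up so that the numbers stay easy to follow.

```python
def static_window_latency(O, S, R, RTT, W):
    """Latency for a fixed congestion window of W segments.

    O -- object size in bits, S -- segment size in bits,
    R -- link rate in bps, RTT -- round-trip time in seconds.
    """
    K = -(-O // (W * S))             # ceil(O / (W*S)): windows covering the object
    latency = 2 * RTT + O / R        # connection setup + request, then transmission
    stall = S / R + RTT - W * S / R  # idle time after each window (Case 2 only)
    if stall > 0:
        latency += (K - 1) * stall   # the sender stalls K-1 times
    return latency


# Slide example: O = 256 bits, S = 32 bits, W = 4 (so K = 2).
print(static_window_latency(256, 32, 32, RTT=1, W=4))  # Case 1: WS/R > RTT + S/R
print(static_window_latency(256, 32, 32, RTT=5, W=4))  # Case 2: WS/R < RTT + S/R
```

With the short RTT the ACK returns before the window is exhausted, so only 2RTT + O/R remains; with the long RTT one stall of S/R + RTT - WS/R is added.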

Transport Layer – TCP

60

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for slow start with O/S = 15 segments, covered by 4 windows, with the stalled periods marked.]

Transport Layer – TCP

61

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.

• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Transport Layer – TCP

62

Case Analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window $= \frac{S}{R} + RTT$

• Transmission time of the kth window $= 2^{k-1}\frac{S}{R}$

• Stall time $= \left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$

• Latency $= 2RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$

Transport Layer – TCP

63

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.

Transport Layer – TCP

64

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is $P = \min\{Q, K-1\}$.

• Closed-form expression for the latency:

$$\text{Latency} = 2RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

Transport Layer – TCP

65

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments.

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
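The slow-start latency formulas can be cross-checked against each other numerically. The Python sketch below is a hedged illustration: it computes K, Q and P as defined in the preceding slides, then evaluates both the closed-form latency and the stall-by-stall sum. The function name `slow_start_latency` and the example values (S = 1000 bits, R = 8000 bps, RTT = 0.25 s) are assumptions chosen so that all intermediate values are exact in binary floating point.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, closed_form, summed) for the dynamic-window model.

    O -- object size in bits, S -- segment size in bits,
    R -- link rate in bps, RTT -- round-trip time in seconds.
    """
    segments = math.ceil(O / S)

    # K: smallest k with 2^0 + ... + 2^(k-1) = 2^k - 1 >= O/S.
    K = 1
    while 2 ** K - 1 < segments:
        K += 1

    # Q: stalls for an infinitely large object; P: stalls that actually occur.
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1
    P = min(Q, K - 1)

    closed_form = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R

    # Same latency, adding up the stall time after each of the first K-1 windows.
    summed = 2 * RTT + O / R + sum(
        max(0.0, S / R + RTT - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return K, Q, P, closed_form, summed


K, Q, P, closed_form, summed = slow_start_latency(O=15000, S=1000, R=8000, RTT=0.25)
print(K, Q, P, closed_form, summed)
```

For this example O/S = 15, which matches the four-window case in the timing diagram; the closed form and the explicit sum agree.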

Transport Layer – TCP

66

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


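Transport Layer – TCP

32

TCP Connection Management (cont.)

Step 3: the client receives the FIN and replies with an ACK.

• It enters "timed wait" and will respond with an ACK to any received FINs.

Step 4: the server receives the ACK. The connection is closed.

Note: with a small modification, this can handle simultaneous FINs.

[Figure: timing diagram of the connection close. The client sends FIN (closing); the server replies with ACK, then sends its own FIN (closing); the client replies with ACK and enters timed wait; both sides end in the closed state.]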

Transport Layer – TCP

33

TCP Connection Management (cont.)

[Figure: state diagrams of the TCP client lifecycle and the TCP server lifecycle, with transitions numbered 1 to 12.]

• The timed wait is used in case the ACK gets lost. Its duration is implementation-dependent (e.g. 30 seconds, 1 minute, 2 minutes).

• The connection formally closes: all resources (e.g. port numbers) are released.

Transport Layer – TCP

34

End of Flow Control and Error Control

Transport Layer – TCP

35

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control

• point-to-point traffic between sender and receiver

• speed-matching service: matching the rate at which the sender is sending against the rate at which the receiving application is reading

• prevents the receiver buffer from overflowing

Congestion: happens when there are too many sources attempting to send data at too high a rate for the routers along the path.

Congestion Control

• service that makes sure that the routers between end systems are able to carry the offered traffic

• prevents routers from overflowing

Same course of action: throttling of the sender.

Transport Layer – TCP

36

Principles of Congestion Control

Congestion

• Informally: too many sources sending too much data too fast for the network to handle

• Different from flow control

• Manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers)

• A top-10 problem

Transport Layer – TCP

37

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control

• no explicit feedback from the network

• congestion inferred by end systems from observed packet loss and delay

• approach taken by TCP

2. Network-assisted congestion control

• routers provide feedback to end systems, either as a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or as an explicit transmission rate the sender should send at

Transport Layer – TCP

38

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the Congestion Window (CongWin)

At the SENDER:

(Amount of unACKed data) = LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions

• The TCP receive buffer is very large, so there is no RcvWindow constraint. The amount of unACKed data at the sender is limited solely by CongWin.

• Packet-loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
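The window constraint and the approximate sending rate can be written down directly. This is a small Python sketch for illustration; the function names `sendable_bytes` and `approx_rate` and the example numbers are hypothetical and do not come from any TCP implementation.

```python
def sendable_bytes(last_byte_sent, last_byte_acked, cong_win, rcv_window):
    """Bytes the sender may still put on the wire without violating
    LastByteSent - LastByteACKed <= min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_rate(cong_win_bytes, rtt_seconds):
    """Approximate sending rate in bytes/sec: one window per RTT."""
    return cong_win_bytes / rtt_seconds


# 3000 bytes are unACKed; CongWin (4000) is the tighter limit here.
print(sendable_bytes(5000, 2000, cong_win=4000, rcv_window=65535))
print(approx_rate(8000, 0.5))
```

When the receive buffer is large, as the slide assumes, `min(cong_win, rcv_window)` reduces to CongWin alone.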

Transport Layer – TCP

39

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: it is "self-clocking".

Arrival of ACKs: an indication to the sender that all is well.

1. ACKs arrive at a slow rate

• The congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate

• The congestion window will be increased more quickly.

Transport Layer – TCP

40

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout

• no ACK is received after a segment loss

2. Receipt of three duplicate ACKs

• a segment loss is followed by three duplicate ACKs received at the sender

Transport Layer – TCP

41

TCP Congestion Control: details

• The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd

• Roughly, rate = cwnd/RTT bytes/sec

• cwnd is a dynamic function of perceived network congestion

How does the sender perceive congestion?

• loss event = timeout or 3 duplicate ACKs

• the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:

1. AIMD

2. slow start

3. conservative after timeout events

Transport Layer – TCP

42

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.

• additive increase: increase cwnd by 1 MSS every RTT until loss is detected

• multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) versus time, with ticks at 8, 16 and 24 Kbytes. Sawtooth behaviour: probing for bandwidth.]
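The sawtooth can be reproduced with a few lines of code. This is a simplified Python sketch, assuming (hypothetically) that a loss is detected whenever the window reaches a fixed size `loss_at`; real losses depend on the network. The function name `aimd_trace` is illustrative, and cwnd is counted in units of MSS.

```python
def aimd_trace(cwnd, loss_at, rtts):
    """Record cwnd (in MSS) once per RTT under AIMD.

    A loss is assumed to be detected whenever cwnd has reached `loss_at`.
    """
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease: halve the window
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace


print(aimd_trace(cwnd=8, loss_at=16, rtts=20))
```

The printed sequence climbs linearly from 8 to 16, drops back to 8, and repeats, which is the sawtooth pattern described on the slide.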

Transport Layer – TCP

43

TCP Slow Start

• When the connection begins, increase the rate exponentially until the first loss event: initially cwnd = 1 MSS; double cwnd every RTT; this is done by incrementing cwnd by 1 MSS for every ACK received.

• Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).

[Figure: timing diagram between Host A and Host B. Host A sends one segment, then two segments one RTT later, then four segments.]
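The link between "1 MSS per ACK" and "doubling every RTT" is easy to see in code. The following Python sketch is illustrative only and assumes that every segment sent in one RTT is ACKed individually within that RTT and that nothing is lost. The function name `slow_start_rounds` is hypothetical.

```python
def slow_start_rounds(rtts, mss=1):
    """Return cwnd at the start of each RTT during loss-free slow start."""
    cwnd = mss
    sizes = []
    for _ in range(rtts):
        sizes.append(cwnd)
        acks = cwnd // mss     # one ACK per segment sent in this RTT
        for _ in range(acks):
            cwnd += mss        # each ACK grows cwnd by 1 MSS
    return sizes


print(slow_start_rounds(4))
```

The output matches the timing diagram: one segment, then two, then four, and so on.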

Transport Layer – TCP

44

Refinement: inferring loss

• After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.

• But after a timeout event: cwnd is instead set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.

Philosophy

• 3 dup ACKs indicate that the network is capable of delivering some segments.

• A timeout indicates a "more alarming" congestion scenario.

Transport Layer – TCP

45

Refinement

Q: When should the exponential increase switch to linear?

A: When cwnd gets to 1/2 of its value before the timeout.

Implementation

• variable ssthresh (slow-start threshold)

• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

Transport Layer – TCP

46

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS × (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment the duplicate ACK count for the segment being ACKed
Commentary: CongWin and Threshold are not changed
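The rows of the table above can be sketched as a small state machine. This Python class is an illustration under simplifying assumptions, not a real TCP implementation: the class and method names are invented, windows are counted in MSS units, and resetting the duplicate-ACK counter on a new ACK is an added assumption that the table does not state.

```python
class TcpSender:
    """Toy model of the sender's congestion-control state ("SS" or "CA")."""

    def __init__(self, mss=1.0, threshold=64.0):
        self.mss = mss
        self.cong_win = mss
        self.threshold = threshold
        self.state = "SS"
        self.dup_acks = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0  # assumption: counter restarts on a new ACK
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # Roughly +1 MSS per RTT, spread over the ACKs of one window.
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_duplicate_ack(self):
        """Count duplicates; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.mss, self.threshold)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        """Timeout: fall back to slow start from 1 MSS."""
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0


sender = TcpSender(mss=1.0, threshold=4.0)
for _ in range(4):
    sender.on_new_ack()
print(sender.state, sender.cong_win)
```

After four new ACKs the window exceeds the threshold of 4 MSS, so the sketch switches from slow start to congestion avoidance.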

Transport Layer – TCP

47

Summary: TCP Congestion Control

[Figure: finite-state machine with the states slow start, congestion avoidance and fast recovery. The transitions recoverable from the figure are listed below.]

Initially: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.

slow start

• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed

• duplicate ACK: dupACKcount++

• cwnd > ssthresh: go to congestion avoidance

• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

congestion avoidance

• new ACK: cwnd = cwnd + MSS × (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed

• duplicate ACK: dupACKcount++

• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery

• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

fast recovery

• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed

• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance

• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Transport Layer – TCP

48

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

[Figure: CLIENT and SERVER.]

The sending system's packet is lost due to congestion, and the sender is alerted when it stops receiving ACKs for the packets it sent.

TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.

Transport Layer – TCP

49

Macroscopic Description of TCP throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).

• Let W be the window size when loss occurs.

• When the window is W, throughput is W/RTT.

• Just after loss, the window drops to W/2 and throughput to W/(2 RTT).

• Throughput increases linearly (by MSS/RTT every RTT).

• Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
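The 0.75 factor follows from averaging a window that climbs linearly from W/2 to W. The Python sketch below checks this numerically under the same idealised model; the function name `average_sawtooth_throughput` is hypothetical, W is assumed to be an even number of segments, and throughput is expressed in segments per second.

```python
def average_sawtooth_throughput(W, rtt):
    """Average throughput over one sawtooth period, in segments/sec.

    The window takes the values W/2, W/2 + 1, ..., W, one per RTT.
    """
    windows = range(W // 2, W + 1)
    average_window = sum(windows) / len(windows)
    return average_window / rtt


W, rtt = 100, 0.5
print(average_sawtooth_throughput(W, rtt), 0.75 * W / rtt)
```

Both printed values agree, since the mean of an arithmetic sequence is the midpoint of its endpoints: (W/2 + W)/2 = 0.75 W.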

Transport Layer – TCP

50

TCP Futures: TCP over "long, fat pipes"

• Example: a GRID computing application with 1500-byte segments, a 100 ms RTT and a desired throughput of 10 Gbps

• This requires a window size of W = 83,333 in-flight segments.

• Throughput in terms of loss rate L: $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• This gives $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).

• New versions of TCP are needed for high-speed environments.
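The numbers in this example can be reproduced from the throughput formula. The Python sketch below is a back-of-the-envelope check; the function names are made up for illustration, and the loss-rate function simply inverts throughput = 1.22 · MSS / (RTT · sqrt(L)).

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the throughput: W = rate * RTT / MSS."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L that still permits the throughput, from
    throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


W = required_window_segments(10e9, 0.1, 1500)
L = required_loss_rate(10e9, 0.1, 1500)
print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The results are close to the slide's figures: about 83,333 segments in flight and a loss rate on the order of 2 · 10^-10.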


Transport Layer ndash TCP

33

BB

TCP Connection Management (cont)

TCP client lifecycle

TCP server lifecycle

Used in case ACK gets lost It is implementation-dependent (eg 30

seconds 1 minute 2 minutes

Connection formally closes ndash all resources (eg port numbers) are

released

1

2

3

4

5

6

7

8

9

10

11

12

Transport Layer ndash TCP

34

BB

End of Flow Control and Error Control

Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

TCP Futures: TCP over "long, fat pipes"

• Example: GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps
• Requires a window size of W = 83,333 in-flight segments
• Throughput in terms of loss rate L:

  $\text{throughput} = \dfrac{1.22 \cdot MSS}{RTT \sqrt{L}}$

• Reaching 10 Gbps therefore needs $L = 2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments)
• New versions of TCP are needed for high-speed environments
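The two numbers on this slide follow directly from the formulas. A minimal sketch, with function names chosen only for this example:

```python
def required_window_segments(throughput_bps: float, rtt_s: float,
                             mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def tolerable_loss_rate(throughput_bps: float, rtt_s: float,
                        mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2
```

With 10 Gbps, 100 ms and 1500-byte segments, the first function gives roughly 83,333 segments and the second roughly 2·10^-10.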

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

(Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R.)

Analysis of 2 connections sharing a link

Assumptions
• Link with transmission rate R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Both connections operate in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease reduces throughput proportionally

(Figure: Connection 2 throughput plotted against Connection 1 throughput, with R marked on both axes. The plot shows the equal bandwidth share line and the full bandwidth utilisation line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2". A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.)
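The convergence argument can be reproduced with a few lines of simulation. This is a sketch under simplifying assumptions that go beyond the slide: both flows see every loss at the same instant, rates are in arbitrary units, and the function name and defaults are made up.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float,
                   rounds: int = 200, step: float = 1.0) -> tuple[float, float]:
    """Final rates of two synchronised AIMD flows sharing one link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:        # joint loss: both halve their rate
            x1, x2 = x1 / 2, x2 / 2
        else:                         # additive increase: slope of 1
            x1, x2 = x1 + step, x2 + step
    return x1, x2
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in two, so a starting point such as (80, 10) on a link of capacity 100 drifts towards the equal-share line.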

The End

The slides that follow are for additional reading only.

TCP Latency Modeling

Loopholes in TCP

• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free, so they achieve higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.

(Figure: multiple End Systems sharing a link with transmission rate R bps. Several End Systems open 1 TCP connection each; one opens 3 TCP connections through a multithreading implementation.)

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency here is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the Sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the Sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link of rate R between the end systems (CLIENT and SERVER)
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• The file to send is an integer number of segments of size MSS
• Connection establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times
• The initial Threshold of the TCP congestion mechanism is very big

Notation
• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the Sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

Example: assume W = 4 segments. With O = 256 bits and S = 32 bits, K = 256/(4 · 32) = 2.

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The Sender has to wait for an ACK after a window's worth of data is sent.

latency = 2 RTT + O/R + (K-1) [S/R + RTT - WS/R]

K = O/(WS) is the number of windows of data that cover the object. Each stalled period lasts S/R + RTT - WS/R; if there are K windows, the sender is stalled (K-1) times.
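Both static-window cases fit in one small function. The sketch below uses the slide's symbols as parameter names; the function name is an assumption, and K is rounded up as stated for Case 1.

```python
import math


def static_window_latency(O: float, S: float, R: float, W: int,
                          rtt: float) -> float:
    """Latency for a fixed window of W segments.

    O: object size in bits, S: segment size in bits, R: link rate in bps.
    """
    K = math.ceil(O / (W * S))            # windows that cover the object
    stall = S / R + rtt - W * S / R       # idle time after each window
    base = 2 * rtt + O / R
    if stall <= 0:                        # Case 1: ACKs return in time
        return base
    return base + (K - 1) * stall         # Case 2: K-1 stalled periods
```

With O = 256 bits, S = 32 bits, W = 4 and R = 32 bps, an RTT of 1 s falls under Case 1 (10 s in total), while an RTT of 5 s falls under Case 2 and adds one stall of 2 s (20 s in total).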

Case Analysis: DYNAMIC CONGESTION WINDOW

(Figure: timeline of a slow-start transfer with O/S = 15 segments, sent as 4 windows, with a STALLED PERIOD marked between windows.)

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$
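The derivation of K can be checked by comparing a direct search over doubling windows with the closed form. The function names are hypothetical, and the number of segments O/S is assumed to be a positive integer.

```python
import math


def windows_by_search(segments: int) -> int:
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) >= segments."""
    k, sent = 0, 0
    while sent < segments:
        sent += 2 ** k        # the next window carries 2^k segments
        k += 1
    return k


def windows_closed_form(segments: int) -> int:
    """K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(segments + 1))
```

For an object of 15 segments both functions give K = 4: windows of 1, 2, 4 and 8 segments.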

Case Analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window:

  $\dfrac{S}{R} + RTT$

• Transmission time of the kth window:

  $2^{k-1} \dfrac{S}{R}$

• Stall time after the kth window:

  $\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$

• Latency:

  $2\,RTT + \dfrac{O}{R} + \sum_{k=1}^{K-1}\left[\dfrac{S}{R} + RTT - 2^{k-1}\dfrac{S}{R}\right]^{+}$

where $[x]^{+} = \max(x, 0)$.

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

  $Q = \left\lfloor \log_2\!\left(1 + \dfrac{RTT}{S/R}\right) \right\rfloor + 1$

• The actual number of times that the server stalls is

  $P = \min\{Q, K-1\}$

• Closed-form expression for the latency, where P = min{Q, K-1} is the number of times the server stalls:

  $\text{Latency} = 2\,RTT + \dfrac{O}{R} + P\left[RTT + \dfrac{S}{R}\right] - (2^{P} - 1)\dfrac{S}{R}$
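As a sanity check, the closed form can be compared with the stall-by-stall sum from the dynamic-window analysis. The sketch assumes O is an integer multiple of S and uses a made-up function name; it returns both values so they can be compared.

```python
import math


def slow_start_latency(O: float, S: float, R: float,
                       rtt: float) -> tuple[float, float]:
    """Return (summed, closed_form) latency for a slow-start transfer."""
    K = math.ceil(math.log2(O / S + 1))              # windows covering object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls, infinite object
    P = min(Q, K - 1)                                # actual number of stalls
    base = 2 * rtt + O / R
    summed = base + sum(
        max(S / R + rtt - 2 ** (k - 1) * S / R, 0.0)  # stall after window k
        for k in range(1, K)
    )
    closed = base + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return summed, closed
```

For O = 15,000 bits, S = 1,000 bits, R = 1,000 bps and RTT = 2.5 s, the server stalls twice (2.5 s and 1.5 s) and both expressions give 24 s.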

Case Analysis: DYNAMIC CONGESTION WINDOW

Comparing with the minimum latency, 2 RTT + O/R (the latency with no stalls):

  $\dfrac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \dfrac{P}{(O/R)/RTT + 2}$

Slow start will not significantly increase latency if RTT << O/R.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features



End of Flow Control and Error Control

Flow Control vs Congestion Control

Similar actions are taken, but for very different reasons.

Flow Control
• point-to-point traffic between sender and receiver
• speed-matching service: matches the rate at which the sender is sending against the rate at which the receiving application is reading
• prevents the Receiver Buffer from overflowing

Congestion: happens when too many sources attempt to send data at too high a rate for the routers along the path.

Congestion Control
• service that makes sure the routers between End Systems are able to carry the offered traffic
• prevents routers from overflowing

Same course of action: throttling of the sender.

Principles of Congestion Control

Congestion
• Informally: too many sources sending too much data too fast for the network to handle
• Different from flow control
• Manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queuing in router buffers)
• A top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
• no explicit feedback from the network
• congestion inferred by end systems from observed packet loss and delay
• approach taken by TCP

2. Network-assisted congestion control
• routers provide feedback to End Systems, in the form of either
  • a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR), or
  • an explicit transmission rate the sender should send at

TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the Congestion Window (CongWin). At the SENDER:

  amount of unACKed data = LastByteSent - LastByteACKed < min(CongWin, RcvWindow)

This constraint indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin
• Packet-loss delay and packet transmission delay are negligible

Sending rate (approx.) = CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
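The window constraint and the resulting rate can be written as two small helpers. This is a sketch only: the function names are made up, and it assumes the data in flight may reach the limit exactly, which is a choice made for the example.

```python
def can_send(last_byte_sent: int, last_byte_acked: int, segment_len: int,
             cong_win: int, rcv_window: int) -> bool:
    """True if sending segment_len more bytes keeps the unACKed data
    within min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked   # unACKed bytes so far
    return in_flight + segment_len <= min(cong_win, rcv_window)


def approx_send_rate(cong_win_bytes: int, rtt_s: float) -> float:
    """Approximate sending rate in bytes per second: CongWin / RTT."""
    return cong_win_bytes / rtt_s
```

With a very large receive window the second argument of `min` never binds, which is the situation the slide's first assumption describes.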

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size: "self-clocking".

Arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• The congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
• The congestion window will be increased more quickly

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• no ACK is received after segment loss

2. Receipt of three duplicate ACKs
• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

• Sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• Roughly, rate = cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion

How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

• Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected
• Multiplicative decrease: cut cwnd in half after loss

(Figure: congestion window size (cwnd) plotted against time, with axis ticks at 8, 16, and 24 Kbytes. Saw-tooth behavior: probing for bandwidth.)

TCP Slow Start

• When the connection begins, increase the rate exponentially until the first loss event:
  • initially cwnd = 1 MSS
  • double cwnd every RTT
  • done by incrementing cwnd by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

(Figure: timeline between Host A and Host B. Host A sends one segment in the first RTT, two segments in the second, and four segments in the third.)
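The link between "1 MSS per ACK" and "doubling per RTT" is easy to see in a loop. The sketch below counts cwnd in units of MSS and assumes every segment sent in a round is ACKed before the next round starts; the function name is hypothetical.

```python
def slow_start_rounds(rounds: int, mss: int = 1) -> list[int]:
    """cwnd at the start of each RTT, growing by 1 MSS per ACK received."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss          # one ACK per segment sent this RTT
        for _ in range(acks):
            cwnd += mss             # per-ACK increment
        history.append(cwnd)
    return history
```

Calling `slow_start_rounds(3)` yields windows of 1, 2, 4 and 8 segments, the same progression as in the Host A / Host B timeline.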

Refinement: inferring loss

• After 3 dup ACKs:
  • cwnd is cut in half
  • window then grows linearly
• But after a timeout event:
  • cwnd is instead set to 1 MSS
  • window then grows exponentially up to a threshold, then grows linearly

Philosophy
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.

Implementation:
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
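A per-RTT trace makes the switch at ssthresh visible. This is a simplified sketch: cwnd is counted in MSS, growth is applied once per RTT rather than per ACK, and capping the last doubling at ssthresh is an assumption made to keep the example tidy.

```python
def cwnd_after_timeout(cwnd_before: int, rounds: int) -> list[int]:
    """cwnd per RTT after a timeout: exponential up to ssthresh, then linear."""
    ssthresh = cwnd_before // 2     # half of cwnd just before the loss
    cwnd = 1                        # restart from 1 MSS
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start
        else:
            cwnd += 1                        # congestion avoidance
        trace.append(cwnd)
    return trace
```

A timeout at cwnd = 16 gives ssthresh = 8, so the window runs 1, 2, 4, 8 and then 9, 10, 11.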

TCP Sender Congestion Control

State: Slow Start (SS)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance"
Commentary: results in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unACKed data
TCP sender congestion-control action: CongWin = CongWin + MSS · (MSS/CongWin)
Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: loss event detected by triple duplicate ACK
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance"
Commentary: fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS

State: SS or CA
Event: timeout
TCP sender congestion-control action: Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start"
Commentary: enter Slow Start

State: SS or CA
Event: duplicate ACK
TCP sender congestion-control action: increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed
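The Congestion Avoidance row says that a per-ACK increment of MSS · (MSS/CongWin) adds up to about 1 MSS per RTT. A short sketch, with a hypothetical function name and the assumption that one RTT delivers one ACK per segment in the window:

```python
def congestion_avoidance_rtt(cong_win: float, mss: float) -> float:
    """Apply CongWin += MSS * (MSS / CongWin) once per ACK for one RTT."""
    acks = int(cong_win // mss)              # one ACK per segment in flight
    for _ in range(acks):
        cong_win += mss * (mss / cong_win)   # per-ACK additive increase
    return cong_win
```

Starting from 10 segments of 1460 bytes, one RTT of ACKs grows the window by a little less than one MSS, because each increment is computed against a window that has already grown slightly.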



Transport Layer ndash TCP

35

BB

Flow Control vs Congestion ControlFlow Control vs Congestion ControlSimilar actions are taken but for very different reasons

Flow ControlFlow Controlbull point-to-point traffic between sender and receiverbull speed matching service matching the rate at which the sender is sending against the rate at which the receiving application is readingbull prevents Receiver Buffer from overflowing

CongestionCongestion ndash happens when there are too many sources attempting to send data at too high a rate for the routers along the path

Congestion ControlCongestion Controlbull service that makes sure that the routers between End Systems are able to carry the offered trafficbull prevents routers from overflowing

Same course of action Throttling of the sender

Transport Layer ndash TCP

36

BB

CongestionCongestion Informally too many sources sending too

much data too fast for network to handle different from flow control Manifestations

lost packets (buffer overflow at routers) long delays (queuing in router buffers)

a top-10 problem

Principles of Congestion ControlPrinciples of Congestion Control

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Congestion control: TCP's Congestion Control Service

(Figure: client and server.)

Problem: gridlock sets in when there is packet loss due to router congestion.

TCP's congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion.

When a packet from the sending system is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (its phases are typically very short).

- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2 and throughput to (W/2)/RTT.
- Throughput increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP.)
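The 0.75 W/RTT figure is just the midpoint of a linear ramp from W/2 to W. A minimal sketch of that arithmetic follows; the function name and the sample numbers are illustrative assumptions.

```python
def sawtooth_average_throughput(w_bytes: float, rtt_s: float) -> float:
    """Average rate (bytes/s) when the window ramps linearly from W/2 to W."""
    low = (w_bytes / 2) / rtt_s   # throughput just after a loss
    high = w_bytes / rtt_s        # throughput just before the next loss
    return (low + high) / 2       # midpoint of a linear ramp = 0.75 * W / RTT


# Example: W = 80,000 bytes and RTT = 100 ms.
print(sawtooth_average_throughput(80_000, 0.1))
```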

TCP Futures: TCP over "long, fat pipes"

- Example: a GRID computing application with 1500-byte segments, a 100 ms RTT and a desired throughput of 10 Gbps.
- This requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:

\[ \text{throughput} = \frac{1.22 \cdot \mathrm{MSS}}{\mathrm{RTT}\,\sqrt{L}} \]

- This gives L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
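The two numbers in this example can be reproduced directly from the formulas. The sketch below assumes the MSS is expressed in bytes and the throughput in bits per second; the function names are illustrative.

```python
def window_segments(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Segments that must be in flight to sustain the given throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


print(window_segments(10e9, 0.1, 1500))     # about 83,333 segments
print(required_loss_rate(10e9, 0.1, 1500))  # about 2e-10
```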

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

(Figure: TCP connection 1 and TCP connection 2 passing through a bottleneck router of capacity R.)

Analysis of 2 connections sharing a link

Assumptions:
- The link has a transmission rate of R.
- Each connection has the same MSS and RTT.
- No other TCP connections or UDP datagrams traverse the shared link.
- The slow start phase of TCP is ignored.
- Both connections operate in congestion-avoidance mode (linear increase phase).

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally

(Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running from 0 to R, with the equal bandwidth share line and the full bandwidth utilisation line. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2".)
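The convergence argument can be checked numerically. The sketch below is a simplified model under the assumptions listed for the two-connection analysis: both flows see every loss at the same time, and loss occurs whenever the combined rate exceeds the capacity. The function name and the starting rates are illustrative.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float,
                   rounds: int, step: float = 1.0) -> tuple[float, float]:
    """Evolve two AIMD flows that share one link and see synchronised losses."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve, which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add the same amount, so the gap is unchanged.
            x1, x2 = x1 + step, x2 + step
    return x1, x2


print(aimd_two_flows(10.0, 70.0, capacity=100.0, rounds=1000))
```

Because additive increase preserves the difference between the two rates and each multiplicative decrease halves it, the rates approach the equal-share line regardless of the starting point.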

The End

The slides that follow are for additional reading only.

TCP Latency Modeling

Loopholes in TCP:
- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
- Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

(Figure: multiple end systems sharing a link with transmission rate R bps; three of them use 1 TCP connection each and one uses 3 TCP connections.)

TCP Latency Modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- $WS/R > \mathrm{RTT} + S/R$: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
- $WS/R < \mathrm{RTT} + S/R$: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions:
- The network is uncongested, with one link of rate R between the end systems.
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
- The initial Threshold of the TCP congestion mechanism is very big.

Notation:
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in an MSS (maximum segment size)

(Figure: a client requesting a file from a server.)

TCP Latency Modeling

Case analysis: static congestion window

Case 1: $WS/R > \mathrm{RTT} + S/R$. An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

\[ \text{latency} = 2\,\mathrm{RTT} + O/R \]

K is the number of windows of data that cover the object:

\[ K = \lceil O/(WS) \rceil \]

that is, O/(WS) rounded up to the nearest integer.

Example: assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

TCP Latency Modeling

Case analysis: static congestion window

Case 2: $WS/R < \mathrm{RTT} + S/R$. The sender has to wait for an ACK after a window's worth of data is sent (a stalled period).

\[ \text{latency} = 2\,\mathrm{RTT} + O/R + (K-1)\left[ S/R + \mathrm{RTT} - WS/R \right] \]

K is the number of windows of data that cover the object: $K = \lceil O/(WS) \rceil$.

If there are K windows, the sender will be stalled (K-1) times.
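Both static-window cases can be folded into one function that checks the sign of the per-window stall. This is a minimal sketch of the two formulas above; the function name is an assumption, and the sample rate of 32 bps is chosen only to keep the arithmetic exact.

```python
import math


def static_window_latency(O: float, S: float, R: float, W: int, rtt: float) -> float:
    """Latency for a fixed window of W segments (O, S in bits; R in bps; rtt in s)."""
    K = math.ceil(O / (W * S))            # windows needed to cover the object
    base = 2 * rtt + O / R                # connection setup and request, plus transmission
    stall = S / R + rtt - W * S / R       # idle time after each window
    if stall <= 0:
        return base                       # case 1: the ACK returns before the window is exhausted
    return base + (K - 1) * stall         # case 2: the sender stalls K-1 times


# O = 256 bits, S = 32 bits, W = 4, as in the example, with an assumed R = 32 bps.
print(static_window_latency(256, 32, 32, 4, rtt=1))  # case 1
print(static_window_latency(256, 32, 32, 4, rtt=5))  # case 2
```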

Case analysis: dynamic congestion window

(Figure: timeline for an object of O/S = 15 segments sent in 4 windows, with the stalled periods marked.)

Case analysis: dynamic congestion window

- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object, O/S, as follows:

\[
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
\]
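The definition of K can be checked by counting windows directly and comparing with the logarithmic form. A minimal sketch, with an illustrative function name, assuming the window doubles every round starting from one segment:

```python
import math


def windows_covering(O: int, S: int) -> int:
    """Count windows of size 1, 2, 4, ... segments needed to cover O/S segments."""
    segments = math.ceil(O / S)
    k, sent = 0, 0
    while sent < segments:
        sent += 2 ** k   # the (k+1)-th window carries 2^k segments
        k += 1
    return k


# O/S = 15 segments: windows of 1, 2, 4 and 8 segments, so K = 4.
print(windows_covering(15 * 1000, 1000))
print(math.ceil(math.log2(15 + 1)))
```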

Case analysis: dynamic congestion window

- The time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

\[ S/R + \mathrm{RTT} \]

- Transmission time of the k-th window:

\[ 2^{k-1}\,\frac{S}{R} \]

- Stall time after the k-th window, where $[x]^+ = \max(x, 0)$:

\[ \left[ \frac{S}{R} + \mathrm{RTT} - 2^{k-1}\,\frac{S}{R} \right]^+ \]

- Latency:

\[ \text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + \sum_{k=1}^{K-1} \left[ \frac{S}{R} + \mathrm{RTT} - 2^{k-1}\,\frac{S}{R} \right]^+ \]


Case analysis: dynamic congestion window

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

\[ Q = \left\lfloor \log_2\!\left( 1 + \frac{\mathrm{RTT}}{S/R} \right) \right\rfloor + 1 \]

- The actual number of times that the server stalls is

\[ P = \min\{Q,\ K-1\} \]

- Closed-form expression for the latency:

\[ \text{latency} = 2\,\mathrm{RTT} + \frac{O}{R} + P\left[ \mathrm{RTT} + \frac{S}{R} \right] - (2^P - 1)\,\frac{S}{R} \]
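The closed form can be checked against the window-by-window sum of stall times. The sketch below computes both under the model's assumptions (no loss, window doubling from one segment); the function name and the sample values are illustrative.

```python
import math


def slow_start_latency(O: float, S: float, R: float, rtt: float) -> tuple[float, float]:
    """Return (closed-form latency, summed latency) for a doubling window."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    closed = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    summed = 2 * rtt + O / R + sum(
        max(0.0, S / R + rtt - 2 ** (k - 1) * S / R)  # stall after the k-th window
        for k in range(1, K)
    )
    return closed, summed


# O/S = 15 segments, S/R = 1 s, RTT = 2 s: both values come out to 22 s.
print(slow_start_latency(15_000, 1_000, 1_000, 2.0))
```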

Case analysis: dynamic congestion window

\[ \frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\left[ (O/R)/\mathrm{RTT} \right] + 2} \]

Slow start will not significantly increase latency if $\mathrm{RTT} \ll O/R$.

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


Principles of Congestion Control

Congestion:
- informally: too many sources sending too much data too fast for the network to handle
- different from flow control
- manifestations: lost packets (buffer overflow at routers) and long delays (queuing in router buffers)
- a top-10 problem

Approaches towards congestion control

Two broad approaches towards congestion control:

1. End-to-end congestion control
- no explicit feedback from the network
- congestion inferred by end systems from observed packet loss and delay
- approach taken by TCP

2. Network-assisted congestion control
- routers provide feedback to end systems, in the form of either a single bit indicating link congestion (SNA, DECbit, TCP/IP ECN, ATM ABR) or the explicit transmission rate the sender should send at

TCP Congestion Control: how the TCP sender limits the rate at which it sends traffic into its connection

New variable: the congestion window, CongWin.

The amount of unACKed data at the sender is bounded by both windows:

LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)

This indirectly limits the sender's send rate.

Assumptions:
- The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin.
- Packet loss and packet transmission delays are negligible.

Sending rate (approx.): CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
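The window constraint and the approximate rate can be written as two small helpers. This is a sketch only: the function names are illustrative, and all quantities are in bytes and seconds.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int,
                  cong_win: int, rcv_window: int) -> int:
    """Bytes the sender may still put on the wire without exceeding either window."""
    in_flight = last_byte_sent - last_byte_acked  # amount of unACKed data
    return max(0, min(cong_win, rcv_window) - in_flight)


def approx_send_rate(cong_win: int, rtt_s: float) -> float:
    """Approximate sending rate in bytes/s when RcvWindow is not the constraint."""
    return cong_win / rtt_s


print(usable_window(5000, 2000, cong_win=8000, rcv_window=65535))
print(approx_send_rate(8000, 0.1))
```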

TCP Congestion Control: TCP uses ACKs to trigger ("clock") its increase in congestion window size, which is why it is called "self-clocking"

The arrival of ACKs is an indication to the sender that all is well.

1. ACKs arrive at a slow rate
- the congestion window will be increased at a relatively slow rate

2. ACKs arrive at a high rate
- the congestion window will be increased more quickly

TCP Congestion Control: how TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
- no ACK is received after segment loss

2. Receipt of three duplicate ACKs
- segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

- The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
- Roughly, rate = cwnd / RTT bytes/sec
- cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (cwnd) after a loss event

Three mechanisms:
1. AIMD
2. slow start
3. conservative behavior after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- additive increase: increase cwnd by 1 MSS every RTT until loss is detected
- multiplicative decrease: cut cwnd in half after loss

(Figure: congestion window size (cwnd) versus time, with the vertical axis marked at 8, 16 and 24 Kbytes, showing sawtooth behavior: probing for bandwidth.)
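The sawtooth can be reproduced with a few lines of simulation. The sketch below works in whole MSS units per RTT and assumes, purely for illustration, that a loss is detected whenever the window reaches a fixed size `loss_at_mss`; real losses depend on the network, not on a fixed threshold.

```python
def aimd_trace(cwnd_mss: int, loss_at_mss: int, rtts: int) -> list[int]:
    """Record cwnd (in MSS) at the start of each RTT under AIMD."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd_mss)
        if cwnd_mss >= loss_at_mss:
            cwnd_mss = max(1, cwnd_mss // 2)  # multiplicative decrease
        else:
            cwnd_mss += 1                     # additive increase: 1 MSS per RTT
    return trace


print(aimd_trace(8, 16, 12))
```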

TCP Slow Start

When a connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT).

(Figure: timeline between Host A and Host B showing one segment sent in the first RTT, two segments in the second and four segments in the third.)
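The per-ACK increment and the per-RTT doubling are the same rule seen at two time scales, which a short simulation makes visible. The sketch below counts in MSS units and assumes every segment sent in a round is ACKed within that round; the function name and the `ssthresh_mss` cap are illustrative.

```python
def slow_start_rounds(ssthresh_mss: int, rtts: int) -> list[int]:
    """Record cwnd (in MSS) at the start of each RTT during slow start."""
    cwnd, trace = 1, []
    for _ in range(rtts):
        trace.append(cwnd)
        for _ in range(cwnd):          # one ACK per segment sent this round
            if cwnd >= ssthresh_mss:
                break                  # stop growing exponentially at the threshold
            cwnd += 1                  # +1 MSS per ACK
    return trace


print(slow_start_rounds(64, 5))
```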

Refinement: inferring loss

- After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
- But after a timeout event: cwnd is instead set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.

Philosophy:
- 3 dup ACKs indicate that the network is capable of delivering some segments
- a timeout indicates a "more alarming" congestion scenario

Refinement

Q: When should the exponential increase switch to linear?

A: When cwnd gets to 1/2 of its value before timeout.

Implementation: a variable ssthresh (slow-start threshold). On a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.


  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 36: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

37

BB

Approaches towards congestion control

End-to-end congestion control

no explicit feedback from network

congestion inferred by end-systems from observed packet loss amp delay

approach taken by TCP

Network-assisted congestion control

routers provide feedback to End Systems in the form of single bit indicating

link congestion (SNA DECbit TCPIP ECN ATM ABR)

explicit transmission rate the sender should send at

1 2

Two broad approaches towards congestion control

Transport Layer ndash TCP

38

BB

TCP Congestion ControlTCP Congestion ControlHow TCP sender limits the rate at which it sends traffic into its connection

SENDER

(Amount of unACKed data)SENDER lt min(CongWin RcvWindow)

LastByteSent - LastByteACKed

Indirectly limits the senderrsquos send rateAssumptions

bull TCP receive buffer is very large ndash no RcvWindow constraint Amt of unACKed data at sender is solely limited by CongWin

bull Packet loss delay amp packet transmission delay are negligible

Sending rate (approx) CongWinRTT

By adjusting CongWin sender can therefore adjust the

rate at which it sends data into its connection

New variable ndash Congestion

Window

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

TCP Futures: TCP over "long fat pipes"

• Example: a GRID computing application with 1500-byte segments, 100 ms RTT, and a desired throughput of 10 Gbps.
• This requires a window size of W = 83,333 in-flight segments.
• Throughput in terms of the loss rate L:

$$\text{throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

• Reaching 10 Gbps therefore requires L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments).
• New versions of TCP are needed for high-speed environments.
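The two numbers on this slide can be reproduced directly from the formulas. The sketch below is only a worked calculation; the function names are made up for the example.

```python
def window_for_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: W = rate * RTT / MSS."""
    return rate_bps * rtt_s / (8 * mss_bytes)


def loss_rate_for_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L."""
    return (1.22 * 8 * mss_bytes / (rtt_s * rate_bps)) ** 2


w = window_for_rate(10e9, 0.1, 1500)       # about 83,333 segments
loss = loss_rate_for_rate(10e9, 0.1, 1500)  # about 2e-10
```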

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

(Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R.)

Analysis of 2 connections sharing a link

Assumptions
• The link has a transmission rate of R.
• Each connection has the same MSS and RTT.
• No other TCP connections or UDP datagrams traverse the shared link.
• The slow-start phase of TCP is ignored.
• Both connections operate in congestion-avoidance mode (linear increase phase).

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease reduces throughput proportionally.

(Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R. The plot shows the "equal bandwidth share" line and the "full bandwidth utilisation" line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2". A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.)
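The convergence argument can be tried numerically. The sketch below is a deliberately simplified model, assuming that both flows have the same RTT, increase by the same step each round, and both halve their rate whenever the combined rate exceeds R; the function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(r: float, x1: float, x2: float, rounds: int, step: float = 1.0):
    """Simulate two synchronised AIMD flows sharing a link of capacity r."""
    for _ in range(rounds):
        if x1 + x2 > r:
            # Loss: both flows cut their rate in half (multiplicative decrease).
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add the same amount (additive increase).
            x1, x2 = x1 + step, x2 + step
    return x1, x2
```

Additive increase leaves the difference between the two rates unchanged, while each halving cuts it in half, so starting from an unequal split such as (80, 10) on a link of capacity 100 the rates drift toward an equal share.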

The End

The slides that follow are for additional reading only.

Loopholes in TCP

Multiple end systems share a link; R bps is the link's transmission rate.

• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore, they achieve higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth (multithreading implementation).

(Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections share the link.)

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It consists of:
• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider:
• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions
• The network is uncongested, with one link of rate R between the end systems.
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send is an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs, and TCP connection-establishment segments have negligible transmission times.
• The initial Threshold of the TCP congestion mechanism is very big.

Notation
• R: the link's transmission rate in bps
• O: size of the object in bits
• S: number of bits in one MSS (maximum segment size)

(Figure: the server sends a FILE to the client.)

TCP latency modeling

Case analysis: STATIC congestion window

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = O/(WS), rounded up to the nearest integer

Example (assume W = 4 segments): O = 256 bits, S = 32 bits, W = 4, so K = 256/(4 · 32) = 2.

TCP latency modeling

Case analysis: STATIC congestion window

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

latency = 2 RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object; K = O/(WS), rounded up to the nearest integer.

If there are K windows, the sender will be stalled (K-1) times.
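Both static-window cases fit in one small function. The sketch below uses the symbols from the slides (O, S, W, R, RTT); the function name and the example link rate of 32 bps are assumptions chosen only to keep the arithmetic easy to follow.

```python
import math


def static_window_latency(O: float, S: float, W: int, R: float, rtt: float) -> float:
    """Latency for a fixed congestion window of W segments.

    O: object size in bits, S: segment size in bits, R: link rate in bps.
    """
    K = math.ceil(O / (W * S))  # number of windows that cover the object
    base = 2 * rtt + O / R      # connection setup + request, plus transmission
    if W * S / R >= rtt + S / R:
        return base             # case 1: ACKs return in time, no stalls
    stall = S / R + rtt - W * S / R
    return base + (K - 1) * stall  # case 2: one stall after each window but the last
```

With O = 256 bits, S = 32 bits, W = 4, and R = 32 bps, an RTT of 1 s falls under case 1 (latency 10 s), while an RTT of 5 s falls under case 2 with K = 2 and a single 2 s stall (latency 20 s).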

Case analysis: DYNAMIC congestion window

(Figure: timing diagram of slow start for an object of O/S = 15 segments, which is covered by 4 windows; the stalled periods between windows are marked.)

Case analysis: DYNAMIC congestion window

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Case analysis: DYNAMIC congestion window

• Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
• Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
• Stall time after the kth window: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^+$$

Case analysis: DYNAMIC congestion window

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

• Closed-form expression for the latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

Case analysis: DYNAMIC congestion window

With P the actual number of times the server stalls, the latency relative to the minimum latency (2 RTT + O/R, the latency without any stalls) is bounded by

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
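The dynamic-window quantities K, Q, and P and the two forms of the latency can be checked against each other numerically. This is a sketch under the slides' assumptions (no loss, window doubling from one segment); the function name and the example values S = 1000 bits, R = 1000 bps, RTT = 2 s are made up for illustration.

```python
import math


def slow_start_latency(O: float, S: float, R: float, rtt: float):
    """Return (K, Q, P, closed_form_latency, summed_latency)."""
    K = math.ceil(math.log2(O / S + 1))                # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                  # actual number of stalls
    base = 2 * rtt + O / R
    closed = base + P * (rtt + S / R) - (2 ** P - 1) * S / R
    # Sum of the positive stall times after windows 1 .. K-1.
    summed = base + sum(
        max(0.0, S / R + rtt - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    return K, Q, P, closed, summed
```

For an object of O/S = 15 segments this gives K = 4 windows, matching the slide's example, and with the values above both latency expressions evaluate to 22 s.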

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP Congestion Control

How the TCP sender limits the rate at which it sends traffic into its connection

New variable: the congestion window (CongWin). At the sender, the amount of unACKed data must satisfy

LastByteSent - LastByteACKed ≤ min(CongWin, RcvWindow)

This constraint indirectly limits the sender's send rate.

Assumptions
• The TCP receive buffer is very large, so there is no RcvWindow constraint; the amount of unACKed data at the sender is limited solely by CongWin.
• Packet loss delay and packet transmission delay are negligible.

Sending rate (approx.) = CongWin/RTT

By adjusting CongWin, the sender can therefore adjust the rate at which it sends data into its connection.
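The window constraint on this slide amounts to a one-line check before each send. A minimal sketch, with hypothetical function and parameter names and all quantities in bytes:

```python
def can_send(last_byte_sent: int, last_byte_acked: int,
             cong_win: int, rcv_window: int, nbytes: int) -> bool:
    """True if sending nbytes more keeps unACKed data within min(CongWin, RcvWindow)."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= min(cong_win, rcv_window)


def approx_send_rate(cong_win: int, rtt_s: float) -> float:
    """Approximate sending rate in bytes/sec: CongWin / RTT."""
    return cong_win / rtt_s
```

With 2000 bytes in flight and CongWin = 4000 bytes, another 1000 bytes may be sent but another 2500 bytes may not.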

TCP Congestion Control

TCP uses ACKs to trigger ("clock") its increase in congestion window size; it is "self-clocking".

Arrival of ACKs: an indication to the sender that all is well.

1. ACKs arrive at a slow rate
• The congestion window will be increased at a relatively slow rate.

2. ACKs arrive at a high rate
• The congestion window will be increased more quickly.

TCP Congestion Control

How TCP perceives that there is congestion on the path

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender.

1. Timeout
• No ACK is received after a segment loss.

2. Receipt of three duplicate ACKs
• A segment loss is followed by three duplicate ACKs received at the sender.

TCP Congestion Control: details

• The sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
• Roughly, rate = cwnd/RTT bytes/sec
• cwnd is a dynamic function of perceived network congestion.

How does the sender perceive congestion?
• Loss event = timeout or 3 duplicate ACKs.
• The TCP sender reduces its rate (cwnd) after a loss event.

Three mechanisms:
1. AIMD
2. Slow start
3. Conservative behaviour after timeout events

TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
• Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
• Multiplicative decrease: cut cwnd in half after loss.

(Figure: congestion window size (cwnd) versus time, with axis marks at 8, 16, and 24 Kbytes, showing sawtooth behaviour: probing for bandwidth.)
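The sawtooth can be reproduced with a few lines of Python. The sketch assumes, purely for illustration, that a loss is detected whenever the window reaches a fixed size `loss_at_mss`; real loss depends on the network. The window is counted in MSS units and sampled once per RTT.

```python
def aimd_trace(start_mss: int, loss_at_mss: int, rtts: int) -> list[int]:
    """Window size (in MSS) at the start of each RTT under AIMD."""
    cwnd = start_mss
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd >= loss_at_mss:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease after loss
        else:
            cwnd += 1                 # additive increase: +1 MSS per RTT
    return trace
```

Starting at 4 MSS with loss at 8 MSS, ten RTTs give the window sizes 4, 5, 6, 7, 8, 4, 5, 6, 7, 8.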

TCP Slow Start

When a connection begins, increase the rate exponentially until the first loss event:
• initially cwnd = 1 MSS
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).

(Figure: timing diagram between Host A and Host B. Host A sends one segment in the first RTT, two segments in the second, and four segments in the third.)
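The link between "1 MSS per ACK" and "doubling every RTT" can be seen by counting ACKs per round. A minimal sketch, assuming no loss and one ACK per segment; the function name is hypothetical and the window is counted in MSS units.

```python
def slow_start_rounds(rtts: int) -> list[int]:
    """cwnd (in MSS) at the start of each RTT during loss-free slow start."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd          # one ACK comes back per segment sent this round
        for _ in range(acks):
            cwnd += 1        # each ACK grows the window by 1 MSS
        history.append(cwnd)
    return history
```

Three RTTs give window sizes of 1, 2, 4, and 8 segments, matching the one, two, four segments pattern in the figure.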

Refinement: inferring loss

• After 3 dup ACKs: cwnd is cut in half; the window then grows linearly.
• But after a timeout event: cwnd is instead set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.

Philosophy
• 3 dup ACKs indicate that the network is capable of delivering some segments.
• A timeout indicates a "more alarming" congestion scenario.

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation
• variable ssthresh (slow-start threshold)
• on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

STATE | EVENT | TCP SENDER Congestion-Control Action | Commentary
Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS · (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter Slow Start
SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed

Page 38: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

39

BB

TCP Congestion ControlTCP Congestion ControlTCP uses ACKs to trigger (ldquoclockrdquo) its increase in congestion window size ndash ldquoself-clockingrdquo

Arrival of ACKs ndash indication to the sender that all is well

1 Slow Rate

bull Congestion window will be increased at a relatively slow rate

2 High rate

bull Congestion window will be increased more quickly

Transport Layer ndash TCP

40

BB

TCP Congestion ControlTCP Congestion ControlHow TCP perceives that there is congestion on the path

ldquoLoss Eventrdquo ndash when there is excessive congestion router buffers along the path overflows causing datagrams to be dropped which in turn results in a ldquoloss eventrdquo at the sender

1 Timeout

bull no ACK is received after segment loss

2 Receipt of three duplicate ACKs

bull segment loss is followed by three ACKs received at the sender

Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

Transport Layer ndash TCP

55

BB

TCP Latency ModelingTCP Latency Modeling

In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs

Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth

Multiple End Systems sharing a link

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

Loop holes in TCPLoop holes in TCP

1 TCP connection

1 TCP connection

1 TCP connection

3 TCP connections

Multithreading implementationMultithreading implementation

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency here is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:

TCP connection establishment time

Data transfer delay

Actual data transmission time

Two cases to consider:

WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent (no data transfer delay)

WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent (there is a data transfer delay)

TCP Latency Modeling

Assumptions:

Network is uncongested, with one link between end systems of rate R

CongWin (fixed) determines the amount of data that can be sent

No packet loss, no packet corruption, no retransmissions required

Header overheads are negligible

File to send = integer number of segments of size MSS

Connection establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times

Initial Threshold of the TCP congestion mechanism is very big

Notation:

R bps – link's transmission rate

O – size of object in bits

S – number of bits of MSS (max segment size)

[Figure: CLIENT, SERVER, FILE]

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent

Case 1: latency = 2RTT + O/R

K = number of windows of data that cover the object

K = O/(WS), rounded up to the nearest integer (O/S is the number of segments)

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent

Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = O/(WS), rounded up: the number of windows of data that cover the object

[Figure: timing diagram with the STALLED PERIOD marked]

If there are K windows, the sender will be stalled (K-1) times
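The two static-window cases can be checked with a small calculator. This is a sketch under the slides' assumptions (no loss, negligible headers, fixed window W); the function name `static_window_latency` and the link rate used in the demo are chosen only for illustration.

```python
import math


def static_window_latency(O, S, W, R, RTT):
    """Latency in seconds to fetch an O-bit object with a fixed window.

    O: object size in bits, S: segment size in bits, W: window in segments,
    R: link rate in bits per second, RTT: round-trip time in seconds.
    """
    K = math.ceil(O / (W * S))       # windows of data that cover the object
    base = 2 * RTT + O / R           # connection setup, request, transmission
    stall = S / R + RTT - W * S / R  # idle time after each window
    if stall <= 0:
        # Case 1: the first ACK returns before the window is used up.
        return base
    # Case 2: the sender stalls once after each of the first K-1 windows.
    return base + (K - 1) * stall


if __name__ == "__main__":
    # Slide example O = 256 bits, S = 32 bits, W = 4; R = 32 bps is assumed.
    print(static_window_latency(256, 32, 4, 32, RTT=1))  # Case 1
    print(static_window_latency(256, 32, 4, 32, RTT=5))  # Case 2
```

With R = 32 bps the first call falls into Case 1 (WS/R = 4 s exceeds RTT + S/R = 2 s), and the second into Case 2, where one stall of 2 s is added for the K - 1 = 1 window boundary.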

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram of slow start with the STALLED PERIOD marked; example with O/S = 15 segments, covered by 4 windows]

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object

• We can express K in terms of the number of segments in the object as follows:

$$K = \min\left\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge \frac{O}{S}\right\}$$

Note:

$$\begin{aligned}
K &= \min\left\{k : 2^k - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

• From the time the server begins to transmit the kth window until the time the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

• Transmission of kth window:

$$2^{k-1}\,\frac{S}{R}$$

• Stall time (where $[x]^+ = \max(x, 0)$):

$$\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+$$

• Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+$$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q,\ K-1\}$$

• Closed-form expression for the latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\,\frac{S}{R}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$

Slow start will not significantly increase latency if RTT << O/R
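The dynamic-window quantities K, Q and P can be cross-checked numerically. The sketch below assumes the standard textbook forms of this model: K = ceil(log2(O/S + 1)), Q = floor(log2(1 + RTT/(S/R))) + 1, P = min(Q, K-1), and a per-window stall of max(S/R + RTT - 2^(k-1) S/R, 0). The function name and the demo numbers are illustrative only.

```python
import math


def slow_start_latency(O, S, R, RTT):
    """Return (K, Q, P, closed_form, summed) for the slow-start latency model.

    O: object size in bits, S: segment size in bits,
    R: link rate in bits per second, RTT: round-trip time in seconds.
    """
    K = math.ceil(math.log2(O / S + 1))               # windows covering object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls, infinite object
    P = min(Q, K - 1)                                 # stalls that really occur
    minimum = 2 * RTT + O / R
    closed_form = minimum + P * (RTT + S / R) - (2 ** P - 1) * S / R
    # Same latency obtained by adding up the stall after each of the first
    # K-1 windows; window k carries 2**(k-1) segments.
    summed = minimum + sum(
        max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K)
    )
    return K, Q, P, closed_form, summed


if __name__ == "__main__":
    # O/S = 15 segments as in the earlier slide; S/R = 1 s and RTT = 2 s
    # are assumed values picked to keep the arithmetic exact.
    print(slow_start_latency(O=15 * 1024, S=1024, R=1024, RTT=2))
```

For O/S = 15 the sketch gives K = 4 windows, matching the "4 windows" example, and the closed form agrees with the explicit sum over stall periods.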

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


TCP Congestion Control

How TCP perceives that there is congestion on the path:

"Loss event": when there is excessive congestion, router buffers along the path overflow, causing datagrams to be dropped, which in turn results in a "loss event" at the sender

1. Timeout

• no ACK is received after segment loss

2. Receipt of three duplicate ACKs

• segment loss is followed by three duplicate ACKs received at the sender

TCP Congestion Control: details

sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd

roughly, rate = cwnd/RTT bytes/sec

cwnd is dynamic, a function of perceived network congestion

How does the sender perceive congestion?

loss event = timeout or 3 duplicate ACKs

TCP sender reduces rate (cwnd) after a loss event

Three mechanisms:

1. AIMD

2. slow start

3. conservative after timeout events
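The window constraint and the rough rate estimate on this slide amount to two one-line computations. A minimal sketch, with hypothetical helper names and all quantities in bytes and seconds:

```python
def bytes_allowed(last_byte_sent, last_byte_acked, cwnd):
    """How many more bytes may be sent while keeping
    LastByteSent - LastByteAcked <= cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(cwnd - in_flight, 0)


def approx_rate(cwnd, rtt):
    """Rough sending rate in bytes/sec: one window per round-trip time."""
    return cwnd / rtt


if __name__ == "__main__":
    print(bytes_allowed(last_byte_sent=5000, last_byte_acked=2000, cwnd=4000))
    print(approx_rate(cwnd=8000, rtt=0.5))
```

In the demo, 3000 bytes are in flight against a 4000-byte window, so 1000 more bytes may be sent.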

TCP congestion avoidance: additive increase, multiplicative decrease

approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs

additive increase: increase cwnd by 1 MSS every RTT until loss is detected

multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) against time, with 8, 16 and 24 Kbytes marked on the vertical axis. Sawtooth behavior: probing for bandwidth.]
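The sawtooth can be reproduced with a few lines. This is a sketch only: the rounds in which loss occurs are supplied by the caller rather than derived from a network model, cwnd is counted in whole MSS units, and `aimd_trace` is a made-up name.

```python
def aimd_trace(cwnd, mss, loss_rounds, rounds):
    """Record cwnd at the start of each RTT under AIMD.

    loss_rounds: set of round indices in which a loss is detected (assumed
    input); in every other round cwnd grows by one MSS.
    """
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd = max(cwnd // 2, mss)  # multiplicative decrease
        else:
            cwnd += mss                 # additive increase
    return trace


if __name__ == "__main__":
    print(aimd_trace(cwnd=8, mss=1, loss_rounds={4}, rounds=8))
```

The printed trace climbs by one MSS per RTT, drops to half at the loss, and starts climbing again, which is the sawtooth shape described above.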

TCP Slow Start

when connection begins, increase rate exponentially until first loss event:

initially cwnd = 1 MSS

double cwnd every RTT

done by incrementing cwnd by 1 MSS for every ACK received

summary: initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one batch per RTT, along a time axis]
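Why "one MSS per ACK" yields doubling per RTT is easy to see in code. The sketch assumes every segment sent in a round is acknowledged within that round and that no loss occurs; `slow_start_rounds` is an illustrative name.

```python
def slow_start_rounds(mss, rounds):
    """Return cwnd at the start of each RTT during loss-free slow start."""
    cwnd = mss
    sizes = [cwnd]
    for _ in range(rounds):
        acks = cwnd // mss     # one ACK per segment sent in this RTT
        for _ in range(acks):
            cwnd += mss        # each ACK grows the window by one MSS
        sizes.append(cwnd)
    return sizes


if __name__ == "__main__":
    print(slow_start_rounds(mss=1, rounds=3))
```

A window of n segments produces n ACKs, each adding one MSS, so the window is 2n segments one RTT later, as in the one, two, four segments of the figure.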

Refinement: inferring loss

after 3 dup ACKs:

cwnd is cut in half

window then grows linearly

but after timeout event:

cwnd is set to 1 MSS

window then grows exponentially up to a threshold, then grows linearly

Philosophy:

3 dup ACKs indicate that the network is capable of delivering some segments

timeout indicates a "more alarming" congestion scenario

Refinement

Q: when should the exponential increase switch to linear?

A: when cwnd gets to 1/2 of its value before timeout

Implementation: variable ssthresh (slow-start threshold). On a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

STATE | EVENT | TCP SENDER Congestion-Control Action | Commentary

Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS × (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT

SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS

SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter Slow Start

SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed

Summary: TCP Congestion Control

[Figure: finite-state machine with three states: slow start, congestion avoidance, fast recovery]

Initial: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start

slow start

• new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed

• duplicate ACK: dupACKcount++

• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

• cwnd > ssthresh: go to congestion avoidance

• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

congestion avoidance

• new ACK: cwnd = cwnd + MSS × (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed

• duplicate ACK: dupACKcount++

• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start

• dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment; go to fast recovery

fast recovery

• duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed

• new ACK: cwnd = ssthresh, dupACKcount = 0; go to congestion avoidance

• timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; go to slow start
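The three-state summary can be sketched as a small event-driven class. This is an illustration under simplifying assumptions, not a real TCP stack: cwnd and ssthresh are counted in segments (so the initial ssthresh of 64 is a stand-in for 64 KB), retransmission itself is omitted, and the class name `RenoSender` and its method names are invented here.

```python
from dataclasses import dataclass

MSS = 1  # assumption: windows are counted in units of one segment


@dataclass
class RenoSender:
    cwnd: float = 1 * MSS
    ssthresh: float = 64.0      # stand-in for the 64 KB initial threshold
    dup_acks: int = 0
    state: str = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                     # doubles cwnd every RTT
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += MSS * (MSS / self.cwnd)  # about +1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                   # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"


if __name__ == "__main__":
    s = RenoSender(ssthresh=4)
    for _ in range(4):
        s.on_new_ack()
    print(s.state, s.cwnd)
```

Feeding the object a sequence of `on_new_ack`, `on_dup_ack` and `on_timeout` calls traces the same transitions as the state diagram: a triple duplicate ACK halves the threshold and enters fast recovery, while a timeout always returns to slow start with cwnd = 1 MSS.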

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion

Congestion control forces the end systems to decrease the rate at which packets are sent during periods of congestion

When the sending system's packet is lost due to congestion, the sender is alerted because it stops receiving ACKs for the packets it sent

[Figure: CLIENT, SERVER]

Macroscopic Description of TCP throughput

What's the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases)

Let W be the window size when loss occurs

When the window is W, throughput is W/RTT

Just after loss, the window drops to W/2 and throughput to W/(2 RTT)

Throughput increases linearly (by MSS/RTT every RTT)

Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP)
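The 0.75 W/RTT figure is the average of a ramp that runs from W/2 up to W. A quick numerical check, assuming W is an even number of segments and the window grows by one segment per RTT (the function name is illustrative):

```python
def average_sawtooth_throughput(W, rtt):
    """Average throughput (segments/sec) over one sawtooth cycle in which
    the window climbs from W/2 to W by one segment per RTT."""
    windows = range(W // 2, W + 1)
    return sum(w / rtt for w in windows) / len(windows)


if __name__ == "__main__":
    W, rtt = 100, 1.0
    print(average_sawtooth_throughput(W, rtt), 0.75 * W / rtt)
```

Both printed values coincide, since the mean of the evenly spaced values W/2, ..., W is (W/2 + W)/2 = 0.75 W.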

TCP Futures: TCP over "long, fat pipes"

Example: GRID computing application with 1500-byte segments, 100 ms RTT, desired throughput of 10 Gbps

Requires window size W = 83,333 in-flight segments

Throughput in terms of loss rate:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$

L = 2·10^-10, a very small loss rate (1 loss event every 5 billion segments)

New versions of TCP are needed for high-speed environments
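The numbers in this example can be recomputed directly. The sketch assumes the usual loss-rate relation Throughput = 1.22 · MSS / (RTT · sqrt(L)) and solves it for L; the helper names are illustrative.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L solving throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    W = required_window_segments(10e9, 0.1, 1500)
    L = required_loss_rate(10e9, 0.1, 1500)
    print(f"W = {W:.0f} segments, L = {L:.2e}")
```

The window comes out at roughly 83,333 segments and the tolerable loss rate at about 2·10^-10, in line with the figures quoted on the slide.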

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]

Analysis of 2 connections sharing a link

Assumptions:

Link with transmission rate of R

Each connection has the same MSS and RTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rate of the two connections to allow for equal bandwidth sharing


Transport Layer ndash TCP

41

BB

TCP Congestion Control details

sender limits transmission LastByteSent-LastByteAcked

cwnd

roughly

cwnd is dynamic function of perceived network congestion

How does sender perceive congestion

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Three mechanisms1 AIMD

2 slow start

3 conservative after timeout events

rate = cwnd

RTT Bytessec

Transport Layer ndash TCP

42

BB

TCP congestion avoidance congestion avoidance additive increase multiplicative decreaseadditive increase multiplicative decrease

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

approach increase transmission rate (window size) probing for usable bandwidth until loss occurs additive increase increase cwnd by 1 MSS every RTT until

loss is detected multiplicative decrease cut cwnd in half after loss

timecwnd

co

nge

stio

n w

indo

w s

ize

saw toothbehavior probingfor bandwidth

Transport Layer ndash TCP

43

BB

TCP Slow Start Slow Start when connection begins

increase rate exponentially until first loss event initially cwndcwnd = 1 MSS double cwndcwnd every RTT done by incrementing cwndcwnd by 1 MSS by 1 MSS for every ACK received

summary initial rate is slow but ramps up exponentially fast (doubling of the sending rate every RTT)

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

Transport Layer ndash TCP

55

BB

TCP Latency ModelingTCP Latency Modeling

In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs

Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth

Multiple End Systems sharing a link

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

Loop holes in TCPLoop holes in TCP

1 TCP connection

1 TCP connection

1 TCP connection

3 TCP connections

Multithreading implementationMultithreading implementation

Transport Layer ndash TCP

56

BB

TCP latency modelingTCP latency modeling

TCP connection establishment time data transfer delay Actual data transmission time

Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR

An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent

WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent

QQ How long How long does it take to does it take to

receive an object receive an object from a Web from a Web

serverserver

No data transfer delayNo data transfer delay

Therersquos data transfer delayTherersquos data transfer delay

the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety

Transport Layer ndash TCP

57

BB

TCP Latency ModelingTCP Latency Modeling

Network is uncongested with one link between end systems of rate RR

CongWinCongWin (fixed) determines the amount of data that can be sent

No packet loss no packet corruption no retransmissions required

Header overheads are negligible

File to sendFile to send = integer number of segments of size MSS

Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times

CLIENT SERVER

FILEFILE

Initial ThresholdThreshold of TCP congestion mechanism is very big

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits

SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)

Transport Layer ndash TCP

58

BB

TCP latency ModelingTCP latency Modeling

Case 1 latency = 2RTT + OR

K = Number of Windows of data that cover the object

K = OWS

Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW

Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR

An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent

Number of segmentsRounded up to the nearest integer

Assume W=4 segmentsAssume W=4 segments

eg O=256bits S=32bits W=4

Transport Layer ndash TCP

59

BB

TCP latency ModelingTCP latency Modeling

Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]

Number of Windows of data that cover the objectK= OWS

Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW

Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR

Sender has to wait for an ACK after a windowrsquos worth of data sent

STALLED STALLED PERIODPERIOD

If there are k windows sender willbe stalled (k-1) times

Transport Layer ndash TCP

60

BB

Case Analysis Case Analysis DYNAMICDYNAMIC CONGESTION WINDOW CONGESTION WINDOW

STALLED STALLED PERIODPERIOD

4 windows4 windows

OS=15

Transport Layer ndash TCP

61

BB


TCP congestion avoidance: additive increase, multiplicative decrease

Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
- Additive increase: increase cwnd by 1 MSS every RTT until loss is detected.
- Multiplicative decrease: cut cwnd in half after loss.

[Figure: congestion window size (cwnd; axis marks at 8, 16 and 24 Kbytes) versus time, showing the sawtooth behavior of probing for bandwidth.]
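
The sawtooth can be reproduced with a few lines of Python. This is only a sketch under simplifying assumptions: cwnd is counted in units of MSS, time advances one RTT per step, and the rounds in which a loss is detected are supplied by hand (`loss_at` is a hypothetical input, not something TCP knows in advance).

```python
def aimd_trace(rounds, loss_at, start_mss=8):
    """Return cwnd (in MSS) at the start of each RTT round.

    loss_at is a set of round indices in which a loss is detected
    (a hand-picked, hypothetical input used only for illustration).
    """
    cwnd = float(start_mss)
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_at:
            cwnd = max(1.0, cwnd / 2)  # multiplicative decrease: halve cwnd
        else:
            cwnd += 1.0                # additive increase: +1 MSS per RTT
    return trace


trace = aimd_trace(rounds=8, loss_at={3})
print(trace)
```

The window climbs by 1 MSS per round, is halved in the round after the loss, and then resumes the linear climb.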

TCP Slow Start

When a connection begins, increase the rate exponentially until the first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd by 1 MSS for every ACK received

Summary: the initial rate is slow but ramps up exponentially fast (the sending rate doubles every RTT).

[Figure: timing diagram between Host A and Host B. Host A sends one segment in the first RTT, then two segments, then four segments.]
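
A minimal sketch of why "+1 MSS per ACK" amounts to doubling per RTT, assuming no loss and exactly one ACK for every segment sent in a round (the function name and the round-based view of time are illustrative assumptions):

```python
def slow_start_rounds(rounds, mss=1):
    """cwnd (in units of MSS) at the start of each RTT, assuming no loss."""
    cwnd = mss
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd // mss      # one ACK per segment sent in this round
        for _ in range(acks):
            cwnd += mss         # slow start: +1 MSS for every ACK received
    return history


history = slow_start_rounds(4)
print(history)
```

Each round delivers as many ACKs as there are segments in the window, so the per-ACK increment adds a full window's worth of MSS every RTT.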

Refinement: inferring loss

- After 3 duplicate ACKs: cwnd is cut in half; the window then grows linearly.
- But after a timeout event: cwnd is set to 1 MSS; the window then grows exponentially up to a threshold, then grows linearly.

Philosophy:
- 3 duplicate ACKs indicate that the network is capable of delivering some segments.
- A timeout indicates a "more alarming" congestion scenario.

Refinement

Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.

Implementation:
- variable ssthresh (slow-start threshold)
- on a loss event, ssthresh is set to 1/2 of cwnd just before the loss event

TCP Sender Congestion Control

| State | Event | TCP sender congestion-control action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance" | Results in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease; CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter Slow Start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |

Summary: TCP Congestion Control

[Figure: finite-state machine of the TCP sender with three states: slow start, congestion avoidance and fast recovery.]

Initialisation: cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; enter slow start.

Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance:
- new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
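
The state machine can be sketched as a small Python class. This is an illustrative sketch, not a real TCP stack: the class name `RenoSender`, the method names and the string state labels are assumptions, cwnd and ssthresh are kept in bytes, and retransmission and sending of segments are left to the caller.

```python
class RenoSender:
    """Sketch of the three-state sender; cwnd and ssthresh are in bytes."""

    def __init__(self, mss=1460):
        self.mss = mss
        self.cwnd = mss                 # cwnd = 1 MSS
        self.ssthresh = 64 * 1024       # ssthresh = 64 KB
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh   # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += self.mss       # +1 MSS per ACK
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:                           # congestion avoidance
            self.cwnd += self.mss * self.mss / self.cwnd
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += self.mss       # inflate by 1 MSS per duplicate ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3 * self.mss
            self.state = "fast_recovery"   # caller retransmits the missing segment

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss
        self.dup_acks = 0
        self.state = "slow_start"          # caller retransmits the missing segment


sender = RenoSender(mss=1000)
sender.cwnd, sender.ssthresh, sender.state = 8000, 4000, "congestion_avoidance"
for _ in range(3):
    sender.on_dup_ack()
print(sender.state, sender.cwnd, sender.ssthresh)
```

Keeping one handler per event mirrors the transition labels of the diagram, so each arrow can be checked against a single branch.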

Congestion control

TCP's Congestion Control Service (between client and server)

Problem: gridlock sets in when there is packet loss due to router congestion.

- TCP forces the end systems to decrease the rate at which packets are sent during periods of congestion.
- When a packet from the sending system is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.

Macroscopic Description of TCP Throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).

- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after a loss, the window drops to W/2 and throughput to W/(2 RTT).
- Throughput increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 W/RTT

(Based on an idealised model for the steady-state dynamics of TCP.)
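
The 0.75 factor is simply the mean of a window that ramps linearly from W/2 to W. A quick numeric check, as a sketch that assumes one sample per RTT and a window measured in MSS (the helper name is hypothetical):

```python
def average_window(w_max, mss=1):
    """Mean window over one sawtooth cycle: W/2, W/2 + 1, ..., W (in MSS)."""
    samples = []
    cwnd = w_max / 2
    while cwnd <= w_max:
        samples.append(cwnd)
        cwnd += mss          # linear growth: +1 MSS per RTT
    return sum(samples) / len(samples)


ratio = average_window(100) / 100
print(ratio)
```

Dividing the mean window by RTT gives the average throughput of 0.75 W/RTT quoted above.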

TCP Futures: TCP over "long, fat pipes"

- Example: a GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps.
- This requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of the loss rate L:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}$$

- Hence L = $2 \cdot 10^{-10}$, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
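
The two numbers in this example can be recomputed directly from the slide's parameters. The sketch below assumes MSS is given in bytes and throughput in bits per second; the function names are illustrative.

```python
def required_window(throughput_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


w = required_window(10e9, 0.1, 1500)
loss = required_loss_rate(10e9, 0.1, 1500)
print(int(w), loss)
```

The window comes out at about 83,333 segments and the tolerable loss rate at roughly 2 x 10^-10, matching the figures on the slide.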

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R.]

Analysis of 2 connections sharing a link

Assumptions:
- The link has a transmission rate of R.
- Each connection has the same MSS and RTT.
- No other TCP connections or UDP datagrams traverse the shared link.
- The slow-start phase of TCP is ignored.
- Both connections operate in congestion-avoidance mode (linear increase phase).

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing.

Why is TCP fair?

Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases.
- Multiplicative decrease decreases throughput proportionally.

[Figure: Connection 2 throughput plotted against Connection 1 throughput, each axis running up to R, with the equal-bandwidth-share line and the full-bandwidth-utilisation line. A point on the graph depicts the amount of link bandwidth jointly consumed by the connections. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]
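
The convergence toward the equal-share line can be sketched with a toy simulation. It assumes, as the analysis above does, that both flows have the same RTT, increase by the same amount per round, and both see a loss whenever their combined rate exceeds the capacity; the function name and the numeric values are hypothetical.

```python
def two_flow_aimd(x1, x2, capacity, rounds):
    """Throughputs of two synchronized AIMD flows sharing one link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:        # joint loss: both flows halve
            x1, x2 = x1 / 2, x2 / 2
        else:                         # additive increase: +1 unit per RTT each
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


x1, x2 = two_flow_aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(round(x1, 3), round(x2, 3))
```

Additive increase leaves the difference between the two rates unchanged, while every halving cuts that difference in half, so the operating point drifts toward equal shares.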

The End

The slides that follow are for additional reading only.

Loopholes in TCP

Multiple end systems share a link with transmission rate R bps.

- In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore they achieve higher throughput.
- Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections share the link.]

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:
- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:
- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions:
- The network is uncongested, with one link of rate R between the end systems.
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
- The initial Threshold of the TCP congestion mechanism is very big.

Notation:
- R: the link's transmission rate, in bps
- O: size of the object, in bits
- S: number of bits in the MSS (maximum segment size)

[Figure: a client requests a file from a server.]

TCP latency modeling
Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R
An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1: latency = 2 RTT + O/R

K = number of windows of data that cover the object
K = $\lceil O/(WS) \rceil$, where O/S is the number of segments and the result is rounded up to the nearest integer.

Example: assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

TCP latency modeling
Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R
The sender has to wait for an ACK after a window's worth of data is sent.

Case 2: latency = 2 RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = number of windows of data that cover the object, K = $\lceil O/(WS) \rceil$

Stalled period: if there are K windows, the sender will be stalled (K-1) times.
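
Both static-window cases fit in one short function. The sketch below takes O and S in bits, R in bits per second and RTT in seconds; the function name and the values of R and RTT in the example are assumptions chosen only to exercise each case (O = 256, S = 32 and W = 4 come from the earlier example).

```python
import math


def static_window_latency(O, S, R, W, rtt):
    """Latency in seconds for a fixed window of W segments."""
    K = math.ceil(O / (W * S))                 # windows that cover the object
    stall = S / R + rtt - W * S / R            # idle time after each window
    if stall <= 0:                             # Case 1: ACKs return in time
        return 2 * rtt + O / R
    return 2 * rtt + O / R + (K - 1) * stall   # Case 2: (K-1) stalled periods


case1 = static_window_latency(O=256, S=32, R=32, W=4, rtt=1)
case2 = static_window_latency(O=256, S=32, R=32, W=4, rtt=5)
print(case1, case2)
```

With R = 32 bps the window takes 4 s to transmit, so an RTT of 1 s falls under Case 1 and an RTT of 5 s under Case 2, where the K = 2 windows produce a single stall of 2 s.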

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram of slow start with stalled periods between windows. Example: O/S = 15 segments, covered by 4 windows.]

Case Analysis: DYNAMIC CONGESTION WINDOW

- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\left\{k : 2^{0} + 2^{1} + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^{k} - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$
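
As a sanity check, the definition of K (the smallest k whose windows 1, 2, 4, ... cover the object) can be compared with the closed form. This is a sketch; both function names are illustrative, and `num_segments` stands for O/S.

```python
import math


def windows_by_search(num_segments):
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) >= O/S."""
    k, sent = 0, 0
    while sent < num_segments:
        sent += 2 ** k      # the next window carries 2^k segments
        k += 1
    return k


def windows_closed_form(num_segments):
    """K = ceil(log2(O/S + 1))."""
    return math.ceil(math.log2(num_segments + 1))


print(windows_by_search(15), windows_closed_form(15))
```

For O/S = 15 both give K = 4, in line with the four windows (1 + 2 + 4 + 8 segments) of the earlier example.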

Case Analysis: DYNAMIC CONGESTION WINDOW

- Time from when the server begins to transmit the kth window until the server receives an ACK for the first segment in the window: $\frac{S}{R} + RTT$
- Transmission time of the kth window: $2^{k-1}\frac{S}{R}$
- Stall time after the kth window: $\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$, where $[x]^{+} = \max(x, 0)$
- Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]^{+}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

- The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

- Closed-form expression for the latency, with P = min{Q, K-1} the actual number of times the server stalls:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - \left(2^{P} - 1\right)\frac{S}{R}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

- Comparing against the minimum latency, 2 RTT + O/R (no stalls):

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\dfrac{O/R}{RTT} + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
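
The dynamic-window quantities can be checked numerically. The sketch below computes K, Q and P, then evaluates the latency both as the sum of per-window stalls and with the closed form. O and S are in bits, R in bits per second, RTT in seconds; the function name and the example values are assumptions for illustration.

```python
import math


def slow_start_latency(O, S, R, rtt):
    """Return (K, Q, P, latency by stall sum, latency by closed form)."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # actual number of stalls
    by_sum = 2 * rtt + O / R + sum(
        max(0.0, S / R + rtt - 2 ** (k - 1) * S / R) for k in range(1, K)
    )
    closed = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, by_sum, closed


K, Q, P, by_sum, closed = slow_start_latency(O=15000, S=1000, R=1000, rtt=2)
minimum = 2 * 2 + 15000 / 1000
print(K, Q, P, by_sum, closed, by_sum / minimum)
```

With O/S = 15 and RTT equal to two segment transmission times, the server stalls after the first two windows only, and both expressions give the same latency; the ratio to the minimum latency stays below the bound 1 + P/((O/R)/RTT + 2).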

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

Page 43: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

44

BB

Refinement inferring lossRefinement inferring loss

after 3 dup ACKsafter 3 dup ACKs cwndcwnd is cut in halfhalf window then grows

linearlylinearly butbut after timeout after timeout

eventevent cwndcwnd is is set to 1 MSS1 MSS window then grows

exponentiallyexponentially Up to a thresholdUp to a threshold

then grows linearlylinearly

3 dup ACKs 3 dup ACKs indicates network capable of delivering some segments

timeouttimeout indicates a ldquomore alarmingrdquo congestion scenario

Philosophy

Transport Layer ndash TCP

45

BB

Refinement

Q when should the exponential increase switch to linear

A when cwndcwnd gets to 1212 of its value before timeout

Implementation variable ssthresh (slow-start threshold)ssthresh (slow-start threshold) on loss eventon loss event ssthreshssthresh is set to 1212 of cwndcwnd just

before loss event

Transport Layer ndash TCP

46

BB

TCP Sender Congestion ControlSTATE EVENT TCP SENDER Congestion-

Control ActionCommentary

SLOW START (SS)

ACK receipt for previously unACKed data

CongWin = CongWin + MSSIf(CongWin gt Threshold) set state to ldquoCongestion Avoidancerdquo

Resulting in a doubling of CongWin every RTT

Congestion Avoidance (CA)

ACK receipt for previously unACKed data

CongWin = CongWin + MSS (MSSCongWin)

Additive increase resulting in increasing of CongWin by 1 MSS every RTT

SS or CA Loss event detected by triple duplicate ACK

Threshold = CongWin 2CongWin = ThresholdSet state to ldquoCongestion Avoidancerdquo

Fast recovery implementing multiplicative decrease CongWin will not drop below 1 MSS

SS or CA Timeout Threshold = CongWin 2CongWin = 1 MSSSet state to ldquoSlow Startrdquo

Enter Slow Start

SS or CA Duplicate ACK Increment duplicate ACK count for segment being ACKed

CongWin and Threshold not changed

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

Loopholes in TCP

- In practice, client/server applications with a smaller RTT get the available bandwidth more quickly as it becomes free. Therefore they have higher throughput.
- Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.

[Figure: multiple end systems sharing a link with transmission rate R bps. Three end systems use 1 TCP connection each and one end system uses 3 TCP connections.]
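The effect of parallel connections can be illustrated with a small calculation. This is a sketch that assumes idealised per-connection fairness (every TCP connection gets an equal share of the link); the function name `bandwidth_share` and the link rate of 1 Mbps are hypothetical.

```python
def bandwidth_share(own_connections, other_connections, link_rate_bps):
    """Rate an application gets if every TCP connection receives an equal
    share of the bottleneck link (idealised per-connection fairness)."""
    total = own_connections + other_connections
    return link_rate_bps * own_connections / total


if __name__ == "__main__":
    R = 1_000_000  # assumed link rate in bps
    # Three end systems with 1 connection each, one end system with 3 connections.
    print(bandwidth_share(1, 5, R))   # an end system with a single connection
    print(bandwidth_share(3, 3, R))   # the end system with three connections
```

With six connections on the link, the application that opens three of them takes half of R, while each single-connection application is left with one sixth.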

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:

- TCP connection establishment time
- data transfer delay
- actual data transmission time

Two cases to consider:

- WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. There is no data transfer delay.
- WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is a data transfer delay.

TCP Latency Modeling

Assumptions:

- The network is uncongested, with one link of rate R between the end systems.
- CongWin (fixed) determines the amount of data that can be sent.
- No packet loss, no packet corruption, no retransmissions required.
- Header overheads are negligible.
- The file to send is an integer number of segments of size MSS.
- Connection establishment: request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
- The initial Threshold of the TCP congestion mechanism is very big.

Notation:

- O: size of the object in bits
- S: number of bits of MSS (max segment size)
- R: the link's transmission rate in bps

[Figure: CLIENT requesting a FILE from the SERVER.]

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Case 1 latency = 2RTT + O/R

K = number of windows of data that cover the object

K = O/(WS), rounded up to the nearest integer

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4.

TCP latency modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent.

Case 2 latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = O/(WS) = number of windows of data that cover the object

The term S/R + RTT - WS/R is the length of one stalled period. If there are K windows, the sender will be stalled (K-1) times.
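The two static-window cases can be combined into one small function. This is a minimal Python sketch of the formulas above; the function name `static_window_latency` and the sample link rate and RTT values are assumptions chosen for illustration (only O = 256 bits, S = 32 bits, W = 4 come from the slides).

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """Latency (seconds) to fetch an object with a fixed window of W segments.

    O: object size in bits, S: segment size in bits,
    R: link rate in bps, RTT: round-trip time in seconds.
    """
    K = math.ceil(O / (W * S))           # windows needed to cover the object
    base = 2 * RTT + O / R               # 2 RTT plus the transmission time of the object
    if W * S / R >= RTT + S / R:
        return base                      # Case 1: ACKs return in time, no stalls
    stall = S / R + RTT - W * S / R      # idle time after each window
    return base + (K - 1) * stall        # Case 2: K-1 stalled periods


if __name__ == "__main__":
    # O, S, W from the slide example; R = 32 bps and the RTTs are assumed.
    print(static_window_latency(O=256, S=32, R=32, RTT=2.0, W=4))  # Case 1
    print(static_window_latency(O=256, S=32, R=32, RTT=5.0, W=4))  # Case 2
```

At the boundary WS/R = RTT + S/R the stall term is zero, so it does not matter which branch handles it.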

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments sent in 4 windows, with the stalled periods marked.]

Case analysis: DYNAMIC CONGESTION WINDOW

- Let K be the number of windows that cover the object.
- We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \lceil \log_2(O/S + 1) \rceil
\end{aligned}
$$

Case analysis: DYNAMIC CONGESTION WINDOW

- Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

- Transmission time of the k-th window:

$$2^{k-1}\,\frac{S}{R}$$

- Stall time after the k-th window:

$$\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+}$$

- Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^{+}$$

where $[x]^{+} = \max(x, 0)$.
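The summation form of the latency can be evaluated directly. The following is a small Python sketch, assuming the window doubles each round (1, 2, 4, ... segments) as in the analysis; the function name `slow_start_latency` and the sample values S = 32 bits, R = 32 bps and RTT = 2 s are illustrative assumptions.

```python
def slow_start_latency(O, S, R, RTT):
    """Latency (seconds) when the k-th window holds 2^(k-1) segments.

    Returns the latency, the number of windows K and the list of stall
    times after windows 1 .. K-1.
    """
    segments = O / S
    # K: smallest k such that 2^0 + ... + 2^(k-1) = 2^k - 1 covers the object.
    K = 1
    while 2 ** K - 1 < segments:
        K += 1
    # Stall after window k (k = 1 .. K-1), clamped at zero.
    stalls = [max(S / R + RTT - 2 ** (k - 1) * S / R, 0.0) for k in range(1, K)]
    latency = 2 * RTT + O / R + sum(stalls)
    return latency, K, stalls


if __name__ == "__main__":
    # An object of O/S = 15 segments, so 4 windows are needed.
    latency, K, stalls = slow_start_latency(O=15 * 32, S=32, R=32, RTT=2.0)
    print(K, stalls, latency)
```

Once the window is large enough that its transmission time exceeds S/R + RTT, the stall term is clamped to zero and the server sends continuously.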

Case analysis: DYNAMIC CONGESTION WINDOW

- Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

- The actual number of times that the server stalls is

$$P = \min\{Q, K-1\}$$

- Closed-form expression for the latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^{P} - 1)\,\frac{S}{R}$$

- Latency relative to the minimum latency (2RTT + O/R, the latency with no stalls):

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{\left[(O/R)/RTT\right] + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
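The closed-form expression and the bound on the latency ratio can be evaluated together. This is a hedged Python sketch of the formulas for K, Q and P; the function name `closed_form_latency` and the sample values (O/S = 15, S/R = 1 s, RTT = 2 s) are assumptions for illustration. It uses floating-point logarithms, so inputs that sit exactly on a rounding boundary may need integer arithmetic instead.

```python
import math


def closed_form_latency(O, S, R, RTT):
    """Closed-form slow-start latency, its ratio to the minimum latency,
    and the upper bound on that ratio."""
    K = math.ceil(math.log2(O / S + 1))               # windows covering the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1  # stalls for an infinite object
    P = min(Q, K - 1)                                 # stalls that actually happen
    latency = 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
    minimum = 2 * RTT + O / R                         # latency with no stalls
    bound = 1 + P / ((O / R) / RTT + 2)
    return latency, latency / minimum, bound


if __name__ == "__main__":
    latency, ratio, bound = closed_form_latency(O=15 * 32, S=32, R=32, RTT=2.0)
    print(f"latency={latency} s, ratio={ratio:.3f}, bound={bound:.3f}")
```

When RTT is small compared with O/R, the denominator of the bound grows and the ratio approaches 1, which is the statement that slow start adds little latency in that regime.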

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features


Refinement

Q: When should the exponential increase switch to linear?

A: When cwnd gets to 1/2 of its value before timeout.

Implementation:

- Variable ssthresh (slow-start threshold).
- On a loss event, ssthresh is set to 1/2 of cwnd just before the loss event.

TCP Sender Congestion Control

| STATE | EVENT | TCP SENDER Congestion-Control Action | Commentary |
|---|---|---|---|
| Slow Start (SS) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to "Congestion Avoidance" | Resulting in a doubling of CongWin every RTT |
| Congestion Avoidance (CA) | ACK receipt for previously unACKed data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT |
| SS or CA | Loss event detected by triple duplicate ACK | Threshold = CongWin/2; CongWin = Threshold; set state to "Congestion Avoidance" | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS |
| SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to "Slow Start" | Enter Slow Start |
| SS or CA | Duplicate ACK | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed |
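The rows of the table translate almost directly into event handlers. The following is a minimal Python sketch, not a real TCP implementation: the class name `TcpSender`, the MSS of 1460 bytes and the initial Threshold of 64 KB are assumptions for illustration, and no segments are actually sent or retransmitted.

```python
from dataclasses import dataclass

MSS = 1460  # bytes, assumed segment size


@dataclass
class TcpSender:
    cong_win: float = MSS          # congestion window in bytes
    threshold: float = 64 * 1024   # assumed initial Threshold
    state: str = "SS"              # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                          # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * (MSS / self.cong_win)  # about 1 MSS per RTT

    def on_duplicate_ack(self):
        """Count duplicate ACKs; the third one signals a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)      # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        """Timeout: fall back to slow start with a window of 1 MSS."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0


if __name__ == "__main__":
    sender = TcpSender()
    for _ in range(3):
        sender.on_new_ack()
    print(sender.state, sender.cong_win)    # still in slow start
    for _ in range(3):
        sender.on_duplicate_ack()
    print(sender.state, sender.cong_win, sender.threshold)
    sender.on_timeout()
    print(sender.state, sender.cong_win, sender.threshold)
```

Keeping the window in bytes, as the table does, makes the congestion-avoidance increment MSS * (MSS/CongWin) add up to roughly one MSS over a full window of ACKs.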

Summary: TCP Congestion Control

[Figure: finite-state machine of the TCP sender with three states: slow start, congestion avoidance and fast recovery. The transitions from the diagram are listed below.]

Initialisation: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start.

Slow start:

- new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: go to congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment

Congestion avoidance:

- new ACK: cwnd = cwnd + MSS * (MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3 MSS, retransmit missing segment, go to fast recovery
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start

Fast recovery:

- duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed
- new ACK: cwnd = ssthresh, dupACKcount = 0, go to congestion avoidance
- timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, go to slow start

Congestion control

TCP's Congestion Control Service

Problem: gridlock sets in when there is packet loss due to router congestion.

- TCP forces the end systems to decrease the rate at which packets are sent during periods of congestion.
- When a packet from the sending system is lost due to congestion, the sender is alerted when it stops receiving ACKs for the packets it sent.

[Figure: CLIENT and SERVER.]

Macroscopic Description of TCP throughput

What is the average throughput of TCP as a function of window size and RTT? Ignore slow start (typically very short phases).

- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after loss, the window drops to W/2 and throughput to W/(2 RTT).
- Throughput increases linearly (by MSS/RTT every RTT).
- Average throughput: 0.75 W/RTT.

(Based on an idealised model for the steady-state dynamics of TCP.)
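The 0.75 W/RTT figure is the average of a sawtooth that ramps linearly from W/2 to W. A minimal Python sketch of that averaging follows; the function name `average_sawtooth_throughput` and the sample values are assumptions, and the window is counted in whole segments for simplicity.

```python
def average_sawtooth_throughput(W, rtt):
    """Average throughput (segments/second) of an idealised AIMD sawtooth.

    The window grows by one segment per RTT from W/2 up to W, then halves.
    W is assumed to be an even number of segments.
    """
    low = W // 2
    windows = list(range(low, W + 1))      # window size in each RTT of one cycle
    rates = [w / rtt for w in windows]     # throughput during each RTT
    return sum(rates) / len(rates)


if __name__ == "__main__":
    W, rtt = 10, 0.5
    print(average_sawtooth_throughput(W, rtt), 0.75 * W / rtt)
```

Because the ramp is linear, the mean of the window sizes is the midpoint of W/2 and W, which is 0.75 W.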

TCP Futures: TCP over "long, fat pipes"

- Example: GRID computing application with 1500-byte segments, 100 ms RTT and a desired throughput of 10 Gbps.
- This requires a window size of W = 83,333 in-flight segments.
- Throughput in terms of loss rate:

$$\text{Throughput} = \frac{1.22 \cdot MSS}{RTT\,\sqrt{L}}$$

- This gives L = 2 · 10^-10, a very small loss rate (1 loss event every 5 billion segments).
- New versions of TCP are needed for high-speed environments.
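The numbers in this example can be reproduced from the throughput formula. The following Python sketch solves for the window size and the loss rate; the function names `required_window_segments` and `required_loss_rate` are hypothetical, and the inputs are the values given on the slide.

```python
def required_window_segments(throughput_bps, rtt_s, mss_bytes):
    """In-flight segments needed to sustain the target throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def required_loss_rate(throughput_bps, rtt_s, mss_bytes):
    """Loss rate L that solves throughput = 1.22 * MSS / (RTT * sqrt(L))."""
    mss_bits = mss_bytes * 8
    return (1.22 * mss_bits / (rtt_s * throughput_bps)) ** 2


if __name__ == "__main__":
    target = 10e9   # 10 Gbps
    rtt = 0.1       # 100 ms
    mss = 1500      # bytes
    print(int(required_window_segments(target, rtt, mss)))   # in-flight segments
    print(f"{required_loss_rate(target, rtt, mss):.2e}")      # tolerable loss rate
```

The loss rate comes out at roughly 2 · 10^-10, which is why standard TCP struggles to fill such links.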

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link's bandwidth.

[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]



  • TCP Futures TCP over ldquolong fat pipesrdquo
Page 46: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

47

BB

Summary TCP Congestion Control

timeoutssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd gt ssthresh

congestionavoidance

cwnd = cwnd + MSS (MSScwnd)dupACKcount = 0transmit new segment(s) as allowed

new ACK

dupACKcount++

duplicate ACK

fastrecovery

cwnd = cwnd + MSStransmit new segment(s) as allowed

duplicate ACK

ssthresh= cwnd2cwnd = ssthresh + 3

retransmit missing segment

dupACKcount == 3

timeoutssthresh = cwnd2cwnd = 1 dupACKcount = 0retransmit missing segment

ssthresh= cwnd2cwnd = ssthresh + 3 MSSretransmit missing segment

dupACKcount == 3cwnd = ssthreshdupACKcount = 0

New ACK

slow start

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s) as allowed

new ACKdupACKcount++

duplicate ACK

cwnd = 1 MSSssthresh = 64 KBdupACKcount = 0

NewACK

NewACK

NewACK

Transport Layer ndash TCP

48

BB

Congestion control

TCPrsquos Congestion Control Service

CLIENTSERVER

Problem Gridlock sets-in when there is packet loss due to router congestion

forces the End Systems to decrease the rate at which packets are sentdecrease the rate at which packets are sent during periods of congestion

The sending systemrsquos packet is lost due to congestion and is alerted when it stops receiving ACKs of packets sent

Transport Layer ndash TCP

49

BBTransport Layer 3-49

Macroscopic Description of TCP throughput

whatrsquos the average throughout of TCP as a function of window size and RTT ignore slow start (typically very short phases)

let W be the window size when loss occurs when window is W throughput is WRTT just after loss window drops to W2 throughput to

W2RTT Throughput increases linearly (by MSSRTT every

RTT) Average Throughput 75 WRTT

(Based on Idealised model for the steady-state dynamics of TCP)

Transport Layer 3-50

TCP FuturesTCP Futures TCP over ldquolong fat pipesrdquo Example GRID computing application 1500-byte segments 100ms RTT desired

throughput of 10 Gbps requires window size W = 83333 in-flight

segments Throughput in terms of loss rate

L = 210-10 ndash a very small loss rate (1 loss event every 5 billion segments)

new versions of TCP is needed for high-speed environments

LRTT

MSS221

Transport Layer ndash TCP

51

BB

TCP FairnessTCP Fairness

Fairness goal if NN TCP sessions share same bottleneck link each should get an average transmission rate of RNRN an equal share of the linkrsquos bandwidth

TCP connection 1

bottleneckrouter

capacity RR

TCP connection 2

Go to Summary of TCP Congestion Control

Transport Layer ndash TCP

52

BB

Analysis of 2 connections sharing a linkAnalysis of 2 connections sharing a link

Link with transmission rate of RR

Each connection have the same MSSMSS RTTRTT

No other TCP connections or UDP datagrams traverse the shared link

Ignore slow start phase of TCP

Operating in congestion-avoidance modecongestion-avoidance mode (linear increase phase)

Goal adjust sending rate of the two connections to allow for equal equal bandwidth sharingbandwidth sharing

AssumptionsAssumptions

Transport Layer ndash TCP

53

BB

Why is TCP fairWhy is TCP fair

Two competing sessions Additive increase gives slope of 1 as throughout increases multiplicative decrease decreases throughput

proportionally

RR

equal bandwidth shareequal bandwidth share

Connection 1 throughput

Connect

ion 2

thro

ughput

congestion avoidance additive increaseloss decrease window by factor of 2

RR

A point on the graph depicts the amount of link bandwidth

jointly jointly consumedconsumed by

the connections

Full bandwidth utilisation line

We can viewWe can view a simulation a simulation

on thison this

View SimulationView Simulation

Transport Layer ndash TCP

54

BB

The End

The next succeeding slides are just for additional reading

TCP Latency Modeling

Loopholes in TCP

• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free, and therefore achieve higher throughput.
• Multiple parallel TCP connections allow one application to get a bigger share of the bandwidth.

[Figure: multiple end systems sharing a link with transmission rate R bps. Three end systems are labelled "1 TCP connection" and one is labelled "3 TCP connections" (multithreading implementation).]
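To see how parallel connections shift the split, here is a small sketch under the idealised assumption that every TCP connection gets an equal share of the bottleneck. The helper name and the connection counts (three applications with one connection each, one with three) are illustrative.

```python
from fractions import Fraction

def per_app_share(connections_per_app):
    """Fraction of the link each application receives when every TCP
    connection gets an equal share of the bottleneck (idealised)."""
    total = sum(connections_per_app)
    return [Fraction(c, total) for c in connections_per_app]

shares = per_app_share([1, 1, 1, 3])
print([str(s) for s in shares])  # ['1/6', '1/6', '1/6', '1/2']
```

Fairness holds per connection, not per application, so the application that opens three connections ends up with half of R.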

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:

• TCP connection establishment time
• data transfer delay
• actual data transmission time

Two cases to consider (W is the window size in segments):

• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window's worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window's worth of data is sent. There is data transfer delay.

TCP Latency Modeling

Assumptions:

• The network is uncongested, with one link of rate R between the end systems (client and server).
• CongWin (fixed) determines the amount of data that can be sent.
• No packet loss, no packet corruption, no retransmissions required.
• Header overheads are negligible.
• The file to send consists of an integer number of segments of size MSS.
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times.
• The initial threshold of the TCP congestion mechanism is very big.

Notation:

• R: the link's transmission rate, in bps
• O: size of the object, in bits
• S: number of bits in an MSS (maximum segment size)

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window's worth of data is sent.

Latency = 2RTT + O/R

K = number of windows of data that cover the object:

$$K = \left\lceil \frac{O}{WS} \right\rceil \quad \text{(rounded up to the nearest integer)}$$

Example: assume W = 4 segments, O = 256 bits and S = 32 bits. Then K = ⌈256 / (4 · 32)⌉ = 2.

TCP latency Modeling

Case analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window's worth of data is sent (stalled period).

Latency = 2RTT + O/R + (K - 1)[S/R + RTT - WS/R]

where K = ⌈O/(WS)⌉ is the number of windows of data that cover the object. If there are K windows, the sender will be stalled (K - 1) times.
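The two static-window cases can be folded into one small function. This is a sketch under the slides' assumptions (no loss, negligible headers); the function name and the values chosen for R and RTT are illustrative, while O = 256 bits, S = 32 bits and W = 4 follow the slide example.

```python
import math

def static_window_latency(O, S, R, RTT, W):
    """Latency in seconds to fetch an object of O bits using a fixed
    window of W segments of S bits each over a link of R bit/s."""
    K = math.ceil(O / (W * S))        # windows of data that cover the object
    base = 2 * RTT + O / R            # connection set-up and request, plus transmission
    stall = S / R + RTT - W * S / R   # idle time after each window, if positive
    if stall <= 0:                    # case 1: ACKs return before the window is exhausted
        return base
    return base + (K - 1) * stall     # case 2: the sender stalls K - 1 times

print(static_window_latency(O=256, S=32, R=64, RTT=1.0, W=4))  # 6.0  (case 1)
print(static_window_latency(O=256, S=32, R=64, RTT=3.0, W=4))  # 11.5 (case 2)
```

With R = 64 bps a full window takes 2 s to send, so an RTT of 1 s never stalls the sender, whereas an RTT of 3 s adds one stall of 1.5 s between the two windows.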

Case analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments, sent in 4 windows, with the stalled periods marked.]

Case analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2(O/S + 1) \right\rceil
\end{aligned}
$$
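The closed form K = ⌈log2(O/S + 1)⌉ can be compared against a direct count of doubling windows. A minimal sketch, with hypothetical function names, using the O/S = 15 example from the slides:

```python
import math

def windows_by_counting(segments):
    """Count windows of 1, 2, 4, ... segments until an object of
    `segments` segments is fully covered."""
    k, sent = 0, 0
    while sent < segments:
        sent += 2 ** k   # the (k+1)-th window carries 2^k segments
        k += 1
    return k

def windows_closed_form(segments):
    """K = ceil(log2(O/S + 1)), where `segments` is O/S."""
    return math.ceil(math.log2(segments + 1))

print(windows_by_counting(15), windows_closed_form(15))  # 4 4
```

For 15 segments the windows carry 1 + 2 + 4 + 8 segments, which gives the 4 windows of the example.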

Case analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window: $S/R + RTT$
• Transmission time of the k-th window: $2^{k-1}\,S/R$
• Stall time after the k-th window: $\left[ S/R + RTT - 2^{k-1}\,S/R \right]^+$, where $[x]^+ = \max(x, 0)$
• Latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1} \left[ \frac{S}{R} + RTT - 2^{k-1}\frac{S}{R} \right]^+$$

Case analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\!\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q,\; K - 1\}$$

• Closed-form expression for the latency:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[ RTT + \frac{S}{R} \right] - (2^P - 1)\frac{S}{R}$$
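The closed-form latency, 2RTT + O/R + P[RTT + S/R] - (2^P - 1)S/R with P = min{Q, K - 1}, can be cross-checked against a window-by-window sum of stall times. This is a sketch under the slides' assumptions (no loss, window doubling each round, very large threshold); the function names and the numeric inputs are illustrative.

```python
import math

def slow_start_latency(O, S, R, RTT):
    """Closed-form latency in seconds for a dynamic (doubling) window."""
    K = math.ceil(math.log2(O / S + 1))                # windows that cover the object
    Q = math.floor(math.log2(1 + RTT / (S / R))) + 1   # stalls for an infinite object
    P = min(Q, K - 1)                                  # actual number of stalls
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R

def slow_start_latency_by_sum(O, S, R, RTT):
    """Same quantity, adding the positive stall time after each window."""
    K = math.ceil(math.log2(O / S + 1))
    stalls = (S / R + RTT - 2 ** (k - 1) * S / R for k in range(1, K))
    return 2 * RTT + O / R + sum(max(s, 0.0) for s in stalls)

# 15 segments of 1000 bits over an 8000 bit/s link with RTT = 0.25 s
print(slow_start_latency(15000, 1000, 8000, 0.25))         # 2.75
print(slow_start_latency_by_sum(15000, 1000, 8000, 0.25))  # 2.75
```

Here K = 4 and Q = 2, so the server stalls twice (0.25 s and 0.125 s) on top of the 2.375 s minimum.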

Case analysis: DYNAMIC CONGESTION WINDOW

$$\frac{\text{Latency}}{\text{Minimum latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if RTT << O/R.
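One way to see where the bound on Latency / Minimum latency comes from is sketched below. It assumes the minimum latency is the stall-free value 2RTT + O/R and starts from the closed-form latency with P stalls; this is a reading consistent with the formulas in these slides, not a derivation given in them.

```latex
\begin{aligned}
\text{Latency} &= 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R} \\
               &\le 2\,RTT + \frac{O}{R} + P \cdot RTT
               \qquad \text{since } 2^P - 1 \ge P \\[4pt]
\frac{\text{Latency}}{2\,RTT + O/R} &\le 1 + \frac{P \cdot RTT}{2\,RTT + O/R}
               = 1 + \frac{P}{(O/R)/RTT + 2}
\end{aligned}
```

When RTT is much smaller than O/R the denominator is large, so the ratio stays close to 1.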

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

Page 50: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer – TCP

51

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get an average transmission rate of R/N, an equal share of the link’s bandwidth.

[Figure: TCP connection 1 and TCP connection 2 pass through a bottleneck router of capacity R]

Transport Layer – TCP

52

Analysis of 2 connections sharing a link

Assumptions

• Link with transmission rate of R
• Each connection has the same MSS and RTT
• No other TCP connections or UDP datagrams traverse the shared link
• Ignore the slow-start phase of TCP
• Operating in congestion-avoidance mode (linear increase phase)

Goal: adjust the sending rates of the two connections to allow for equal bandwidth sharing

Transport Layer – TCP

53

Why is TCP fair?

Two competing sessions:

• Additive increase gives a slope of 1 as throughput increases
• Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, each axis running up to R, with the equal bandwidth share line and the full bandwidth utilisation line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2).]

A point on the graph depicts the amount of link bandwidth jointly consumed by the connections.
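The convergence argument can be illustrated with a toy simulation. This is only a sketch under the idealised assumptions of the analysis (same RTT, synchronised losses, both flows react in the same round); the function name, the starting rates and the capacity of 100 units are hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds, increase=1.0):
    """Simulate two AIMD flows sharing one link; returns the list of (x1, x2) per round."""
    history = [(x1, x2)]
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve their rate (multiplicative decrease),
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add the same amount (additive increase),
            # which leaves the gap between them unchanged.
            x1, x2 = x1 + increase, x2 + increase
        history.append((x1, x2))
    return history


# Hypothetical, deliberately unfair starting point.
trace = aimd_two_flows(x1=1.0, x2=70.0, capacity=100.0, rounds=500)
final_x1, final_x2 = trace[-1]
print(abs(final_x1 - final_x2))
```

Because the gap never grows and halves on every loss, the two rates drift toward the equal bandwidth share line while their sum keeps oscillating around the link capacity.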

Transport Layer – TCP

54

The End

The following slides are for additional reading only.

Transport Layer – TCP

55

TCP Latency Modeling

Loopholes in TCP

Multiple end systems sharing a link (R bps is the link’s transmission rate):

• In practice, client/server applications with a smaller RTT grab the available bandwidth more quickly as it becomes free. Therefore they achieve higher throughput.
• Multiple parallel TCP connections (a multithreading implementation) allow one application to get a bigger share of the bandwidth.

[Figure: three end systems with 1 TCP connection each and one end system with 3 TCP connections share the link]
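A small arithmetic sketch of the parallel-connection loophole, assuming the idealised case where every TCP connection on the link receives an equal share of R. The function name is hypothetical; the connection counts mirror the slide's figure.

```python
from fractions import Fraction


def app_share(app_connections, other_connections):
    """Fraction of R an application gets if each connection receives an equal share."""
    return Fraction(app_connections, app_connections + other_connections)


# Three hosts with 1 connection each, one host with 3 parallel connections.
print(app_share(3, 3))  # the multithreaded host
print(app_share(1, 5))  # each single-connection host
```

Under this assumption the host with three connections obtains half of the link, while each of the other hosts obtains one sixth.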

Transport Layer – TCP

56

TCP latency modeling

Q: How long does it take to receive an object from a Web server?

Latency is the time from when the client initiates a TCP connection until the client receives the requested object in its entirety. It is made up of:

• TCP connection establishment time
• Data transfer delay
• Actual data transmission time

Two cases to consider:

• WS/R > RTT + S/R: an ACK for the first segment in the window returns to the sender before a window’s worth of data is sent. No data transfer delay.
• WS/R < RTT + S/R: the sender has to wait for an ACK after a window’s worth of data is sent. There is a data transfer delay.

Transport Layer – TCP

57

TCP Latency Modeling

Assumptions

• The network is uncongested, with one link of rate R between the end systems (client and server)
• CongWin (fixed) determines the amount of data that can be sent
• No packet loss, no packet corruption, no retransmissions required
• Header overheads are negligible
• File to send = an integer number of segments of size MSS
• Connection-establishment request messages, ACKs and TCP connection-establishment segments have negligible transmission times
• The initial threshold of the TCP congestion mechanism is very big

Notation

• O: size of the object (file) in bits
• S: number of bits in an MSS (maximum segment size)
• R: the link’s transmission rate in bps

Transport Layer – TCP

58

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 1: WS/R > RTT + S/R

An ACK for the first segment in the window returns to the sender before a window’s worth of data is sent.

latency = 2RTT + O/R

K = number of windows of data that cover the object

K = O/(WS), rounded up to the nearest integer, where W is the window size in number of segments

Assume W = 4 segments, e.g. O = 256 bits, S = 32 bits, W = 4

Transport Layer – TCP

59

TCP latency Modeling

Case Analysis: STATIC CONGESTION WINDOW

Case 2: WS/R < RTT + S/R

The sender has to wait for an ACK after a window’s worth of data is sent.

latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

K = O/(WS) = number of windows of data that cover the object

STALLED PERIOD: if there are K windows, the sender will be stalled (K-1) times.
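The two static-window cases can be combined into one small function. This is a sketch under the stated model assumptions; the function name is hypothetical, and the link rate (32 bps) and RTT values are made-up numbers added to the slide's example of O = 256 bits, S = 32 bits, W = 4.

```python
import math


def static_window_latency(O, S, R, RTT, W):
    """Latency for a fixed window of W segments, in seconds."""
    K = math.ceil(O / (W * S))           # windows that cover the object
    window_time = W * S / R              # time to transmit one full window
    if window_time >= RTT + S / R:
        # Case 1: the first ACK returns before the window is exhausted, no stall.
        return 2 * RTT + O / R
    # Case 2: the sender stalls after each of the first K-1 windows.
    stall = S / R + RTT - window_time
    return 2 * RTT + O / R + (K - 1) * stall


# Slide example plus assumed R = 32 bps, so S/R = 1 s and WS/R = 4 s.
print(static_window_latency(O=256, S=32, R=32, RTT=1.0, W=4))  # case 1
print(static_window_latency(O=256, S=32, R=32, RTT=5.0, W=4))  # case 2
```

The boundary WS/R = RTT + S/R is treated as case 1 here, since the stall term is zero at that point.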

Transport Layer – TCP

60

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments sent in 4 windows, with the stalled periods marked]

Transport Layer – TCP

61

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\left\{k : 2^{0} + 2^{1} + \cdots + 2^{k-1} \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : 2^{k} - 1 \ge \frac{O}{S}\right\} \\
  &= \min\left\{k : k \ge \log_2\left(\frac{O}{S} + 1\right)\right\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$
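The equivalence between the min-definition of K and the logarithm form can be checked by brute force. The sketch below uses hypothetical function names and takes the number of segments O/S as an integer.

```python
import math


def windows_by_search(segments):
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) >= segments."""
    k, sent = 0, 0
    while sent < segments:
        sent += 2 ** k   # the next window carries twice as many segments
        k += 1
    return k


def windows_closed_form(segments):
    """K = ceil(log2(segments + 1))."""
    return math.ceil(math.log2(segments + 1))


print(windows_by_search(15), windows_closed_form(15))
```

For 15 segments both functions give 4, matching windows of 1, 2, 4 and 8 segments.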

Page 54: Transport Layer – TCP 1 BBBB TCP Flow Control, Congestion Control, Connection Management, etc. Part 2

Transport Layer ndash TCP

55

BB

TCP Latency ModelingTCP Latency Modeling

In practice clientserver applications with smaller RTT gets the In practice clientserver applications with smaller RTT gets the available bandwidth more quickly as it becomes free Therefore they available bandwidth more quickly as it becomes free Therefore they have higher throughputshave higher throughputs

Multiple parallel TCP connection allows one application to get a bigger Multiple parallel TCP connection allows one application to get a bigger share of the bandwidthshare of the bandwidth

Multiple End Systems sharing a link

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

Loop holes in TCPLoop holes in TCP

1 TCP connection

1 TCP connection

1 TCP connection

3 TCP connections

Multithreading implementationMultithreading implementation

Transport Layer ndash TCP

56

BB

TCP latency modelingTCP latency modeling

TCP connection establishment time data transfer delay Actual data transmission time

Two cases to considerTwo cases to consider WSR gt RTT + SRWSR gt RTT + SR

An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent

WSR lt RTT + SRWSR lt RTT + SR Sender has to wait for an ACK after a windowrsquos worth of data sent

QQ How long How long does it take to does it take to

receive an object receive an object from a Web from a Web

serverserver

No data transfer delayNo data transfer delay

Therersquos data transfer delayTherersquos data transfer delay

the time from when the client initiates a TCP connection until when the client the time from when the client initiates a TCP connection until when the client receives the requested object in its entiretyreceives the requested object in its entirety

Transport Layer ndash TCP

57

BB

TCP Latency ModelingTCP Latency Modeling

Network is uncongested with one link between end systems of rate RR

CongWinCongWin (fixed) determines the amount of data that can be sent

No packet loss no packet corruption no retransmissions required

Header overheads are negligible

File to sendFile to send = integer number of segments of size MSS

Connection establishment request messages ACKs TCP connection-establishment segments have negligible transmission timesnegligible transmission times

CLIENT SERVER

FILEFILE

Initial ThresholdThreshold of TCP congestion mechanism is very big

R R bps ndash bps ndash linkrsquos transmission ratelinkrsquos transmission rate

AssumptionsAssumptionsOO - - Size of objectSize of object in in bitsbits

SS ndash ndash number of bits ofnumber of bits of MSS MSS (max segment size)(max segment size)

Transport Layer ndash TCP

58

BB

TCP latency ModelingTCP latency Modeling

Case 1 latency = 2RTT + OR

K = Number of Windows of data that cover the object

K = OWS

Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW

Case 1 Case 1 WSR gt RTT + SRWSR gt RTT + SR

An ACK for the first segment in window returns to the Sender before a windowrsquos worth of data is sent

Number of segmentsRounded up to the nearest integer

Assume W=4 segmentsAssume W=4 segments

eg O=256bits S=32bits W=4

Transport Layer ndash TCP

59

BB

TCP latency ModelingTCP latency Modeling

Case 2 latency = 2RTT + OR + (K-1)[SR + RTT - WSR]

Number of Windows of data that cover the objectK= OWS

Case Analysis Case Analysis STATICSTATIC CONGESTION WINDOW CONGESTION WINDOW

Case 2 Case 2 WSR lt RTT + SRWSR lt RTT + SR

Sender has to wait for an ACK after a windowrsquos worth of data sent

STALLED STALLED PERIODPERIOD

If there are k windows sender willbe stalled (k-1) times

Case Analysis: DYNAMIC CONGESTION WINDOW

[Figure: timing diagram for an object of O/S = 15 segments, covered by 4 windows, with the STALLED PERIOD marked]

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let K be the number of windows that cover the object.
• We can express K in terms of the number of segments in the object, O/S, as follows:

$$
\begin{aligned}
K &= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \ge O/S\} \\
  &= \min\{k : 2^k - 1 \ge O/S\} \\
  &= \min\{k : k \ge \log_2(O/S + 1)\} \\
  &= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil
\end{aligned}
$$
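A short sketch can confirm that the closed form $K = \lceil \log_2(O/S + 1) \rceil$ agrees with the definition as the smallest k whose windows of 1, 2, 4, ... segments cover the object. The function names are hypothetical; the closed form is computed with `int.bit_length()` instead of a floating-point logarithm, which gives the same value for any positive integer number of segments.

```python
def windows_by_search(num_segments):
    """Smallest k with 2^0 + 2^1 + ... + 2^(k-1) >= num_segments."""
    k, covered = 0, 0
    while covered < num_segments:
        covered += 2 ** k      # the (k+1)-th window carries 2^k segments
        k += 1
    return k


def windows_closed_form(num_segments):
    """ceil(log2(num_segments + 1)) for a positive integer, without floating point."""
    return num_segments.bit_length()


print(windows_by_search(15))     # 4 (windows of 1, 2, 4, and 8 segments)
print(windows_closed_form(15))   # 4
```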

Case Analysis: DYNAMIC CONGESTION WINDOW

• Time from when the server begins to transmit the k-th window until the server receives an ACK for the first segment in the window:

$$\frac{S}{R} + RTT$$

• Transmission time of the k-th window:

$$2^{k-1}\,\frac{S}{R}$$

• Stall time after the k-th window, where $[x]^+ = \max(x, 0)$:

$$\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+$$

• Latency:

$$\text{latency} = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{K-1}\left[\frac{S}{R} + RTT - 2^{k-1}\,\frac{S}{R}\right]^+$$
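The summation can be evaluated directly, as in the sketch below. It assumes the idealized slow start of this model: the window starts at one segment and doubles every round, with no loss and a very large threshold. The function name and the example parameters are assumptions made for illustration.

```python
def slow_start_latency(O, S, R, rtt):
    """2*RTT + O/R plus the stall after each of the first K-1 windows.

    O and S are integers (bits), R is in bps, rtt is in seconds.
    """
    segments = (O + S - 1) // S          # number of segments, O/S rounded up
    K = segments.bit_length()            # ceil(log2(segments + 1)) windows
    latency = 2 * rtt + O / R
    for k in range(1, K):                # k = 1 .. K-1
        stall = S / R + rtt - 2 ** (k - 1) * S / R
        latency += max(stall, 0.0)       # [x]^+ : no stall once windows are long enough
    return latency


# Assumed example: 15 segments of 1000 bits over a 1000 bps link, RTT = 2 s.
print(slow_start_latency(15000, 1000, 1000.0, 2.0))   # 22.0
```

Here the stalls are 2 s after the first window and 1 s after the second; the third window already takes longer to send than the ACK takes to return, so it adds nothing.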

Case Analysis: DYNAMIC CONGESTION WINDOW

• Let Q be the number of times the server would stall if the object contained an infinite number of segments:

$$Q = \left\lfloor \log_2\left(1 + \frac{RTT}{S/R}\right) \right\rfloor + 1$$

• The actual number of times that the server stalls is

$$P = \min\{Q,\ K - 1\}$$

Case Analysis: DYNAMIC CONGESTION WINDOW

• Closed-form expression for the latency, with Q and P = min{Q, K − 1} as defined above:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\,\frac{S}{R}$$
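The closed form can be sketched as follows, using $Q = \lfloor \log_2(1 + RTT/(S/R)) \rfloor + 1$ and $P = \min\{Q, K-1\}$. The function name and example parameters are assumptions; the floating-point logarithm is adequate for illustration, since a stall that falls exactly on the boundary has zero length.

```python
import math


def closed_form_latency(O, S, R, rtt):
    """2*RTT + O/R + P*(RTT + S/R) - (2^P - 1)*S/R, with P = min(Q, K-1).

    O and S are integers (bits), R is in bps, rtt is in seconds.
    """
    segments = (O + S - 1) // S                          # O/S rounded up
    K = segments.bit_length()                            # windows covering the object
    Q = math.floor(math.log2(1 + rtt / (S / R))) + 1     # stalls for an infinite object
    P = min(Q, K - 1)                                    # stalls that actually occur
    return 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R


# Assumed example: 15 segments of 1000 bits over a 1000 bps link, RTT = 2 s.
print(closed_form_latency(15000, 1000, 1000.0, 2.0))   # 22.0
```

For this example Q = 2 and K − 1 = 3, so P = 2 and the stalls contribute 2·(2 + 1) − 3·1 = 3 s on top of the 19 s minimum.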

Case Analysis: DYNAMIC CONGESTION WINDOW

$$\frac{\text{Latency}}{\text{Minimum Latency}} \le 1 + \frac{P}{(O/R)/RTT + 2}$$

Slow start will not significantly increase latency if $RTT \ll O/R$.
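To get a feel for the bound, the sketch below evaluates $1 + P / ((O/R)/RTT + 2)$ for two assumed scenarios. The function name and every numeric value, including the stall counts P, are illustrative assumptions and not measurements.

```python
def slow_start_overhead_bound(O, R, rtt, P):
    """Upper bound on latency / minimum latency: 1 + P / ((O/R)/RTT + 2)."""
    return 1 + P / ((O / R) / rtt + 2)


# Assumed: O/R = 15 s of transmission time in both scenarios.
# RTT comparable to O/R (2 s, P = 2 stalls): slow start adds a noticeable fraction.
print(round(slow_start_overhead_bound(15000, 1000.0, 2.0, 2), 3))    # 1.211
# RTT << O/R (10 ms, P = 1 stall): the bound is within 0.1% of 1.
print(slow_start_overhead_bound(15000, 1000.0, 0.01, 1) < 1.001)     # True
```

The bound follows from the closed-form latency because $P \cdot S/R \le (2^P - 1) \cdot S/R$, so the extra delay is at most $P \cdot RTT$ relative to the no-stall latency $2\,RTT + O/R$.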

http://www1.cse.wustl.edu/~jain/cis788-97/ftp/tcp_over_atm/index.htm#atm-features

  • TCP Futures: TCP over “long fat pipes”