transport layer cs 3516 – computer networks. chapter 3: transport layer goals: understand...

106
Transport Layer CS 3516 – Computer Networks

Upload: zachery-boxell

Post on 01-Apr-2015

245 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Transport Layer

CS 3516 ndash Computer Networks

Chapter 3 Transport LayerGoalsbull Understand

principles behind transport layer servicesndash Multiplexing

demultiplexingndash Reliable data

transferndash Flow controlndash Congestion control

bull Learn about transport layer protocols in the Internetndash UDP connectionless

transportndash TCP connection-oriented

transportndash TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Transport Services and Protocols

bull Provide logical communication between app processes running on different hosts

bull Transport protocols run in end systems ndash send side breaks app

messages into segments passes to network layer

ndash receive side reassembles segments into messages passes to app layer

bull More than one transport protocol available to appsndash Internet TCP and UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Transport vs Network layer

bull network layer logical communication between hosts

bull transport layer logical communication between processes ndash relies on enhances network layer services

Household analogy12 kids sending letters to

12 kidsbull processes = kidsbull app messages = letters

in envelopesbull hosts = housesbull transport protocol = Ann

and Bill (collect mail from siblings)

bull network-layer protocol = postal service

Internet Transport-layer Protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of

ldquobest-effortrdquo IP

bull services not available ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 2: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 Transport LayerGoalsbull Understand

principles behind transport layer servicesndash Multiplexing

demultiplexingndash Reliable data

transferndash Flow controlndash Congestion control

bull Learn about transport layer protocols in the Internetndash UDP connectionless

transportndash TCP connection-oriented

transportndash TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Transport Services and Protocols

bull Provide logical communication between app processes running on different hosts

bull Transport protocols run in end systems ndash send side breaks app

messages into segments passes to network layer

ndash receive side reassembles segments into messages passes to app layer

bull More than one transport protocol available to appsndash Internet TCP and UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Transport vs Network layer

bull network layer logical communication between hosts

bull transport layer logical communication between processes ndash relies on enhances network layer services

Household analogy12 kids sending letters to

12 kidsbull processes = kidsbull app messages = letters

in envelopesbull hosts = housesbull transport protocol = Ann

and Bill (collect mail from siblings)

bull network-layer protocol = postal service

Internet Transport-layer Protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of

ldquobest-effortrdquo IP

bull services not available ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 3: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Transport Services and Protocols

bull Provide logical communication between app processes running on different hosts

bull Transport protocols run in end systems ndash send side breaks app

messages into segments passes to network layer

ndash receive side reassembles segments into messages passes to app layer

bull More than one transport protocol available to appsndash Internet TCP and UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Transport vs Network layer

bull network layer logical communication between hosts

bull transport layer logical communication between processes ndash relies on enhances network layer services

Household analogy12 kids sending letters to

12 kidsbull processes = kidsbull app messages = letters

in envelopesbull hosts = housesbull transport protocol = Ann

and Bill (collect mail from siblings)

bull network-layer protocol = postal service

Internet Transport-layer Protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of

ldquobest-effortrdquo IP

bull services not available ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 4: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Transport Services and Protocols

bull Provide logical communication between app processes running on different hosts

bull Transport protocols run in end systems ndash send side breaks app

messages into segments passes to network layer

ndash receive side reassembles segments into messages passes to app layer

bull More than one transport protocol available to appsndash Internet TCP and UDP

applicationtransportnetworkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Transport vs Network layer

bull network layer logical communication between hosts

bull transport layer logical communication between processes ndash relies on enhances network layer services

Household analogy12 kids sending letters to

12 kidsbull processes = kidsbull app messages = letters

in envelopesbull hosts = housesbull transport protocol = Ann

and Bill (collect mail from siblings)

bull network-layer protocol = postal service

Internet Transport-layer Protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of

ldquobest-effortrdquo IP

bull services not available ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 5: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Transport vs Network layer

bull network layer logical communication between hosts

bull transport layer logical communication between processes ndash relies on enhances network layer services

Household analogy12 kids sending letters to

12 kidsbull processes = kidsbull app messages = letters

in envelopesbull hosts = housesbull transport protocol = Ann

and Bill (collect mail from siblings)

bull network-layer protocol = postal service

Internet Transport-layer Protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of

ldquobest-effortrdquo IP

bull services not available ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 6: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Internet Transport-layer Protocols

bull reliable in-order delivery (TCP)ndash congestion control ndash flow controlndash connection setup

bull unreliable unordered delivery UDPndash no-frills extension of

ldquobest-effortrdquo IP

bull services not available ndash delay guaranteesndash bandwidth guarantees

applicationtransportnetworkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

networkdata linkphysical

applicationtransportnetworkdata linkphysical

logical end-end transport

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 7: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 8: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 9: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

UDP User Datagram Protocol [RFC 768]

bull ldquono frillsrdquo ldquobare bonesrdquo Internet transport protocol

bull ldquobest effortrdquo service UDP segments may bendash lostndash delivered out of order

to appbull connectionless

ndash no handshaking between UDP sender receiver

ndash each UDP segment handled independently of others

Why is there a UDPbull no connection

establishment (which can add delay)

bull simple no connection state at sender receiver

bull small segment headerbull no congestion control

UDP can blast away as fast as desired

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 10: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

UDP more

bull Often used for streaming (videoaudio) or game appsndash loss tolerantndash rate sensitive

bull other UDP usesndash DNSndash SNMP

bull reliable transfer over UDP add reliability at application layerndash application-specific

error recovery

source port dest port

32 bits

Applicationdata (message)

UDP segment format

length checksumLength in

bytes of UDPsegmentincluding

header

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 11: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

UDP checksum

Senderbull treat segment contents

as sequence of 16-bit integers

bull checksum addition (1rsquos complement sum) of segment contents

bull sender puts checksum value into UDP checksum field

Receiverbull compute checksum of

received segmentbull check if computed

checksum equals checksum field valuendash NO - error detectedndash YES - no error detected

But maybe errors nonetheless More later hellip

Goal detect ldquoerrorsrdquo (eg flipped bits) in transmitted segment

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 12: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Internet Checksum Example

bull Example add two 16-bit integers

1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

checksum

bull At receiver add 2 integers and checksum hellip should be all 1rsquos If not bit error (correction next)

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 13: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 14: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 15: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Principles of Reliable data transferbull important in app transport link layers

bull top-10 list of important networking topics

bull characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 16: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Principles of Reliable Data Transferbull important in app transport link layersbull top-10 list of important networking topics

bull Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

(Zoom

next

slid

e)

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 17: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Reliable Data Transfer Getting Started

sendside

receiveside

rdt_send() called from above (eg by app) Passed data to deliver to receiver upper layer

udt_send() called by rdtto transfer packet over unreliable channel to receiver

rdt_rcv() called when packet arrives on rcv-side of channel

deliver_data() called by rdt to deliver data to upper

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 18: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Reliable Data Transfer Getting Started

Wersquollbull Incrementally develop sender receiver

sides of reliable data transfer protocol (rdt)bull Consider only unidirectional data transfer

ndash but control info will flow on both directions

bull Use finite state machines (FSM) to specify sender receiver

state 1 state 2

event causing state transitionactions taken on state transition

state when in this ldquostaterdquo next state uniquely determined by next

event

eventactions

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 19: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt10 Reliable Transfer over a Reliable Channel

bull Underlying channel perfectly reliablendash no bit errorsndash no loss of packets

bull Separate FSMs for sender receiverndash sender sends data into underlying channelndash receiver read data from underlying channel

Wait for call from above packet = make_pkt(data)

udt_send(packet)

rdt_send(data)

extract (packetdata)deliver_data(data)

Wait for call from

below

rdt_rcv(packet)

sender receiverEasy

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 20: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

What if Taking a Message over Phone

bull Message is clearbull Message is garbled

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 21: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

What if Taking a Message over Phone

bull Message is clearndash Ok

bull Message is garbledndash Ask to repeatndash May not need whole message

bull In networks called Automatic Repeat reQuest (ARQ)ndash Need error detectionndash Receiver feedbackndash Retransmission

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 22: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt20 Channel with Bit Errors (no Loss)

bull Underlying channel may flip bits in packetndash Checksum to detect bit errors

bull The question how to recover from errorsndash acknowledgements (ACKs) receiver explicitly tells

sender that pkt received OKndash negative acknowledgements (NAKs) receiver

explicitly tells sender that pkt had errorsndash Sender retransmits pkt on receipt of NAK

bull New mechanisms in rdt20 (beyond rdt10)ndash Error detectionndash Receiver feedback control msgs (ACKNAK)

rcvrsender

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 23: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt20 FSM Specification

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 24: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt20 Operation with No Errors

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 25: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt20 Error Scenario

Wait for call from above

snkpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from

below

rdt_send(data)

L

sender

receiver

Sender sends one packet then waits for receiver response

stop and wait

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 26: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt20 Has a Fatal Flaw

Wait for call from above

sndpkt = make_pkt(data checksum)udt_send(sndpkt)

extract(rcvpktdata)deliver_data(data)udt_send(ACK)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

rdt_rcv(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp isNAK(rcvpkt)

udt_send(NAK)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

Wait for ACK or

NAK

Wait for call from below

sender

receiverrdt_send(data)

L

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 27: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicatebull How to handle duplicates

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 28: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt20 Has a Fatal Flaw

What happens if ACKNAK corruptedbull Sender doesnrsquot know what happened at receiverbull Canrsquot just retransmit possible duplicate

Handling duplicates bull Sender retransmits

current pkt if ACKNAK garbled

bull Sender adds sequence number to each pktndash Can use 1 bit (for now)

bull receiver discards (doesnrsquot deliver up) duplicate pkt

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 29: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt21 Sender Handles Garbled ACKNAKs

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

Wait for ACK or NAK 0

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) ||isNAK(rcvpkt) )

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isNAK(rcvpkt) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt)

Wait for call 1 from

above

Wait for ACK or NAK 1

LL

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 30: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

Rdt21 Receiver Handles Garbled ACKNAKs

Wait for 0 from below

sndpkt = make_pkt(NAK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq0(rcvpkt)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

ampamp has_seq1(rcvpkt) extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Wait for 1 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq0(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp not corrupt(rcvpkt) ampamp has_seq1(rcvpkt)

sndpkt = make_pkt(ACK chksum)udt_send(sndpkt)

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 31: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt21 Discussion

Senderbull seq added to pktbull two seq rsquos (01) will sufficebull must check if received ACKNAK corrupted bull twice as many states

ndash state must ldquorememberrdquo whether ldquocurrentrdquo pkt has 0 or 1 seq

Receiverbull must check if received packet is duplicate

ndash state indicates whether 0 or 1 is expected pkt seq

bull note receiver can not know if its last ACKNAK received OK at sender

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 32: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt22 a NAK-free Protocol

bull Reduce type of response ACK onlybull Same functionality as rdt21 using ACKs onlybull Instead of NAK receiver sends ACK for last

pkt received OKndash receiver must explicitly include seq of pkt being

ACKed

bull Duplicate ACK at sender results in same action as NAK retransmit current pkt

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 33: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

rdt_rcv(rcvpkt) ampamp (corrupt(rcvpkt) || has_seq1(rcvpkt))

udt_send(sndpkt)

Rdt22 Sender amp Receiver Fragments

Wait for call 0 from

above

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)

rdt_send(data)

udt_send(sndpkt)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) || isACK(rcvpkt1) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

Wait for ACK 0

sender FSMfragment

Wait for 0 from below

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp has_seq1(rcvpkt)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(ACK1 chksum)udt_send(sndpkt)

receiver FSMfragment

L

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 34: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

How to determine if a packet is lost

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 35: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt30 Channels with Errors and Loss

New assumption underlying channel can also lose packets (data or ACKs)ndash checksum seq ACKs retransmissions will be of

help but not enough

Approach sender waits ldquoreasonablerdquo amount of time for ACK

bull Retransmits if no ACK received in this time

bull If pkt (or ACK) just delayed (not lost)ndash Retransmission will be

duplicate but use of seq rsquos already handles this

ndash Receiver must specify seq of pkt being ACKed

bull Requires countdown timer

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 36: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt30 Sender

sndpkt = make_pkt(0 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

Wait for

ACK0

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt1) )

Wait for call 1 from

above

sndpkt = make_pkt(1 data checksum)udt_send(sndpkt)start_timer

rdt_send(data)

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt0)

rdt_rcv(rcvpkt) ampamp ( corrupt(rcvpkt) ||isACK(rcvpkt0) )

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt) ampamp isACK(rcvpkt1)

stop_timerstop_timer

udt_send(sndpkt)start_timer

timeout

udt_send(sndpkt)start_timer

timeout

rdt_rcv(rcvpkt)

Wait for call 0from

above

Wait for

ACK1

Lrdt_rcv(rcvpkt)

LL

L

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 37: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt30 in Action

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 38: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt30 in Action

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 39: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Performance of Rdt30

bull Rdt30 works but performance stinkshellipbull ex 1 Gbps link 15 ms prop delay 8000 bit

packet

U sender utilization ndash fraction of time sender busy sending

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

bull 1KB pkt every 30 msec -gt 33kBsec throughput over 1 Gbps link

bull Network protocol limits use of physical resources

dsmicrosecon8bps10

bits80009

R

Ldtrans

(Picture next slide)

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 40: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Rdt30 Stop-and-Wait Operation

first packet bit transmitted t = 0

sender

receiver

RTT

last packet bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

U sender

= 008

30008 = 000027

microseconds

L R

RTT + L R =

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 41: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Pipelined Protocols

Pipelining sender allows multiple ldquoin-flightrdquo yet-to-be-acknowledged pktsndash Range of sequence numbers must be increasedndash Need buffering at sender andor receiver

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 42: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Pipelining Increased Utilization

first packet bit transmitted t = 0

sender receiver

RTT

last bit transmitted t = L R

first packet bit arriveslast packet bit arrives send ACK

ACK arrives send next packet t = RTT + L R

last bit of 2nd packet arrives send ACKlast bit of 3rd packet arrives send ACK

U sender

= 024

30008 = 00008

microseconds

3 L R

RTT + L R =

Increase utilizationby a factor of 3

bull Two generic forms of pipelined protocols go-Back-N selective repeat

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 43: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Pipelining Protocols

Go-back-N overviewbull sender up to N

unACKed pkts in pipeline

bull receiver only sends cumulative ACKsndash doesnrsquot ACK pkt if

therersquos a gap

bull sender has timer for oldest unACKed pktndash if timer expires

retransmit all unACKed packets

Selective Repeat overviewbull sender up to N

unACKed packets in pipeline

bull receiver ACKs individual pkts

bull sender maintains timer for each unACKed pktndash if timer expires

retransmit only unACKed packet

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 44: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Go-Back-NSenderbull k-bit seq in pkt headerbull ldquowindowrdquo of up to N consecutive unACKed pkts allowed

bull ACK(n) ACKs all pkts up to including seq n - ldquocumulative ACKrdquo may receive duplicate ACKs (see receiver)

bull Timer for each in-flight pktbull Timeout(n) retransmit pkt n and all higher seq pkts in

window

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 45: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

GBN Sender Extended FSM

Waitstart_timerudt_send(sndpkt[base])udt_send(sndpkt[base+1])hellipudt_send(sndpkt[nextseqnum-1])

timeout

rdt_send(data)

if (nextseqnum lt base+N) sndpkt[nextseqnum] = make_pkt(nextseqnumdatachksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ else refuse_data(data)

base = getacknum(rcvpkt)+1If (base == nextseqnum) stop_timer else start_timer

rdt_rcv(rcvpkt) ampamp notcorrupt(rcvpkt)

base=1nextseqnum=1

rdt_rcv(rcvpkt) ampamp corrupt(rcvpkt)

L

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 46: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

GBN Receiver Extended FSM

ACK-only always send ACK for correctly-received pkt with highest in-order seq ndash may generate duplicate ACKsndash need only remember expectedseqnum

bull out-of-order pkt ndash discard (donrsquot buffer) -gt no receiver bufferingndash Re-ACK pkt with highest in-order seq

Wait

udt_send(sndpkt)

default

rdt_rcv(rcvpkt) ampamp notcurrupt(rcvpkt) ampamp hasseqnum(rcvpktexpectedseqnum)

extract(rcvpktdata)deliver_data(data)sndpkt = make_pkt(expectedseqnumACKchksum)udt_send(sndpkt)expectedseqnum++

expectedseqnum=1sndpkt = make_pkt(expectedseqnumACKchksum)

L

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 47: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

GBN inaction

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 48: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

GBN Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsgo-back-nindexhtml

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 49: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Selective Repeatbull Receiver individually acknowledges all

correctly received pktsndash Buffers pkts as needed for eventual in-order

delivery to upper layer

bull Sender only resends pkts for which ACK not receivedndash Sender timer for each unACKed pkt

bull Sender windowndash N consecutive seq rsquosndash Again limits seq s of sent unACKed pkts

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 50: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Selective Repeat sender receiver windows

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 51: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Selective Repeat

data from above bull if next available seq in window send pkt

timeout(n)bull resend pkt n restart timer

ACK(n) in [sendbasesendbase+N]

bull mark pkt n as receivedbull if n smallest unACKed pkt advance window base to next

unACKed seq

senderpkt n in [rcvbase rcvbase+N-

1]bull send ACK(n)bull out-of-order bufferbull in-order deliver (also

deliver buffered in-order pkts) advance window to next not-yet-received pkt

pkt n in [rcvbase-Nrcvbase-1]

bull ACK(n)

otherwise bull ignore

receiver

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 52: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Transport Layer 3-52

Selective Repeat in Action

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 53: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Selective Repeat

DilemmaExample bull seq rsquos 0 1 2 3bull window size=3

bull receiver sees no difference in two scenarios

bull incorrectly passes duplicate data as new in (a)

Q what relationship between seq size and window size

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 54: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

SR Applet

httpmediapearsoncmgcomawaw_kurose_network_4appletsSRindexhtml

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 55: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 56: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Overview RFCs 793 1122 1323 2018 2581

bull full duplex datandash bi-directional data flow

in same connectionndash MSS maximum

segment size

bull connection-oriented ndash handshaking

(exchange of control msgs) initrsquos sender receiver state before data exchange

bull flow controlledndash sender will not

overwhelm receiver

bull point-to-pointndash one sender one

receiver

bull reliable in-order byte steamndash no ldquomessage

boundariesrdquo

bull pipelinedndash TCP congestion and

flow control set window size

bull send amp receive buffers

socketdoor

T C Psend buffer

T C Preceive buffer

socketdoor

segm ent

applicationwrites data

applicationreads data

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 57: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Segment Structure

source port dest port

32 bits

applicationdata (variable length)

sequence number

acknowledgement number

Receive window

Urg data pointerchecksumFSRPAU

headlen

notused

Options (variable length)

URG urgent data (generally not used)

ACK ACK valid

PSH push data now(generally not used)

RST SYN FINconnection estab(setup teardown

commands)

bytes rcvr willingto accept

countingby bytes of data(not segments)

Internetchecksum

(as in UDP)

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 58: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Seq rsquos and ACKsSeq rsquos

ndash byte stream ldquonumberrdquo of first byte in segmentrsquos data

ACKsndash seq of next

byte expected from other side

ndash cumulative ACKQ how receiver

handles out-of-order segmentsndash A TCP spec

doesnrsquot say up to implementer

Host A Host B

Seq=42 ACK=79 data = lsquoCrsquo

Seq=79 ACK=43 data = lsquoCrsquo

Seq=43 ACK=80

UsertypeslsquoCrsquo

host ACKsreceipt of echoedlsquoCrsquo

timesimple telnet scenario

host ACKsreceipt oflsquoCrsquo echoesback lsquoCrsquo

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 59: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 60: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTT

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 61: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Round Trip Time and Timeout

Q how to set TCP timeout value

bull Longer than RTTndash but RTT varies

bull Too short premature timeoutndash unnecessary

retransmissionsbull Too long slow

reaction to segment loss

Q how to estimate RTTbull SampleRTT measured time

from segment transmission until ACK receiptndash ignore retransmissions

bull SampleRTT will vary want estimated RTT ldquosmootherrdquondash average several recent

measurements not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 62: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )EstimatedRTT + SampleRTT

bull Exponential weighted moving averagebull influence of past sample decreases exponentially

fastbull typical value = 18th (or 0125)

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 63: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Example Round Trip Time EstimationRTT gaiacsumassedu to fantasiaeurecomfr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 64: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Round Trip Time and Timeout

Setting the timeoutbull EstimtedRTT plus ldquosafety marginrdquo

ndash large variation in EstimatedRTT -gt larger safety margin

bull First estimate of how much SampleRTT deviates from EstimatedRTT

TimeoutInterval = EstimatedRTT + 4DevRTT

DevRTT = (1-)DevRTT + |SampleRTT-EstimatedRTT|

(typically = 025)

Then set timeout interval

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 65: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 66: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP reliable data transfer

bull TCP creates rdt service on top of IPrsquos unreliable service

bull Pipelined segmentsbull Cumulative ACKsbull TCP uses single retransmission timer

bull Retransmissions are triggered byndash Timeout eventsndash Duplicate ACKs

bull Initially consider simplified TCP senderndash Ignore duplicate ACKsndash Ignore flow control congestion control

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 67: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Sender EventsData rcvd from appbull Create segment with

seq bull seq is byte-stream

number of first data byte in segment

bull Start timer if not already running (think of timer as for oldest unACKed segment)

bull Expiration interval TimeOutInterval

Timeoutbull retransmit segment

that caused timeoutbull restart timer ACK rcvdbull If acknowledges

previously unACKed segmentsndash update what is known

to be ACKedndash start timer if there are

outstanding segments

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 68: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Sender

(simplified)

NextSeqNum = InitialSeqNumSendBase = InitialSeqNum

loop (forever) switch(event)

event data received from application above create TCP segment wseq NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data)

event timer timeout retransmit not-yet-acked segment with smallest sequence number start timer

event ACK received with ACK field value of y if (y gt SendBase) SendBase = y if (there are not-yet-acked segments) start timer

end of loop forever

Commentbull SendBase-1 last cumulatively ACKed byteExamplebull SendBase-1 = 71

y= 73 so the rcvrwants 73+ y gt SendBase sothat new data is ACKed

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 69: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Retransmission Scenarios

Host A

Seq=100 20 bytes data

ACK=100

timepremature timeout

Host B

Seq=92 8 bytes data

ACK=120

Seq=92 8 bytes data

Seq

=92

tim

eout

ACK=120

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

lost ACK scenario

Host B

X

Seq=92 8 bytes data

ACK=100

time

Seq

=92

tim

eout

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 70: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Retransmission Scenarios (more)

Host ASeq=92 8 bytes data

ACK=100

losstim

eout

Cumulative ACK scenario

Host B

X

Seq=100 20 bytes data

ACK=120

time

SendBase= 120

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 71: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP ACK generation [RFC 1122 RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq All data up toexpected seq already ACKed

Arrival of in-order segment withexpected seq One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK Wait up to 500msfor next segment If no next segmentsend ACK

Immediately send single cumulative ACK ACKing both in-order segments

Immediately send duplicate ACK indicating seq of next expected byte

Immediate send ACK provided thatsegment starts at lower end of gap

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 72: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Fast Retransmit

bull Time-out period often relatively longndash Long delay before

resending lost packet

bull Detect lost segments via duplicate ACKsndash Sender often sends

many segments back-to-back

ndash If segment lost there will likely be many duplicate ACKs for that segment

bull If sender receives 3 ACKs for same data it assumes that segment after ACKed data was lostndash fast retransmit resend

segment before timer expires

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 73: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Host A

timeo

ut

Host B

time

X

resend seq X2

seq x1seq x2seq x3seq x4seq x5

ACK x1

ACK x1ACK x1ACK x1

tripleduplicate

ACKs

Fast

Retra

nsm

it

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 74: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 75: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Flow Control

bull Receive side of TCP connection has a receive buffer

bull speed-matching service matching send rate to receiving applicationrsquos drain rate

bull App process may be slow at reading from buffer

sender wonrsquot overflowreceiverrsquos buffer bytransmitting too much too fast

flow control

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 76: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Flow Control How it Works

(suppose TCP receiver discards out-of-order segments)

bull unused buffer space= rwnd= RcvBuffer-[LastByteRcvd -

LastByteRead]

bull Receiver advertises unused buffer space by including rwnd value in segment header

bull sender limits of unACKed bytes to rwndndash guarantees receiverrsquos

buffer doesnrsquot overflow

IPdatagrams

TCP data(in buffer)

(currently)unused bufferspace

applicationprocess

rwndRcvBuffer

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 77: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 78: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Connection Management

Recall TCP sender receiver establish ldquoconnectionrdquo before exchanging data segments

bull initialize TCP variablesndash seq sndash buffers flow control

info (eg RcvWindow)bull client connection

initiator Socket clientSocket = new

Socket(ldquohostnamerdquo port) bull server contacted by

client Socket connectionSocket =

welcomeSocketaccept()

Three way handshake

Step 1 client host sends TCP SYN segment to serverndash specifies initial seq ndash no data

Step 2 server host receives SYN replies with SYNACK segmentndash server allocates buffersndash specifies server initial

seq Step 3 client receives

SYNACK replies with ACK segment which may contain data

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 79: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Connection Management (cont)

Closing a connection

client closes socket clientSocketclose()

Step 1 client end system sends TCP FIN control segment to server

Step 2 server receives FIN replies with ACK Closes connection sends FIN

client

FIN

server

ACK

ACK

FIN

close

close

closed

tim

ed w

ait

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 80: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Connection Management (cont)

Step 3 client receives FIN replies with ACK

ndash Enters ldquotimed waitrdquo - will respond with ACK to received FINs

Step 4 server receives ACK Connection closed

client

FIN

server

ACK

ACK

FIN

closing

closing

closed

tim

ed w

ait

closed

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 81: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Transport Layer 3-81

TCP Connection Management (cont)

TCP clientlifecycle

TCP serverlifecycle

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 82: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 83: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Principles of Congestion Control

Congestionbull Informally ldquotoo many sources sending too

much data too fast for network to handlerdquobull Different from flow controlbull Manifestations

ndash Lost packets (buffer overflow at routers)ndash Long delays (queueing in router buffers)

bull A ldquotop-10rdquo problem

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 84: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Causescosts of Congestion Scenario 1

bull Two senders two receivers

bull One router infinite buffers

bull No retransmission

bull Large delays when congested

bull Maximum achievable throughput

unlimited shared output link buffers

Host Alin original data

Host B

lout

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 85: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Causescosts of Congestion Scenario 2

bull One router finite buffers bull Sender retransmission of lost packet

finite shared output link

buffers

Host A

lin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 86: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Causescosts of congestion Scenario 2

bull Always (goodput)

bull ldquoPerfectrdquo retransmission only when loss

bull Retransmission of delayed (not lost) packet makes

larger (than perfect case) for same

lin

lout

=

lin

lout

gtl

inl

out

ldquoCostsrdquo of congestion bull More work (retrans) for given ldquogoodputrdquobull unneeded retransmissions link carries multiple copies of

pkt

R2

R2lin

l ou

t

b

R2

R2lin

l ou

t

a

R2

R2lin

l ou

t

c

R4

R3

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 87: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Causescosts of Congestion Scenario 3

bull Four sendersbull Multihop pathsbull Timeoutretransmit

lin

Q what happens as and increase l

in

finite shared output link buffers

Host Alin original data

Host B

lout

lin original data plus retransmitted data

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 88: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Causescosts of Congestion Scenario 3

Another ldquocostrdquo of congestion bull When packet dropped any ldquoupstream

transmission capacity used for that packet was wasted

Host A

Host B

lo

u

t

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 89: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Approaches towards congestion control

End-end congestion control

bull No explicit feedback from network

bull Congestion inferred from end-system observed loss delay

bull Approach taken by TCP

Network-assisted congestion control

bull Routers provide feedback to end systemsndash Single bit indicating

congestion (SNA DECbit TCPIP ECN ATM)

ndash Explicit rate sender should send at

Broadly

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 90: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 outline

bull 31 Transport-layer servicesbull 32 Multiplexing and demultiplexingbull 33 Connectionless transport UDPbull 34 Principles of reliable data transfer

bull 35 Connection-oriented transport TCPndash segment structurendash reliable data transferndash flow controlndash connection

management

bull 36 Principles of congestion control

bull 37 TCP congestion control

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 91: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Congestion Control

bull Goal TCP sender should transmit as fast as possible but without congesting network- Q how to find rate just below congestion level

bull Decentralized each TCP sender sets its own rate based on implicit feedback - ACK segment received (a good thing)

network not congested so increase sending rate

- lost segment assume loss due to congested network so decrease sending rate

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 92: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Congestion Control Bandwidth Probing

bull ldquoProbing for bandwidthrdquo increase transmission rate on receipt of ACK until eventually loss occurs then decrease transmission rate - continue to increase on ACK decrease on loss (since

available bandwidth is changing depending on other connections in network)

ACKs being received so increase rate

X

X

XX

X loss so decrease rate

send

ing

rate

time

bull Q how fast to increasedecrease- details to follow

TCPrsquosldquosawtoothrdquobehavior

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 93: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Congestion Control detailsbull sender limits rate by limiting

number of unACKed bytes ldquoin pipelinerdquo

ndash cwnd differs from rwnd (how why)ndash sender limited by min(cwndrwnd)

bull roughly

bull cwnd is dynamic function of perceived network congestion

rate = cwnd

RTT bytessec

LastByteSent-LastByteAcked cwnd

cwndbytes

RTT

ACK(s)

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 94: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Congestion Control more details

segment loss event reducing cwnd

bull timeout no response from receiverndash cut cwnd to 1

bull 3 duplicate ACKs at least some segments getting through (recall fast retransmit)ndash cut cwnd in half less

aggressively than on timeout

ACK received increase cwnd

bull slowstart phase - increase exponentially

fast (despite name) at connection start or following timeout

bull congestion avoidance - increase linearly

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 95: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Slow Start

bull when connection begins cwnd = 1 MSSndash example MSS = 500

bytes amp RTT = 200 msecndash initial rate = 20 kbps

bull available bandwidth may be gtgt MSSRTTndash desirable to quickly ramp

up to respectable ratebull increase rate exponentially

until first loss event or when threshold reachedndash double cwnd every RTTndash done by incrementing cwnd by 1 for every ACK received

Host Aone segment

RT

T

Host B

time

two segments

four segments

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 96: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Transitioning intoout of slowstart

ssthresh cwnd threshold maintained by TCPbull on loss event set ssthresh to cwnd2

ndash remember (half of) TCP rate when congestion last occurred bull when cwnd gt= ssthresh transition from slowstart to

congestion avoidance phase

slow start timeout

ssthresh = cwnd2cwnd = 1 MSSdupACKcount = 0retransmit missing segment

timeoutssthresh = cwnd2 cwnd = 1 MSSdupACKcount = 0retransmit missing segment

Lcwnd gt ssthresh

cwnd = cwnd+MSSdupACKcount = 0transmit new segment(s)as allowed

new ACKdupACKcount++

duplicate ACK

Lcwnd = 1 MSSssthresh = 64 KBdupACKcount = 0 congestion

avoidance

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 97: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Congestion Avoidance

bull When cwnd gt ssthresh grow cwnd linearlyndash increase cwnd by 1

MSS per RTT ndash approach possible

congestion slower than in slowstart

ndash implementation cwnd = cwnd + MSScwnd for each ACK received

bull ACKs increase cwnd by 1 MSS per RTT additive increase

bull loss cut cwnd in half (non-timeout-detected loss ) multiplicative decrease

AIMD

AIMD Additive IncreaseMultiplicative Decrease

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 98: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Congestion Control FSM overview

slow start

congestionavoidance

fastrecovery

cwnd gt ssthresh

losstimeout

losstimeout

new ACK loss3dupACK

loss3dupACK

losstimeout

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 99: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Popular ldquoflavorsrdquo of TCP

ssthresh

ssthresh

TCP Tahoe

TCP Reno

Transmission round

cwnd w

ind

ow

siz

e

(in

segm

en

ts)

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 100: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Summary TCP Congestion Control

bull when cwnd lt ssthresh sender in slow-start phase window grows exponentially

bull when cwnd gt= ssthresh sender is in congestion-avoidance phase window grows linearly

bull when triple duplicate ACK occurs ssthresh set to cwnd2 cwnd set to ~ ssthresh

bull when timeout occurs ssthresh set to cwnd2 cwnd set to 1 MSS

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 101: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP throughput

bull Q whatrsquos average throughout of TCP as function of window size RTTndash ignoring slow start

bull Let W be window size when loss occursndash when window is W throughput is

WRTTndash just after loss window drops to W2

throughput to W2RTT ndash average throughout 75 WRTT

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 102: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

TCP Futures TCP over ldquolong fat pipesrdquo

bull Example 1500 byte segments 100ms RTT want 10 Gbps throughput

bull Requires window size W = 83333 in-flight segments

bull throughput in terms of loss rate

bull L = 210-10 Wowbull New versions of TCP for high-speed

LRTT

MSS221

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 103: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

fairness goal if K TCP sessions share same bottleneck link of bandwidth R each should have average rate of RK

TCP connection 1

bottleneckroutercapacity R

TCP connection 2

TCP Fairness

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 104: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Why is TCP fair

Two competing sessionsbull Additive increase gives slope of 1 as throughout increasesbull multiplicative decrease decreases throughput proportionally

R

R

equal bandwidth share

Connection 1 throughput

Con

nect

i on

2 th

r ou g

h pu t

congestion avoidance additive increaseloss decrease window by factor of 2

congestion avoidance additive increaseloss decrease window by factor of 2

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 105: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Fairness (more)

Fairness and UDPbull Multimedia apps

often do not use TCPndash do not want rate

throttled by congestion control

bull Instead use UDPndash pump audiovideo at

constant rate tolerate packet loss

Fairness and Parallel TCP Connections

bull Nothing prevents app from opening parallel connections between 2 hosts

bull Web browsers do this bull Example link of rate R

supporting 9 connections ndash new app asks for 1 TCP

gets rate R10ndash new app asks for 11 TCPs

gets R2

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary
Page 106: Transport Layer CS 3516 – Computer Networks. Chapter 3: Transport Layer Goals: Understand principles behind transport layer services: –Multiplexing

Chapter 3 Summarybull Principles behind

transport layer servicesndash multiplexing

demultiplexingndash reliable data

transferndash flow controlndash congestion control

bull Instantiation and implementation in the Internetndash UDPndash TCP

Nextbull leaving the

network ldquoedgerdquo (application transport layers)

bull into the network ldquocorerdquo

  • Transport Layer CS 3516 ndash Computer Networks
  • Chapter 3 Transport Layer
  • Chapter 3 outline
  • Transport Services and Protocols
  • Transport vs Network layer
  • Internet Transport-layer Protocols
  • Chapter 3 outline (2)
  • Chapter 3 outline (3)
  • UDP User Datagram Protocol [RFC 768]
  • UDP more
  • UDP checksum
  • Internet Checksum Example
  • Chapter 3 outline (4)
  • Principles of Reliable data transfer
  • Principles of Reliable data transfer (2)
  • Principles of Reliable Data Transfer
  • Reliable Data Transfer Getting Started
  • Reliable Data Transfer Getting Started (2)
  • Rdt10 Reliable Transfer over a Reliable Channel
  • What if Taking a Message over Phone
  • What if Taking a Message over Phone (2)
  • Rdt20 Channel with Bit Errors (no Loss)
  • Rdt20 FSM Specification
  • Rdt20 Operation with No Errors
  • Rdt20 Error Scenario
  • Rdt20 Has a Fatal Flaw
  • Rdt20 Has a Fatal Flaw (2)
  • Rdt20 Has a Fatal Flaw (3)
  • Rdt21 Sender Handles Garbled ACKNAKs
  • Rdt21 Receiver Handles Garbled ACKNAKs
  • Rdt21 Discussion
  • Rdt22 a NAK-free Protocol
  • Rdt22 Sender amp Receiver Fragments
  • Rdt30 Channels with Errors and Loss
  • Rdt30 Channels with Errors and Loss (2)
  • Rdt30 Sender
  • Rdt30 in Action
  • Rdt30 in Action (2)
  • Performance of Rdt30
  • Rdt30 Stop-and-Wait Operation
  • Pipelined Protocols
  • Pipelining Increased Utilization
  • Pipelining Protocols
  • Go-Back-N
  • GBN Sender Extended FSM
  • GBN Receiver Extended FSM
  • GBN in action
  • GBN Applet
  • Selective Repeat
  • Selective Repeat sender receiver windows
  • Selective Repeat (2)
  • Selective Repeat in Action
  • Selective Repeat Dilemma
  • SR Applet
  • Chapter 3 outline (5)
  • TCP Overview RFCs 793 1122 1323 2018 2581
  • TCP Segment Structure
  • TCP Seq rsquos and ACKs
  • TCP Round Trip Time and Timeout
  • TCP Round Trip Time and Timeout (2)
  • TCP Round Trip Time and Timeout (3)
  • TCP Round Trip Time and Timeout (4)
  • Example Round Trip Time Estimation
  • TCP Round Trip Time and Timeout (5)
  • Chapter 3 outline (6)
  • TCP reliable data transfer
  • TCP Sender Events
  • TCP Sender (simplified)
  • TCP Retransmission Scenarios
  • TCP Retransmission Scenarios (more)
  • TCP ACK generation [RFC 1122 RFC 2581]
  • Fast Retransmit
  • Fast Retransmit (2)
  • Chapter 3 outline (7)
  • TCP Flow Control
  • TCP Flow Control How it Works
  • Chapter 3 outline (8)
  • TCP Connection Management
  • TCP Connection Management (cont)
  • TCP Connection Management (cont) (2)
  • TCP Connection Management (cont) (3)
  • Chapter 3 outline (9)
  • Principles of Congestion Control
  • Causescosts of Congestion Scenario 1
  • Causescosts of Congestion Scenario 2
  • Causescosts of congestion Scenario 2
  • Causescosts of Congestion Scenario 3
  • Causescosts of Congestion Scenario 3 (2)
  • Approaches towards congestion control
  • Chapter 3 outline (10)
  • TCP Congestion Control
  • TCP Congestion Control Bandwidth Probing
  • TCP Congestion Control details
  • TCP Congestion Control more details
  • TCP Slow Start
  • Transitioning intoout of slowstart
  • TCP Congestion Avoidance
  • TCP Congestion Control FSM overview
  • Popular ldquoflavorsrdquo of TCP
  • Summary TCP Congestion Control
  • TCP throughput
  • TCP Futures TCP over ldquolong fat pipesrdquo
  • TCP Fairness
  • Why is TCP fair
  • Fairness (more)
  • Chapter 3 Summary