Outline: Principles of Congestion Control, TCP Congestion Control
Post on 19-Dec-2015
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit rate at which sender should send
TCP Congestion Control
• end-end control (no network assistance)
• sender limits transmission:
LastByteSent − LastByteAcked ≤ CongWin
• Roughly, rate = CongWin/RTT bytes/sec
• CongWin is dynamic, a function of perceived network congestion
How does the sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (CongWin) after a loss event
Three mechanisms:
– AIMD
– slow start
– conservative after timeout events
TCP AIMD
• multiplicative decrease: cut CongWin in half after a loss event
• additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
[Figure: congestion window of a long-lived TCP connection vs. time, a sawtooth oscillating between 8, 16, and 24 Kbytes]
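The sawtooth above can be reproduced with a toy simulation (the capacity and starting window here are made-up parameters, and the window is in units of MSS):

```python
# Toy AIMD simulation: additive increase of 1 MSS per RTT, and a
# multiplicative decrease (halving) whenever the window reaches an
# assumed network capacity and a loss occurs.
def aimd(capacity_mss=32, rtts=100, start=16):
    win, trace = start, []
    for _ in range(rtts):
        trace.append(win)
        if win >= capacity_mss:      # loss event (buffer overflow)
            win = max(1, win // 2)   # multiplicative decrease
        else:
            win += 1                 # additive increase: +1 MSS per RTT
    return trace

trace = aimd()
# the window repeatedly climbs to the capacity, then is cut in half
```

The trace oscillates between capacity/2 and capacity, which is the sawtooth in the figure.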
TCP Slow Start
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to respectable rate
When connection begins, increase rate exponentially fast until first loss event
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event:
– double CongWin every RTT
– done by incrementing CongWin for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two, then four to Host B, each round one RTT apart]
Refinement
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin instead set to 1 MSS;
– window then grows exponentially
– to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicates network capable of delivering some segments
• timeout before 3 dup ACKs is “more alarming”
Refinement (more)
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
• Variable Threshold
• At loss event, Threshold is set to 1/2 of CongWin just before loss event
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
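These four rules can be sketched as a small event-driven model (the class and method names are ours, and CongWin and Threshold are in units of MSS):

```python
class TcpSender:
    """Toy model of the summary rules: slow start below Threshold,
    congestion avoidance above it, halving on triple dup ACK,
    back to 1 MSS on timeout."""
    def __init__(self, threshold=8):
        self.cwnd = 1.0              # CongWin, in MSS
        self.ssthresh = float(threshold)

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:    # slow start: exponential growth
            self.cwnd += 1               # +1 MSS per ACK -> doubles per RTT
        else:                            # congestion avoidance: linear growth
            self.cwnd += 1 / self.cwnd   # +MSS*(MSS/CongWin) per ACK

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh        # fast recovery, no slow start

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0                  # restart from slow start

s = TcpSender(threshold=8)
for _ in range(7):
    s.on_new_ack()   # slow start: cwnd climbs 1 -> 8, one MSS per ACK
s.on_new_ack()       # cwnd no longer below Threshold: adds 1/8 (CA)
s.on_timeout()       # Threshold = CongWin/2, CongWin back to 1 MSS
```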
TCP sender congestion control
Event: ACK receipt for previously unACKed data
State: Slow Start (SS)
Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance”
Commentary: results in a doubling of CongWin every RTT

Event: ACK receipt for previously unACKed data
State: Congestion Avoidance (CA)
Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: additive increase, resulting in increase of CongWin by 1 MSS every RTT

Event: loss event detected by triple duplicate ACK
State: SS or CA
Action: Threshold = CongWin/2, CongWin = Threshold; set state to “Congestion Avoidance”
Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

Event: timeout
State: SS or CA
Action: Threshold = CongWin/2, CongWin = 1 MSS; set state to “Slow Start”
Commentary: enter slow start

Event: duplicate ACK
State: SS or CA
Action: increment duplicate ACK count for segment being ACKed
Commentary: CongWin and Threshold not changed
TCP throughput
• What’s the average throughput of TCP as a function of window size and RTT?
– ignore slow start
• Let W be the window size when loss occurs.
• When the window is W, throughput is W/RTT.
• Just after a loss, the window drops to W/2, throughput to W/(2·RTT).
• Average throughput: 0.75 W/RTT
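The 0.75·W/RTT figure is just the time-average of a sawtooth that ramps linearly from W/2 back up to W; a small numeric check (window in units of MSS):

```python
# Average throughput of the TCP sawtooth: the window grows 1 MSS per
# RTT from W/2 to W, so the time-average window is (W/2 + W)/2 = 0.75*W,
# and average throughput is 0.75*W/RTT.
def avg_tcp_throughput(W, RTT):
    windows = [W / 2 + i for i in range(int(W / 2) + 1)]  # W/2, ..., W
    return (sum(windows) / len(windows)) / RTT

# For W = 10 MSS and RTT = 1 the average is 7.5 MSS/RTT = 0.75*W/RTT
```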
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
• additive increase gives a slope of 1 as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: phase plot of Connection 1 throughput vs. Connection 2 throughput, each axis from 0 to R; congestion-avoidance segments (additive increase) alternate with loss events (window cut by factor of 2), moving the operating point toward the equal-bandwidth-share line]
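The drift toward the equal-share line can be seen in a toy model of two synchronized AIMD flows (the capacity, starting rates, and round count are made-up parameters):

```python
# Two AIMD flows sharing a bottleneck of capacity C: each adds 1 per
# round; when the sum exceeds C, both halve (synchronized loss).
# Additive increase preserves the rate difference, while each halving
# cuts it in half, so the two rates converge.
def compete(x1, x2, C=100, rounds=2000):
    for _ in range(rounds):
        if x1 + x2 > C:
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase
    return x1, x2

a, b = compete(5.0, 80.0)
# after many rounds the two rates are nearly equal
```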
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• Research area: TCP friendly
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections;
– new app asks for 1 TCP connection, gets rate R/10
– new app asks for 11 TCP connections, gets R/2 !
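The arithmetic behind this example, under the assumption that the link splits its rate R evenly across all connections:

```python
# Link of rate R shared evenly among all TCP connections; an app's
# aggregate rate is (its connection count) * (R / total connections).
def app_share(existing, mine, R=1.0):
    total = existing + mine
    return mine * (R / total)

one = app_share(existing=9, mine=1)      # 1 of 10 connections -> R/10
eleven = app_share(existing=9, mine=11)  # 11 of 20 connections -> 11R/20, about R/2
```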
Midterm Review
In class, 3:30-5:00 pm, Mon. 2/9
Closed Book
One 8.5” by 11” sheet of paper permitted (single side)
Lecture 1
• Internet Architecture
• Network Protocols
• Network Edge
• A taxonomy of communication networks
• The fundamental question: how is data transferred through net (including edge & core)?
• Communication networks can be classified based on how the nodes exchange information:
A Taxonomy of Communication Networks
Communication networks
– Switched communication networks
  – Circuit-switched networks (TDM, FDM)
  – Packet-switched networks
    – Datagram networks
    – Virtual-circuit networks
– Broadcast communication networks
Packet Switching: Statistical Multiplexing
• Sequence of A & B packets does not have a fixed pattern → statistical multiplexing.
• In TDM each host gets the same slot in a revolving TDM frame.
[Figure: hosts A, B, and C share a 10 Mbps Ethernet into a router with a 1.5 Mbps output link; a queue of packets waits for the output link (statistical multiplexing); D and E are hosts beyond the link]
Packet Switching versus Circuit Switching
• Circuit Switching
– Network resources (e.g., bandwidth) divided into “pieces” for allocation
– Resource piece idle if not used by owning call (no sharing)
– NOT efficient !
• Packet Switching:
– Great for bursty data
– Excessive congestion: packet delay and loss
– protocols needed for reliable data transfer, congestion control
Datagram Packet Switching
• Each packet is independently switched
– Each packet header contains destination address which determines next hop
– Routes may change during session
• No resources are pre-allocated (reserved) in advance
• Example: IP networks
Virtual-Circuit Packet Switching
• Hybrid of circuit switching and packet switching
– All packets from one packet stream are sent along a pre-established path (= virtual circuit)
– Each packet carries tag (virtual circuit ID), tag determines next hop
• Guarantees in-sequence delivery of packets
• However, packets from different virtual circuits may be interleaved
Lecture 2
• Network access and physical media
• Internet structure and ISPs
• Delay & loss in packet-switched networks
• Protocol layers, service models
Internet structure: network of networks
• “Tier-3” ISPs and local ISPs
– last hop (“access”) network (closest to end systems)
– Tier-3: Turkish Telecom, Minnesota Regional Network
[Figure: three interconnected Tier-1 ISPs and a NAP; Tier-2 ISPs attach to Tier-1 ISPs; local and Tier-3 ISPs attach to Tier-2 ISPs]
Local and Tier-3 ISPs are customers of higher-tier ISPs, connecting them to the rest of the Internet.
Four sources of packet delay
• 1. processing:
– check bit errors
– determine output link
[Figure: packet traveling from node A to node B, showing processing, queueing, transmission, and propagation delays]
• 2. queueing
– time waiting at output link for transmission
– depends on congestion level of router
Delay in packet-switched networks
3. Transmission delay:
• R=link bandwidth (bps)
• L=packet length (bits)
• time to send bits into link = L/R
4. Propagation delay:
• d = length of physical link
• s = propagation speed in medium (~2x108 m/sec)
• propagation delay = d/s
Note: s and R are very different quantities!
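A quick numeric sketch of delays 3 and 4 (the packet size, link rate, and distance below are example values, not from the slide):

```python
# Transmission delay L/R (push all bits onto the link) vs.
# propagation delay d/s (one bit crosses the physical link).
def transmission_delay(L_bits, R_bps):
    return L_bits / R_bps

def propagation_delay(d_meters, s_mps=2e8):   # ~2x10^8 m/sec in the medium
    return d_meters / s_mps

# 1500-byte packet on a 1.5 Mbps link spanning 3000 km:
t = transmission_delay(1500 * 8, 1.5e6)   # 0.008 sec
p = propagation_delay(3_000_000)          # 0.015 sec
```

Note how the two depend on entirely different quantities: t on packet size and link rate, p only on distance.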
Internet protocol stack
• application: supporting network applications
– FTP, SMTP, HTTP
• transport: host-host data transfer
– TCP, UDP
• network: routing of datagrams from source to destination
– IP, routing protocols
• link: data transfer between neighboring network elements
– PPP, Ethernet
• physical: bits “on the wire”
[Figure: the five-layer stack: application, transport, network, link, physical]
Application Layer
• Principles of app layer protocols
• Web and HTTP
• FTP
• Electronic Mail: SMTP, POP3, IMAP
• DNS
• Socket Programming
• Web Caching
HTTP connections
Nonpersistent HTTP
• At most one object is sent over a TCP connection.
• HTTP/1.0 uses nonpersistent HTTP
Persistent HTTP
• Multiple objects can be sent over single TCP connection between client and server.
• HTTP/1.1 uses persistent connections in default mode
• HTTP message format, response, methods
• HTTP cookies
Response Time of HTTP
Nonpersistent HTTP issues:
• requires 2 RTTs per object
• OS must allocate host resources for each TCP connection
• but browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP
• server leaves connection open after sending response
• subsequent HTTP messages between same client/server are sent over connection
Persistent without pipelining:
• client issues new request only when previous response has been received
• one RTT for each referenced object
Persistent with pipelining:
• default in HTTP/1.1
• client sends requests as soon as it encounters a referenced object
• as little as one RTT for all the referenced objects
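The RTT counts above can be sketched as a simple model that ignores transmission time and object size (the function names are ours):

```python
# RTTs to fetch a base HTML file plus n referenced objects.
def nonpersistent_rtts(n):
    # every object (including the base file) pays TCP setup + request:
    # 2 RTTs each, with no connection reuse
    return 2 + 2 * n

def persistent_pipelined_rtts(n):
    # 2 RTTs for setup + base file, then all n referenced objects are
    # requested back-to-back and arrive in as little as one more RTT
    return 3 if n else 2

# base page with 10 referenced objects:
slow = nonpersistent_rtts(10)           # 22 RTTs
fast = persistent_pipelined_rtts(10)    # 3 RTTs
```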
FTP: separate control, data connections
• FTP client contacts FTP server at port 21, specifying TCP as transport protocol
• Client obtains authorization over control connection
• Client browses remote directory by sending commands over control connection.
• When server receives a command for a file transfer, the server opens a TCP data connection to client
• After transferring one file, server closes connection.
[Figure: FTP client and server with a TCP control connection on server port 21 and a TCP data connection on server port 20]
• Server opens a second TCP data connection to transfer another file.
• Control connection: “out of band”
• FTP server maintains “state”: current directory, earlier authentication
Electronic Mail: SMTP [RFC 2821]
• uses TCP to reliably transfer email message from client to server, port 25
• direct transfer: sending server to receiving server
[Figure: user agents hand messages to their mail server’s outgoing message queue; SMTP transfers them between mail servers, which deliver into user mailboxes]
DNS name servers
• no server has all name-to-IP address mappings
local name servers:
– each ISP, company has local (default) name server
– host DNS query first goes to local name server
authoritative name server:
– for a host: stores that host’s IP address, name
– can perform name/address translation for that host’s name
Why not centralize DNS?
• single point of failure
• traffic volume
• distant centralized database
• maintenance
doesn’t scale!
DNS example
Root name server:
• may not know authoritative name server
• may know intermediate name server: who to contact to find authoritative name server
[Figure: requesting host surf.eurecom.fr resolves www.cs.nwu.edu via local name server dns.eurecom.fr, which contacts the root name server, intermediate name server dns.nwu.edu, and authoritative name server dns.cs.nwu.edu (steps 1-8)]
DNS: iterated queries
recursive query:
• puts burden of name resolution on contacted name server
• heavy load?
iterated query:
• contacted server replies with name of server to contact
• “I don’t know this name, but ask this server”
[Figure: iterated query: requesting host surf.eurecom.fr resolves gaia.cs.umass.edu; local name server dns.eurecom.fr queries in turn the root name server, intermediate name server dns.umass.edu, and authoritative name server dns.cs.umass.edu (steps 1-8)]
Web caches (proxy server)
• user sets browser: Web accesses via cache
• browser sends all HTTP requests to cache
– object in cache: cache returns object
– else cache requests object from origin server, then returns object to client
• Why web caching?
Goal: satisfy client request without involving origin server
[Figure: clients send HTTP requests to a proxy server, which answers from its cache or forwards the request to the origin server and relays the response]
Caching example (3): install cache
• suppose hit rate is 0.4
Consequence
• 40% requests will be satisfied almost immediately
• 60% requests satisfied by origin server
• utilization of access link reduced to 60%, resulting in negligible delays (say 10 msec)
• total delay = Internet delay + access delay + LAN delay
= .6*2 sec + .6*.01 secs + milliseconds < 1.3 secs
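The delay arithmetic, with the hit rate as a parameter (hit delay is treated as zero here, matching the slide's "almost immediately"):

```python
# Expected total delay with a cache: hits are served locally in
# milliseconds (treated as 0); misses pay the Internet delay plus
# the now lightly loaded access-link delay.
def total_delay(hit_rate, internet_delay=2.0, access_delay=0.01):
    miss = 1.0 - hit_rate
    return miss * internet_delay + miss * access_delay

d = total_delay(0.4)   # 0.6*2 + 0.6*0.01 = 1.206 sec, under 1.3 sec
```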
[Figure: institutional network with a 10 Mbps LAN and an institutional cache, connected by a 1.5 Mbps access link through the public Internet to origin servers]
Transport Layer
• Transport-layer services
• Multiplexing and demultiplexing
• Connectionless transport: UDP
• Principles of reliable data transfer
• TCP
– Segment structures
– Flow control
– Congestion control
Demultiplexing
• UDP socket identified by two-tuple: (dest IP address, dest port number)
• When host receives UDP segment:
– checks destination port number in segment
– directs UDP segment to socket with that port number
• TCP socket identified by 4-tuple:
– source IP address
– source port number
– dest IP address
– dest port number
• recv host uses all four values to direct segment to appropriate socket
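A sketch of the two lookup rules, using dictionaries keyed the way the slide describes (the addresses, ports, and socket names are made up):

```python
# Demultiplexing sketch: UDP keys on (dst IP, dst port) only, so all
# traffic to one port lands in one socket; TCP keys on the full
# 4-tuple, so two connections to the same server port are kept apart.
udp_sockets = {("10.0.0.5", 9157): "udp_sock_A"}
tcp_sockets = {
    ("10.0.0.9", 5775, "10.0.0.5", 80): "tcp_sock_1",
    ("10.0.0.7", 5775, "10.0.0.5", 80): "tcp_sock_2",
}

def demux_udp(dst_ip, dst_port):
    return udp_sockets[(dst_ip, dst_port)]

def demux_tcp(src_ip, src_port, dst_ip, dst_port):
    return tcp_sockets[(src_ip, src_port, dst_ip, dst_port)]
```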
UDP: User Datagram Protocol [RFC 768]
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
[Figure: UDP segment format (32-bit rows): source port # and dest port #; length and checksum; application data (message)]
UDP checksum
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• addition of all segment contents + checksum
• check if all bits are 1:
– NO - error detected
– YES - no error detected. But maybe errors nonetheless? More later ….
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
[Figure: worked example: two 16-bit words added with the carry wrapped around (1’s complement sum) to form the checksum]
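The sender and receiver steps can be sketched over bare 16-bit words (this ignores the UDP pseudo-header and odd-length padding that a real implementation also covers):

```python
# 16-bit one's-complement arithmetic as used by the UDP checksum.
def ones_complement_sum(words):
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # wrap carry back in
    return total

def udp_checksum(words):
    # sender: complement of the one's-complement sum of the contents
    return ~ones_complement_sum(words) & 0xFFFF

words = [0x6665, 0x5353]          # example 16-bit segment contents
ck = udp_checksum(words)
# receiver: summing contents + checksum yields all 1s when intact
assert ones_complement_sum(words + [ck]) == 0xFFFF
```

Flipping any bit in `words` makes the receiver's sum differ from 0xFFFF, so the error is detected (though some multi-bit errors can still cancel out, as the slide hints).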
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unACKed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
– may receive duplicate ACKs (see receiver)
• single timer for all in-flight pkts
• timeout(n): retransmit pkt n and all higher seq # pkts in window
Selective Repeat
• receiver individually acknowledges all correctly received pkts
– buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
– sender timer for each unACKed pkt
• sender window
– N consecutive seq #’s
– again limits seq #s of sent, unACKed pkts
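A minimal sketch of the Go-Back-N sender's bookkeeping (no real timer here; `timeout()` just returns the sequence numbers that would be retransmitted):

```python
# Go-Back-N sender: window of up to N unACKed packets, cumulative
# ACKs, and retransmission of all in-flight packets on timeout.
class GbnSender:
    def __init__(self, N):
        self.N, self.base, self.nextseq = N, 0, 0

    def send(self):
        if self.nextseq < self.base + self.N:   # room in window?
            self.nextseq += 1
            return True
        return False                            # window full, refuse

    def ack(self, n):
        self.base = max(self.base, n + 1)       # cumulative: ACKs all <= n

    def timeout(self):
        return list(range(self.base, self.nextseq))  # resend everything in flight

s = GbnSender(N=4)
for _ in range(4):
    s.send()
assert s.send() is False   # window full at 4 unACKed packets
s.ack(1)                   # cumulatively ACKs packets 0 and 1
```

Selective Repeat differs exactly where this sketch is crude: it would keep one timer per packet and resend only the packet whose timer fired, not the whole window.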
TCP segment structure
[Figure: TCP segment format (32-bit rows): source port # and dest port #; sequence number; acknowledgement number; header length, unused bits, flag bits (URG, ACK, PSH, RST, SYN, FIN), receive window; checksum and urgent data pointer; options (variable length); application data (variable length)]
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• receive window: # bytes rcvr willing to accept
• sequence and ACK numbers count by bytes of data (not segments!)
• Internet checksum (as in UDP)