Posted on 31-Mar-2015
1
School of Computing Science
Simon Fraser University
CMPT 771/471: Internet Architecture and Protocols
Transport Layer
Instructor: Dr. Mohamed Hefeeda
2
Review of Basic Networking Concepts
Internet structure
Protocol layering and encapsulation
Internet services and socket programming
Network layer: network types (circuit switching, packet switching); addressing, forwarding, routing
Transport layer: reliability, congestion and flow control; TCP, UDP
Link layer: multiple access protocols; Ethernet
3
Transport services and protocols
provide logical communication between app processes running on different hosts
transport protocols run in end systems
send side: breaks app messages into segments, passes to network layer
rcv side: reassembles segments into messages, passes to app layer
more than one transport protocol available to apps
Internet: TCP and UDP
[Figure: logical end-end transport between the application/transport/network/data link/physical stacks of two end hosts; the routers in between implement only network/data link/physical]
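A minimal sketch of the send-side and receive-side roles listed above, assuming (for illustration only) that each segment is tagged with the byte offset of its first byte and carries at most `seg_size` bytes. `segment` and `reassemble` are hypothetical helpers, not part of any socket API.

```python
def segment(message: bytes, seg_size: int):
    """Send side: break an app message into (offset, chunk) segments."""
    return [(off, message[off:off + seg_size])
            for off in range(0, len(message), seg_size)]


def reassemble(segments):
    """Receive side: order segments by offset and rebuild the message."""
    return b"".join(chunk for _, chunk in sorted(segments))


segs = segment(b"hello, transport", 4)
segs.reverse()                 # pretend the network reordered them
print(reassemble(segs))        # b'hello, transport'
```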
4
Transport vs. network layer
network layer: logical communication between hosts
transport layer: logical communication between processes
relies on, enhances, network layer services
Household analogy:
12 kids sending letters to 12 kids
processes = kids
app messages = letters in envelopes
hosts = houses
transport protocol = Ann and Bill
network-layer protocol = postal service
5
Multiplexing/demultiplexing
[Figure: hosts 1, 2 and 3, each with an application/transport/network/link/physical stack; processes P1 to P4 attach to the transport layer through sockets]
Demultiplexing at rcv host: delivering received segments to correct socket
Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
6
Connectionless demux
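[Figure: server C (IP: C) has a UDP socket on port 6428; client processes on hosts A and B, using source ports 9157 and 5775, send datagrams with DP: 6428, and both are delivered to the same server socket; the server's replies carry SP: 6428 and the client's port as DP]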
UDP socket identified by 2-tuple: (dst IP, dst Port). Datagrams with different src IPs and/or src ports are directed to the same socket.
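A toy lookup table makes the 2-tuple rule concrete. The host names A, B, C and the ports come from the figure; the table `udp_sockets` and the function `udp_demux` are hypothetical stand-ins for state a real OS kernel keeps internally.

```python
# Hypothetical socket table: UDP sockets are keyed by (dst IP, dst port) only.
udp_sockets = {("C", 6428): "server socket"}


def udp_demux(src_ip, src_port, dst_ip, dst_port):
    # The source address is not part of the lookup key; the application
    # can still read it (e.g. to know where to send a reply).
    return udp_sockets.get((dst_ip, dst_port))


print(udp_demux("A", 9157, "C", 6428))   # server socket
print(udp_demux("B", 5775, "C", 6428))   # server socket
```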
7
Connection-oriented demux (cont)
[Figure: clients on hosts A and B send segments to port 80 on server C (IP: C): one with S-IP: A, SP: 9157 and two from B with SP: 9157 and SP: 5775. All carry D-IP: C, DP: 80, yet each is delivered to a different socket on the server]
TCP socket identified by 4-tuple: (src IP, src Port, dst IP, dst Port)
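The same toy idea for TCP: the lookup key is the full 4-tuple, so two segments that share DP 80 and even SP 9157 still reach different sockets when their source IPs differ. The table `tcp_sockets`, its socket labels and `tcp_demux` are hypothetical; the addresses and ports mirror the figure.

```python
# Hypothetical socket table: TCP connection sockets keyed by the 4-tuple
# (src IP, src port, dst IP, dst port).
tcp_sockets = {
    ("A", 9157, "C", 80): "socket for A:9157",
    ("B", 9157, "C", 80): "socket for B:9157",
    ("B", 5775, "C", 80): "socket for B:5775",
}


def tcp_demux(src_ip, src_port, dst_ip, dst_port):
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


print(tcp_demux("B", 9157, "C", 80))   # socket for B:9157
```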
8
UDP: User Datagram Protocol [RFC 768]
“no frills,” “bare bones” Internet transport protocol
“best effort” service, UDP segments may be:
lost
delivered out of order to app
connectionless:
no handshaking between UDP sender, receiver
each UDP segment handled independently of others
Why is there a UDP?
no connection establishment (which can add delay)
simple: no connection state at sender, receiver
small segment header
no congestion control: UDP can blast away as fast as desired
9
UDP
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses:
DNS
SNMP
reliable transfer over UDP: add reliability at application layer
application-specific error recovery!
UDP segment format (32 bits wide):
source port # | dest port #
length | checksum
application data (message)
length: in bytes of UDP segment, including header
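A sketch of the segment format above using Python's `struct` module: four 16-bit header fields (8 bytes) followed by the payload, with the checksum computed as the 16-bit one's-complement sum used by the Internet checksum. Simplifying assumption: real UDP also covers an IP pseudo-header in the checksum, which this example omits. `internet_checksum` and `build_udp_segment` are hypothetical helper names.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                            # pad to a whole word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


def build_udp_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)                      # header (8 bytes) + data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    # Simplified: no IP pseudo-header in the checksum computation.
    checksum = internet_checksum(header + payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


seg = build_udp_segment(5775, 6428, b"hello")
print(len(seg), internet_checksum(seg))            # 13 0
```

A receiver running the same sum over the whole segment (checksum field included) gets 0 when no bits were flipped, which is why the checksum is stored as a complement.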
10
Reliable data transfer
important in application, transport, and link layers; on the top-10 list of important networking topics!
characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
11
Pipelined (Sliding Window) Protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
range of sequence numbers must be increased
buffering at sender and/or receiver
Two generic forms of pipelined protocols: Go-Back-N, selective repeat
12
Go-Back-N
Sender:
k-bit seq # in pkt header
“window” of up to N consecutive unACKed pkts allowed
ACK(n): ACKs all pkts up to and including seq # n (cumulative ACK)
may receive duplicate ACKs (see receiver)
timer for each in-flight pkt
timeout(n): retransmit pkt n and all higher seq # pkts in window, i.e., go back to n
13
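GBN in action
[Figure: window size N = 4; after a timeout, the sender goes back to packet 2 and retransmits it and every later packet in the window]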
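A toy simulation of the Go-Back-N behavior in the figure, under assumptions that are not in the slides: ACKs arrive instantly and are never lost, and each packet in `lost_once` is dropped only on its first transmission. `go_back_n` is a hypothetical helper that returns the order in which the sender transmits packets.

```python
def go_back_n(num_pkts, window, lost_once):
    """Return the sequence of packet transmissions under Go-Back-N."""
    base = next_seq = expected = 0
    dropped = set()
    sent = []
    while base < num_pkts:
        # Send every packet the window currently allows.
        while next_seq < min(base + window, num_pkts):
            sent.append(next_seq)
            if next_seq in lost_once and next_seq not in dropped:
                dropped.add(next_seq)        # lost in the network
            elif next_seq == expected:
                expected += 1                # in order: receiver accepts it
            # otherwise out of order: the GBN receiver discards it
            next_seq += 1
            base = expected                  # cumulative ACK slides the window
        if base < num_pkts:
            next_seq = base                  # timeout: go back to base

    return sent


print(go_back_n(6, 4, {2}))   # [0, 1, 2, 3, 4, 5, 2, 3, 4, 5]
```

With N = 4 and packet 2 lost, the sender resends 2, 3, 4 and 5 even though 3 to 5 reached the receiver, because the receiver discarded them as out of order.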
14
Go-Back-N
Do you see potential problems with GBN?
Consider high-speed links with long delays (called large bandwidth-delay product pipes)
GBN can fill that pipe by having a large N: many unACKed pkts could be in the pipe
A single lost pkt could cause a retransmission of a huge number (up to N) of pkts: waste of bandwidth
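A worked example of how large N must be to fill such a pipe, using hypothetical numbers not taken from the slides: a link rate R = 1 Gbps, an RTT of 100 ms, and 1,500-byte packets. The window must cover the bandwidth-delay product.

```latex
\begin{align*}
R \cdot \text{RTT} &= 10^{9}~\text{bits/s} \times 0.1~\text{s}
  = 10^{8}~\text{bits} = 12.5 \times 10^{6}~\text{bytes} \\
N &\approx \frac{12.5 \times 10^{6}~\text{bytes}}{1500~\text{bytes/pkt}}
  \approx 8333~\text{pkts}
\end{align*}
```

Under GBN, a single loss in such a window could trigger retransmission of thousands of packets.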
Solutions??
15
Selective Repeat
receiver individually acknowledges all correctly received pkts
buffers pkts, as needed, for eventual in-order delivery to upper layer
sender only resends pkts for which ACK not received
sender timer for each unACKed pkt
sender window: N consecutive seq #’s
again limits seq #s of sent, unACKed pkts
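A toy Selective Repeat simulation under the same simplifying assumptions as a textbook trace (not stated in the slides): ACKs are instant and never lost, and each packet in `lost_once` is dropped only on its first transmission. `selective_repeat` is a hypothetical helper; it returns both the transmissions and the order in which the receiver delivers packets to the upper layer.

```python
def selective_repeat(num_pkts, window, lost_once):
    """Return (transmissions, in-order deliveries) under Selective Repeat."""
    acked = [False] * num_pkts
    dropped = set()
    sent, delivered = [], []
    buffered = set()                 # out-of-order pkts held by the receiver
    base = next_seq = rcv_base = 0

    def transmit(seq):
        nonlocal base, rcv_base
        sent.append(seq)
        if seq in lost_once and seq not in dropped:
            dropped.add(seq)         # lost: no ACK comes back
            return
        acked[seq] = True            # receiver ACKs this pkt individually
        buffered.add(seq)
        while rcv_base in buffered:  # deliver in order to the upper layer
            buffered.remove(rcv_base)
            delivered.append(rcv_base)
            rcv_base += 1
        while base < num_pkts and acked[base]:
            base += 1                # sender window slides past ACKed pkts

    while base < num_pkts:
        while next_seq < min(base + window, num_pkts):
            transmit(next_seq)
            next_seq += 1
        for seq in range(base, next_seq):   # per-packet timers expire
            if not acked[seq]:
                transmit(seq)               # resend only the unACKed pkt
    return sent, delivered


print(selective_repeat(6, 4, {2}))
# ([0, 1, 2, 3, 4, 5, 2], [0, 1, 2, 3, 4, 5])
```

Compared with Go-Back-N on the same loss pattern, only packet 2 is resent; packets 3 to 5 wait in the receiver buffer until 2 arrives.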
16
Selective repeat: sender, receiver windows
17
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
full duplex data: bi-directional data flow in same connection
MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
flow controlled: sender will not overwhelm receiver
point-to-point: one sender, one receiver
reliable, in-order byte stream: no “message boundaries”
pipelined: TCP congestion and flow control set window size
send & receive buffers
[Figure: the application writes data through its socket door into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving application reads data through its socket door]
18
TCP segment structure
(32 bits wide)
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | receive window
checksum | urg data pointer
options (variable length)
application data (variable length)
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
receive window: # bytes rcvr willing to accept
sequence and acknowledgement numbers: counting by bytes of data (not segments!)
checksum: Internet checksum (as in UDP)
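A sketch that packs the fixed 20-byte header in the layout above with Python's `struct` module. Assumptions: no options (head len = 5 32-bit words), checksum left at 0 (a real sender computes it), and bit positions as in the RFC 793 layout (4-bit head len, 6 unused bits, 6 flag bits). `build_tcp_header` and `parse_flags` are hypothetical helpers.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit "head len / not used / flags" word.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def build_tcp_header(src_port, dst_port, seq, ack, flags, rcv_window,
                     checksum=0, urg_ptr=0):
    """Pack a 20-byte TCP header with no options."""
    head_len = 5                                # header length in 32-bit words
    len_and_flags = (head_len << 12) | flags    # 4 bits len, 6 unused, 6 flags
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       len_and_flags, rcv_window, checksum, urg_ptr)


def parse_flags(header):
    """Return the names of the flag bits set in a packed header."""
    len_and_flags = struct.unpack("!H", header[12:14])[0]
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    return [n for i, n in enumerate(names) if len_and_flags & (1 << i)]


hdr = build_tcp_header(5775, 80, seq=92, ack=100, flags=ACK, rcv_window=4096)
print(len(hdr), parse_flags(hdr))               # 20 ['ACK']
```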
19
TCP reliable data transfer
TCP creates rdt service on top of IP’s unreliable service
pipelined segments
cumulative ACKs
TCP uses a single retransmission timer
Retransmissions are triggered by:
timeout events
duplicate ACKs
Initially consider simplified TCP sender:
ignore duplicate ACKs
ignore flow control, congestion control
20
TCP sender events:
data rcvd from app:
create segment with seq #
seq # is byte-stream number of first data byte in segment
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval: TimeoutInterval
timeout:
retransmit segment that caused timeout
restart timer
ACK rcvd:
if it acknowledges previously unacked segments, update what is known to be acked
start timer if there are outstanding segments
21
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer
  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */
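A runnable sketch of the simplified sender's three event handlers. Assumptions for illustration: the IP layer is stubbed by recording segments in `sent`, the timer is a boolean flag rather than a real clock, and stopping the timer when nothing is outstanding is our reading of the pseudocode. `SimplifiedTcpSender` is a hypothetical name.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified TCP sender (network and timer stubbed)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> data of not-yet-ACKed segments
        self.timer_running = False
        self.sent = []              # (seq #, data) handed to IP, in order

    def pass_to_ip(self, seq, data):
        self.sent.append((seq, data))

    def on_app_data(self, data: bytes):
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True          # start timer
        self.pass_to_ip(seq, data)
        self.next_seq_num += len(data)         # seq # counts bytes, not segments

    def on_timeout(self):
        if self.unacked:
            oldest = min(self.unacked)         # smallest not-yet-ACKed seq #
            self.pass_to_ip(oldest, self.unacked[oldest])
        self.timer_running = True              # restart timer

    def on_ack(self, y):
        if y > self.send_base:                 # cumulative: bytes < y received
            self.send_base = y
            self.unacked = {s: d for s, d in self.unacked.items()
                            if s + len(d) > y}
            # (re)start timer only if segments are still outstanding
            self.timer_running = bool(self.unacked)
```

Replaying the premature-timeout scenario: send 8 bytes at Seq=92 and 20 bytes at Seq=100, let the timer fire, then deliver ACK=100 and ACK=120.

```python
s = SimplifiedTcpSender(92)
s.on_app_data(b"a" * 8)
s.on_app_data(b"b" * 20)
s.on_timeout()                 # resends Seq=92
s.on_ack(100)
s.on_ack(120)
print(s.send_base, s.timer_running)   # 120 False
```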
22
TCP: retransmission scenarios
Lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B's ACK=100 is lost (X); on timeout, A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100.
Premature timeout: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120, but the Seq=92 timer expires before they arrive, so A resends Seq=92, 8 bytes data; SendBase advances from 100 to 120; B answers the resent segment with ACK=120.
23
TCP retransmission scenarios (more)
Cumulative ACK scenario: Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is retransmitted; SendBase = 120.
24
TCP Round Trip Time and Timeout
If TCP timeout is:
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss
Q: how to set TCP timeout value? Based on Round Trip Time (RTT), but RTT itself varies with time! We need to estimate the current RTT
RTT estimation:
SampleRTT: measured time from segment transmission until ACK receipt (ignore retransmissions)
SampleRTT will vary; want estimated RTT “smoother”: average several recent measurements, not just current SampleRTT
25
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
Exponential weighted moving average: influence of past sample decreases exponentially fast
typical value: $\alpha = 0.125$
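Unrolling the recurrence shows why the influence of old samples decays exponentially. Writing $S_n$ for the n-th SampleRTT and $E_n$ for EstimatedRTT after it (shorthand introduced here), the second line is a hypothetical numeric step with $\alpha = 0.125$, a previous estimate of 100 ms and a new sample of 140 ms.

```latex
\begin{align*}
E_n &= \alpha S_n + (1-\alpha)\,E_{n-1}
     = \sum_{k=0}^{n-1} \alpha(1-\alpha)^k\, S_{n-k} + (1-\alpha)^n E_0 \\
E_n &= 0.875 \times 100 + 0.125 \times 140 = 87.5 + 17.5 = 105~\text{ms}
\end{align*}
```

A sample $k$ steps old carries weight $\alpha(1-\alpha)^k$, so a single 140 ms spike moves the estimate by only 5 ms.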
26
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), y-axis: RTT (milliseconds); curves: SampleRTT and EstimatedRTT]
27
TCP Round Trip Time and Timeout
Setting the timeout
EstimatedRTT plus safety margin: large variation in EstimatedRTT -> larger safety margin
first estimate how much SampleRTT deviates from EstimatedRTT:
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$
(typically, $\beta = 0.25$)
Then set timeout interval:
$\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
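A sketch of one update step combining the EstimatedRTT, DevRTT and TimeoutInterval formulas, with the slides' typical α = 0.125 and β = 0.25 as defaults. The slides do not fix the update order; this sketch assumes the RFC 6298 convention of updating DevRTT with the old EstimatedRTT first. `update_rtt` is a hypothetical helper, and the example numbers are made up.

```python
def update_rtt(estimated, dev, sample, alpha=0.125, beta=0.25):
    """Fold one SampleRTT in; return (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # Deviation is measured against the estimate *before* this sample.
    dev = (1 - beta) * dev + beta * abs(sample - estimated)
    estimated = (1 - alpha) * estimated + alpha * sample
    return estimated, dev, estimated + 4 * dev


print(update_rtt(100.0, 10.0, 140.0))   # (105.0, 17.5, 175.0)
```

The 40 ms jump barely moves EstimatedRTT but widens the safety margin quickly, so the timeout grows from 140 ms to 175 ms.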
28
Fast Retransmit
Time-out period often relatively long: long delay before resending lost packet
Detect lost segments via duplicate ACKs:
sender often sends many segments back-to-back
if a segment is lost, there will likely be many duplicate ACKs
If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
fast retransmit: resend segment before timer expires
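A sketch of duplicate-ACK counting. In TCP implementations following RFC 5681 the trigger is three duplicate ACKs, i.e. four ACKs carrying the same value. `fast_retransmit_events` is a hypothetical helper, and the ACK stream in the example is made up.

```python
def fast_retransmit_events(acks, dup_threshold=3):
    """Return the ACK values that trigger a fast retransmit."""
    triggers = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                triggers.append(ack)   # resend the segment starting at byte `ack`
        else:
            last_ack, dup_count = ack, 0
    return triggers


print(fast_retransmit_events([100, 120, 120, 120, 120, 160]))   # [120]
```

Here ACK=120 repeats because the segment starting at byte 120 was lost while later segments kept arriving; the third duplicate resends it without waiting for the timeout.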
29
TCP Connection Management: opening
TCP: 3-way handshake
Step 1: client host sends TCP SYN segment to server
specifies initial seq #
no data
Step 2: server host receives SYN, replies with SYNACK segment
server allocates buffers
specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
[Figure: client -> server: SYN=1, seq=client_isn (conn. request); server -> client: SYN=1, seq=server_isn, ack=client_isn+1 (conn. granted); client -> server: SYN=0, seq=client_isn+1, ack=server_isn+1]
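A sketch computing the header fields in the figure for hypothetical initial sequence numbers. The `+ 1` terms reflect that a SYN consumes one sequence number even though it carries no data. `three_way_handshake` is a hypothetical helper returning the three segments as dictionaries.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the (SYN, SYNACK, ACK) header fields of the handshake."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": synack["seq"] + 1}
    return syn, synack, ack


for segment in three_way_handshake(4000, 9000):
    print(segment)
# {'SYN': 1, 'seq': 4000}
# {'SYN': 1, 'seq': 9000, 'ack': 4001}
# {'SYN': 0, 'seq': 4001, 'ack': 9001}
```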
Q. How would a hacker exploit TCP 3-way handshake to bring a server down?
A. SYN Flood DoS attack
30
TCP Connection Management: closing
Step 1: client end system sends TCP FIN segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN
Step 3: client receives FIN, replies with ACK
enters “timed wait”: may need to re-send ACK to received FINs
Step 4: server receives ACK. Connection closed
[Figure: client (closing) sends FIN; server replies ACK, then (closing) sends FIN; client replies ACK and stays in timed wait before it is closed; server is closed on receiving the ACK]
31
TCP Connection Management
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
32
TCP Flow Control
receive side of TCP connection has a receive buffer
app process may be slow at reading from buffer
flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app’s drain rate
33
TCP Flow control: how it works
(Suppose TCP receiver discards out-of-order segments)
spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
guarantees receive buffer doesn’t overflow
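A sketch of both sides of the rule, using the slide's variable names in snake_case. `rcv_window` and `sender_may_send` are hypothetical helpers, and the byte counts in the example are made up.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: spare room to advertise, RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sender_may_send(last_byte_sent, last_byte_acked, advertised, nbytes):
    """Sender: keep unACKed data, including nbytes more, within RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= advertised


window = rcv_window(4096, 3000, 1000)               # 2096 bytes free
print(window, sender_may_send(5000, 4000, window, 1000))   # 2096 True
```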
34
Congestion Control
Congestion: sources send too much data for network to handle
different from flow control, which is e2e
Congestion results in …
lost packets (buffer overflow at routers)
more work (retransmissions) for given “goodput”
long delays (queueing in router buffers)
premature (unneeded) retransmissions
waste of upstream links’ capacity
pkt traversed several links, then dropped at congested router
35
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP
Network-assisted congestion control:
routers provide feedback to end systems
single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
explicit rate sender should send at
36
TCP congestion control: Approach
Approach: probe for usable bandwidth in network: increase transmission rate until loss occurs, then decrease
Additive increase, multiplicative decrease (AIMD)
[Figure: congestion window (8, 16, 24 Kbytes) vs. time, showing saw-tooth behavior: probing for bandwidth]
37
TCP Congestion Control
Sender keeps a new variable, Congestion Window (CongWin), and limits unacked bytes to:
$\text{LastByteSent} - \text{LastByteAcked} \le \min\{\text{CongWin}, \text{RcvWin}\}$
For our discussion: assume RcvWin is large enough
Roughly, what is the sending rate as a function of CongWin? Ignore loss and transmission delay
Rate = CongWin/RTT (bytes/sec)
So, rate and CongWin are somewhat synonymous
38
TCP Congestion Control
Congestion occurs at routers (inside the network); routers do not provide any feedback to TCP
How can TCP infer congestion? From its symptoms: timeout or duplicate ACKs
define loss event ≡ timeout or 3 duplicate ACKs
TCP decreases its CongWin (rate) after a loss event
TCP Congestion Control Algorithm: three components:
AIMD: additive increase, multiplicative decrease
slow start
reaction to timeout events
39
AIMD
additive increase (congestion avoidance phase): increase CongWin by 1 MSS every RTT until loss detected
TCP increases CongWin by MSS x (MSS/CongWin) for every ACK received
Ex. MSS = 1,460 bytes and CongWin = 14,600 bytes: with every ACK, CongWin is increased by 146 bytes
multiplicative decrease: cut CongWin in half after loss
[Figure: CongWin (8, 16, 24 Kbytes) vs. time, saw-tooth pattern]
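Why the per-ACK increment adds up to about 1 MSS per RTT: a window of CongWin bytes produces roughly CongWin/MSS ACKs per RTT. This assumes one ACK per full-sized segment (no delayed ACKs) and treats CongWin as constant within the RTT, hence the approximation. The numbers are the slide's example.

```latex
\begin{align*}
\Delta_{\text{ACK}} &= \text{MSS}\cdot\frac{\text{MSS}}{\text{CongWin}}, \qquad
  \text{ACKs per RTT} \approx \frac{\text{CongWin}}{\text{MSS}} \\
\Delta_{\text{RTT}} &\approx \frac{\text{CongWin}}{\text{MSS}}\cdot
  \text{MSS}\cdot\frac{\text{MSS}}{\text{CongWin}} = \text{MSS} \\
\text{Ex.: } \Delta_{\text{ACK}} &= 1460\cdot\frac{1460}{14600} = 146~\text{bytes},
  \quad 10~\text{ACKs} \times 146~\text{bytes} = 1460~\text{bytes} = 1~\text{MSS}
\end{align*}
```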
40
TCP Slow Start
When connection begins, CongWin = 1 MSS
Example: MSS = 500 bytes & RTT = 200 msec: initial rate = CongWin/RTT = 20 kbps
available bandwidth may be >> MSS/RTT: desirable to quickly ramp up to respectable rate
Slow start: when connection begins, increase rate exponentially fast until first loss event
How can we do that? Double CongWin every RTT
How? Increment CongWin by 1 MSS for every ACK received
41
TCP Slow Start (cont’d)
Increment CongWin by 1 MSS for every ACK
Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment to Host B, then two segments one RTT later, then four segments one RTT after that]
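A sketch showing that "+1 MSS per ACK" doubles CongWin every RTT. Assumptions: every segment is ACKed individually within the same RTT and nothing is lost. Windows are counted in MSS, and `slow_start_rounds` is a hypothetical helper.

```python
def slow_start_rounds(rounds):
    """CongWin (in MSS) at the start of each RTT under pure slow start."""
    cong_win = 1                    # connection begins with CongWin = 1 MSS
    history = []
    for _ in range(rounds):
        history.append(cong_win)
        acks_this_rtt = cong_win    # one ACK per segment sent this RTT
        for _ in range(acks_this_rtt):
            cong_win += 1           # +1 MSS per ACK received
    return history


print(slow_start_rounds(5))   # [1, 2, 4, 8, 16]
```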
42
Reaction to a Loss event
TCP Tahoe (old):
Threshold = CongWin / 2
set CongWin = 1 MSS
slow start till Threshold, then additive increase // congestion avoidance
TCP Reno (most current TCP implementations):
if 3 dup ACKs // fast retransmit
Threshold = CongWin / 2
set CongWin = Threshold // fast recovery
additive increase
else // timeout
same as TCP Tahoe
43
Reaction to a Loss event (cont’d)
Why differentiate between 3 dup ACKs and timeout?
3 dup ACKs indicate network capable of delivering some segments
timeout indicates a “more alarming” congestion scenario
44
TCP Congestion Control: Summary
Initially
Threshold is set to large value (65 Kbytes), has no effect
CongWin = 1 MSS
Slow Start (SS): CongWin grows exponentially till a loss event occurs (timeout or 3 dup ACKs) or it reaches Threshold
Congestion Avoidance (CA): CongWin grows linearly
3 duplicate ACKs occur:
Threshold = CongWin/2; CongWin = Threshold; CA
Timeout occurs:
Threshold = CongWin/2; CongWin = 1 MSS; SS till Threshold
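A per-RTT toy model of the summary rules, Reno-style. Assumptions not in the slides: windows are counted in MSS, slow start doubling is capped at Threshold, each RTT has at most one event, and the default threshold of 64 MSS is an arbitrary "large" value. `reno_cwnd_trace` and the event list are hypothetical.

```python
def reno_cwnd_trace(events, threshold=64):
    """Return CongWin (in MSS) at the start of each RTT.

    events[i] is what happens during RTT i: None (all ACKed), "3dup" or "timeout".
    """
    cong_win, trace = 1, []                     # start in slow start, 1 MSS
    for event in events:
        trace.append(cong_win)
        if event == "3dup":
            threshold = max(cong_win // 2, 1)
            cong_win = threshold                # fast recovery, then CA
        elif event == "timeout":
            threshold = max(cong_win // 2, 1)
            cong_win = 1                        # back to slow start
        elif cong_win < threshold:
            cong_win = min(2 * cong_win, threshold)   # slow start
        else:
            cong_win += 1                       # congestion avoidance
    return trace


events = [None] * 4 + ["3dup"] + [None] * 2 + ["timeout"] + [None] * 3
print(reno_cwnd_trace(events, threshold=8))
# [1, 2, 4, 8, 9, 4, 5, 6, 1, 2, 3]
```

The trace shows slow start up to Threshold = 8, linear growth, a halving after 3 dup ACKs, and a reset to 1 MSS after the timeout.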
45
TCP throughput
What’s the average throughput of TCP as a function of window size and RTT?
ignore slow start
Let W be the window size when loss occurs
when window is W, throughput is W/RTT
just after loss, window drops to W/2, throughput to W/(2·RTT)
Average throughput: 0.75 W/RTT
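The 0.75 factor follows because additive increase grows the window linearly from W/2 back to W between losses, so its time-average is the midpoint. This assumes a constant RTT and a loss each time the window reaches W.

```latex
\begin{align*}
\bar{W} &= \frac{1}{2}\left(\frac{W}{2} + W\right) = \frac{3}{4}\,W \\
\text{average throughput} &= \frac{\bar{W}}{\text{RTT}} = 0.75\,\frac{W}{\text{RTT}}
\end{align*}
```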