

TCP / IP-TCP/UDP

Network Information Center

BUPT

Nov./2017

Agenda

- TCP/IP protocol stack
- Principles behind transport layer services
  - Addressing
  - Multiplexing
- Transport layer protocols in the Internet
  - UDP: connectionless transport
  - TCP: connection-oriented transport
  - MPTCP: Multipath TCP
  - QUIC: Quick UDP Internet Connections

TCP/IP Protocol Family

Overview of Transport Layer

- Provides efficient, reliable, and cost-effective service to its users (in the application layer)
- Transport layer protocols are end-to-end protocols
  - The transport layer is implemented only at the hosts
  - It shields upper-layer protocols from the details of the network
- A transport protocol can be either connection-oriented or connectionless
- Enhances the QoS provided by the network layer
- Transport vs. network layer services:
  - Network layer: logical communication between hosts
  - Transport layer: logical communication between processes
- Relies on, and enhances, network layer services

Overview of Transport Layer (Continued)

- Node to node: data link layer
- Host to host: network layer
- Process to process: transport layer

Overview of Transport Layer (Continued)

- Connection-oriented, reliable, in-order unicast delivery (TCP)
  - Congestion control
  - Flow control
  - Connection setup
- Connectionless, unreliable ("best-effort"), unordered unicast or multicast delivery (UDP)
  - No-frills extension of "best-effort" IP

In total, over 150 RFCs relate to TCP: RFC 793, 1180, 4653, 5382, …

Overview of Transport Layer (Continued)

- Transport entity: the hardware and/or software within the transport layer that provides the transport layer functions
- Transport protocol: the protocol used between two transport entities
- TPDU: transport layer protocol data unit

Elements of Transport Protocol

- Elements of both connection-oriented and connectionless services:
  - Addressing
  - Multiplexing
- Elements of connection-oriented services:
  - Connection establishment
  - Flow control
  - Error control


Addressing

- Network layer: a single network address
- Transport layer: may have multiple transport addresses, used in multiplexing/demultiplexing

Addressing (Continued)

- Data link layer: MAC address
- Network layer: IP address
- Transport layer: may have multiple transport addresses, used in multiplexing/demultiplexing
  - IP address + port number

Addressing (Continued)

- TCP and UDP use port numbers to identify different applications
- The Internet Assigned Numbers Authority (IANA) maintains a list of port number assignments (RFC 1700 and RFC 3232)
  - Well-known ports (0-1023): controlled and assigned by IANA
  - Registered ports (1024-49151): IANA registers and lists their use as a convenience
  - Dynamic ports (49152-65535): ephemeral ports
- Port numbers are 16-bit integers (0-65,535)
  - Servers use well-known ports
  - Clients use ephemeral (short-lived) ports
- For well-known port numbers, see /etc/services on a UNIX or Linux machine
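The three IANA ranges can be expressed as a small range check. What follows is a minimal Python sketch. `classify_port` is a hypothetical helper and is not part of any library.

```python
def classify_port(port: int) -> str:
    """Return the IANA range a port number falls in (hypothetical helper)."""
    if not 0 <= port <= 65535:
        # port numbers are 16-bit unsigned integers
        raise ValueError("port numbers are 16-bit integers (0-65535)")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/ephemeral"


for p in (22, 80, 8080, 50000):
    print(p, classify_port(p))
```

On UNIX-like systems, Python's standard `socket.getservbyname("http", "tcp")` looks up a service name in the local services database (typically /etc/services) and returns its port number.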

Addressing (Continued)

- Well-known ports

Socket Addressing

- Process-to-process delivery needs two identifiers:
  - IP address and port number
  - The combination of an IP address and a port number is called a socket address (a socket is a communication endpoint)
  - The client socket address uniquely identifies the client process
  - The server socket address uniquely identifies the server process
- A transport-layer protocol needs a pair of socket addresses:
  - Client socket address
  - Server socket address
  - For example, the socket pair for a TCP connection is a 4-tuple:
    - local IP address, local port, and
    - foreign IP address, foreign port
- A connection is identified by a pair of sockets
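The sketch below makes the 4-tuple concrete. It opens a TCP connection over the loopback interface with Python's standard `socket` module and reads the local and foreign socket addresses seen by each end. It assumes 127.0.0.1 is usable. The helper name `connection_four_tuples` is only illustrative.

```python
import socket


def connection_four_tuples():
    """Return (client view, server view) of one TCP connection as 4-tuples:
    (local IP, local port, foreign IP, foreign port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())  # the client is given an ephemeral port
    server_side, _ = listener.accept()  # a new socket dedicated to this connection
    try:
        client_view = client.getsockname() + client.getpeername()
        server_view = server_side.getsockname() + server_side.getpeername()
        return client_view, server_view
    finally:
        for s in (client, server_side, listener):
            s.close()


print(connection_four_tuples())
```

The client's (local, foreign) pair is exactly the server's (foreign, local) pair. Both ends name the same connection, seen from opposite sides.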

Addressing (Continued)

- How does a client identify the address of a service?
  - Know the address ahead of time
    - Well-known addresses
  - Ask a directory service, e.g., a DNS server
    - The server looks up the directory and returns an address
  - Send the request to a well-known address
    - Useful when the target server process is spawned only at request time
    - The process at the well-known address spawns the new process and returns its address

Example: how to check which ports are open on a system

$ netstat -a

UDP
Local Address        State
-------------------- -------
*.tacacs             Idle
noya.syslog          Idle
*.35071              Idle
*.35072              Idle
*.64019              Idle
*.*                  Unbound

TCP
Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -------
*.*                  *.*                      0      0     0      0 IDLE
*.ftp                *.*                      0      0     0      0 LISTEN
*.telnet             *.*                      0      0     0      0 LISTEN
*.pop3               *.*                      0      0     0      0 LISTEN
*.smtp               *.*                      0      0     0      0 LISTEN
noya.43828           202.96.44.10.smtp    17520      0  8760      0 ESTABLISHED
noya.smtp            public.bta.net.cn.45110  8760      0  8760      0 ESTABLISD
noya.44176           smtpott2.nortel.ca.smtp     0      0  8760      0 SYN_SENT
noya.44174           iad.xjtu.edu.cn.smtp 61440      0  9216      0 TIME_WAIT
noya.smtp            202.96.125.104.45736  8760      0  8760      0 TIME_WAIT
noya.pop3            202.99.61.199.1081     8555      0  8760      0 TIME_WAIT


Multiplexing/demultiplexing

[Figure: hosts 1, 2, and 3, each with application, transport, network, link, and physical layers; processes P1-P4 attach to the transport layer through sockets]

- Multiplexing at the sending host: gathering data from multiple sockets and enveloping the data with a header (later used for demultiplexing)
- Demultiplexing at the receiving host: delivering received segments to the correct socket

Multiplexing/demultiplexing (Continued)

- Upward multiplexing
  - Multiple transport connections share a single network connection

Multiplexing/demultiplexing (Continued)

- Downward or inverse multiplexing
  - A single transport connection uses the combined services of multiple network connections

Multiplexing/demultiplexing (Continued)

- How demultiplexing works:
  - The host receives IP datagrams
    - Each datagram has a source IP address and a destination IP address
    - Each datagram carries one transport-layer segment
    - Each segment has a source and a destination port number (recall: well-known port numbers for specific applications)
  - The host uses IP addresses and port numbers to direct the segment to the appropriate socket

[Figure: TCP/UDP segment format, 32 bits wide: source port # and dest port #, other header fields, then application data (message)]

Connection-oriented Demultiplexing

- A TCP socket is identified by a 4-tuple:
  - (source IP address, source port number, dest IP address, dest port number)
- The receiving host uses all four values to direct a segment to the appropriate socket
- A server host may support many simultaneous TCP sockets:
  - Each socket is identified by its own 4-tuple
- Web servers have a different socket for each connecting client
  - Non-persistent HTTP will have a different socket for each request

Connectionless Demultiplexing

- A UDP socket is identified by a two-tuple:
  - (dest IP address, dest port number)
- When a host receives a UDP segment:
  - It checks the destination port number in the segment
  - It directs the UDP segment to the socket with that port number
- IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
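The difference between the two demultiplexing keys can be shown with a toy socket table. A TCP entry is looked up by the full 4-tuple, and a UDP entry only by (dest IP, dest port). The `Demultiplexer` class and the socket labels are hypothetical. Real TCP stacks also keep listening sockets that match new SYNs, which this sketch leaves out.

```python
from typing import Dict, Tuple

Addr = Tuple[str, int]  # (IP address, port number)


class Demultiplexer:
    """Toy receive-side socket table (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.tcp: Dict[Tuple[str, int, str, int], str] = {}  # key: 4-tuple
        self.udp: Dict[Addr, str] = {}  # key: (dest IP, dest port)

    def add_tcp(self, src: Addr, dst: Addr, sock: str) -> None:
        self.tcp[(src[0], src[1], dst[0], dst[1])] = sock

    def bind_udp(self, dst: Addr, sock: str) -> None:
        self.udp[dst] = sock

    def deliver(self, proto: str, src: Addr, dst: Addr) -> str:
        if proto == "tcp":
            # all four values select the connection's socket
            return self.tcp.get((src[0], src[1], dst[0], dst[1]), "no matching socket")
        # UDP: only the destination IP address and port are used
        return self.udp.get(dst, "no matching socket")
```

Two TCP clients that share a source port but come from different hosts land in different sockets. Two UDP senders to the same port share one socket.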


Transport layer protocols

- Connection-oriented
  - TCP
  - XTP
  - SCTP
  - MPTCP
  - DCCP (connection setup, but unreliable delivery)
  - QUIC (connection-oriented, carried over UDP datagrams)
  - ...
- Connectionless
  - UDP
- Research on better transport layer protocols is still going on in the community ...

UDP: User Datagram Protocol [RFC 768]

- UDP is a transport protocol
  - Communication between processes
- "No frills," "bare bones" Internet transport protocol
- "Best effort" service; UDP segments may be:
  - lost
  - delivered out of order to the application
- Connectionless:
  - No handshaking between UDP sender and receiver
  - Each UDP segment is handled independently of the others

Why is there a UDP?

- No connection establishment (which can add delay)
- Simple: no connection state at sender or receiver
- Small segment header
- No congestion control: UDP can blast away as fast as desired

Why not use IP directly?

- Multiplexing and demultiplexing
- Error checking on data

UDP (Continued)

- Often used for streaming multimedia apps
  - Loss tolerant
  - Rate sensitive
- Other UDP uses
  - DNS
  - SNMP
  - TFTP
  - …
- Reliable transfer over UDP: add reliability at the application layer
  - Application-specific error recovery!
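One simple way for an application to add reliability on top of UDP is stop-and-wait with an alternating-bit sequence number. The sketch below simulates it without real sockets. The loss pattern is an assumption for illustration: `lost_transmissions` holds the indices of datagrams, in either direction, that the simulated network drops. A real application would also need timers and a larger sequence space.

```python
from typing import List, Set, Tuple


def stop_and_wait(messages: List[str], lost_transmissions: Set[int]) -> Tuple[List[str], int]:
    """Return (messages delivered to the receiving app, datagrams sent in total)."""
    sent = 0  # counts every datagram put on the simulated network
    delivered: List[str] = []
    expected_bit = 0  # receiver: alternating bit it expects next
    for i, msg in enumerate(messages):
        bit = i % 2
        while True:
            data_lost = sent in lost_transmissions
            sent += 1
            if data_lost:
                continue  # sender's timer expires -> retransmit the same datagram
            if bit == expected_bit:  # new data: deliver it exactly once
                delivered.append(msg)
                expected_bit ^= 1
            # a duplicate (bit != expected_bit) is not delivered again, but is re-ACKed
            ack_lost = sent in lost_transmissions
            sent += 1
            if not ack_lost:
                break  # ACK arrived -> move on to the next message
    return delivered, sent
```

If the first ACK is lost (`{1}`), the sender retransmits. The receiver recognises the duplicate by its bit and does not deliver it twice.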

UDP Concepts

- UDP datagram format
- UDP applications

UDP Datagram Format

- Normally, each UDP datagram is carried in exactly one IP datagram
- IP fragments the datagram if the IP datagram is larger than the MTU (Maximum Transmission Unit)
- UDP encapsulation in an IP datagram: IP header (20 bytes) + UDP header (8 bytes) + UDP data

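UDP Datagram Format

- Datagram delivery
- Connectionless
- Unreliable
- Low overhead

0                  16                 31
+------------------+------------------+
|   Source Port    | Destination Port |
+------------------+------------------+
|      Length      |     Checksum     |
+------------------+------------------+
|                Data                 |
+-------------------------------------+

Length: length, in bytes, of the UDP segment, including the header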

UDP Datagram Format (Continued)

- Port number:
  - Identifies the service access point on each host
  - To identify the final destination of a datagram, the source must provide the destination IP address + destination port number
  - The sender must also provide the source port number to which replies can be addressed
  - Used for (de)multiplexing
  - Client ports are ephemeral (short-lived); server ports are "well known"
- UDP length:
  - Length of the UDP header (8 bytes) + UDP data
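The 8-byte header is four 16-bit fields in network byte order, which Python's standard `struct` module can pack and unpack directly. This is a minimal sketch. `build_udp` and `parse_udp` are hypothetical helper names. A checksum of 0 here means "no checksum computed", which IPv4 allows.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # 4 x 16-bit big-endian fields = 8 bytes


def build_udp(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Prepend a UDP header; the Length field covers header + data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp(datagram: bytes):
    """Return (src port, dst port, length, checksum, data)."""
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src, dst, length, checksum, datagram[UDP_HEADER.size:length]


print(parse_udp(build_udp(50000, 53, b"hello")))
```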

UDP Datagram Format (Continued)

- Checksum
  - The UDP checksum is similar to the IP header checksum, but also covers a pseudo-header (to help check source/destination) containing:
    - Source and destination IP addresses
      - Allow the receiver to double-check that the data has arrived at the correct destination
    - Protocol: has the value 17 for UDP
    - UDP length
  - The UDP checksum is optional, but RFC 1122/1123 (Host Requirements) requires it to be enabled
  - Covers the whole datagram (including data)
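The checksum is the 16-bit one's complement of the one's complement sum over the pseudo-header, the UDP header, and the data. The sketch below assumes IPv4 and that the segment's own checksum field is zero while computing. The helper names are hypothetical.

```python
import ipaddress
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header + UDP header + data.
    udp_segment must have its checksum field set to 0."""
    pseudo = (ipaddress.IPv4Address(src_ip).packed
              + ipaddress.IPv4Address(dst_ip).packed
              + struct.pack("!BBH", 0, 17, len(udp_segment)))  # zero, protocol 17, UDP length
    csum = ~ones_complement_sum16(pseudo + udp_segment) & 0xFFFF
    return csum or 0xFFFF  # 0 means "no checksum", so an all-zero result is sent as all ones
```

The receiver repeats the sum with the transmitted checksum in place. A result of 0xFFFF means the check passed.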

UDP Servers

- Client-server architecture: the basis for most distributed applications today (e.g., echo, UDP streaming, NTP)
- Most UDP servers are "iterative": a single server process receives and handles incoming requests on a "well-known" port
- Can filter client requests based on the local IP address a request arrived on, the client IP address, the local port, or wildcard filters
- Incoming requests are queued while the server is busy
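An iterative UDP server is a single loop on a single socket. It takes one datagram, replies to whoever sent it, and takes the next. Any other datagrams wait in the socket's receive queue. This is a minimal echo-server sketch using Python's standard `socket` and `threading` modules. It assumes the loopback interface is available and uses an OS-chosen port in place of a real well-known port. The function names are hypothetical.

```python
import socket
import threading


def iterative_udp_server(sock: socket.socket, max_requests: int) -> None:
    """Serve requests one at a time on one socket (the 'iterative' model)."""
    for _ in range(max_requests):
        # blocks until a datagram arrives; others wait in the socket's queue
        data, client_addr = sock.recvfrom(2048)
        # echo service: reply to whichever client sent this datagram
        sock.sendto(data, client_addr)


def demo(messages):
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # stands in for a well-known port
    server.settimeout(5.0)
    worker = threading.Thread(target=iterative_udp_server, args=(server, len(messages)))
    worker.start()
    clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in messages]
    replies = []
    try:
        for client, msg in zip(clients, messages):  # each client gets its own ephemeral port
            client.settimeout(5.0)
            client.sendto(msg, server.getsockname())
            replies.append(client.recvfrom(2048)[0])
    finally:
        worker.join()
        for s in clients + [server]:
            s.close()
    return replies


print(demo([b"one", b"two", b"three"]))
```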

Typical applications for UDP

- SNMP, network management
- Audio and video broadcast/multicast
- Small services
  - daytime information
  - echo
  - chargen
- NTP
- IPX over IP
- who
- talk
- routed
- RADIUS
- …

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)

[Figure: the sending application writes data through its socket into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving application reads it through its socket]

- Point-to-point:
  - One sender, one receiver
- Reliable, in-order byte stream:
  - No "message boundaries"
- Pipelined:
  - TCP congestion and flow control set the window size
- Send and receive buffers
- Full-duplex data:
  - Bi-directional data flow in the same connection
  - MSS: maximum segment size
- Connection-oriented:
  - Handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- Flow controlled:
  - The sender will not overwhelm the receiver
- Related research

TCP – Connection State Diagram (RFC 793)

                              +---------+ ---------\      active OPEN
                              |  CLOSED |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
                              +---------+            CLOSE    |    \
                              |  LISTEN |          ---------- |     |
                              +---------+          delete TCB |     |
                   rcv SYN      |     |     SEND              |     |
                  -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                    snd ACK                     |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |                   CLOSE    |     |    rcv FIN
   V                  -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+
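The RFC 793 state diagram can be transcribed into a transition table of (state, event) -> (action, next state), which makes it easy to trace a connection's life. This is an illustrative sketch. State names are the diagram's labels with hyphens added, and error and reset transitions are omitted. `TRANSITIONS` and `run_events` are hypothetical names.

```python
# (state, event) -> (action, next state), transcribed from the RFC 793 state diagram
TRANSITIONS = {
    ("CLOSED", "passive OPEN"): ("create TCB", "LISTEN"),
    ("CLOSED", "active OPEN"): ("create TCB, snd SYN", "SYN-SENT"),
    ("LISTEN", "rcv SYN"): ("snd SYN,ACK", "SYN-RCVD"),
    ("LISTEN", "SEND"): ("snd SYN", "SYN-SENT"),
    ("LISTEN", "CLOSE"): ("delete TCB", "CLOSED"),
    ("SYN-SENT", "rcv SYN,ACK"): ("snd ACK", "ESTAB"),
    ("SYN-SENT", "rcv SYN"): ("snd ACK", "SYN-RCVD"),  # simultaneous open
    ("SYN-SENT", "CLOSE"): ("delete TCB", "CLOSED"),
    ("SYN-RCVD", "rcv ACK of SYN"): ("", "ESTAB"),
    ("SYN-RCVD", "CLOSE"): ("snd FIN", "FIN-WAIT-1"),
    ("ESTAB", "CLOSE"): ("snd FIN", "FIN-WAIT-1"),  # active close
    ("ESTAB", "rcv FIN"): ("snd ACK", "CLOSE-WAIT"),  # passive close
    ("FIN-WAIT-1", "rcv ACK of FIN"): ("", "FIN-WAIT-2"),
    ("FIN-WAIT-1", "rcv FIN"): ("snd ACK", "CLOSING"),  # simultaneous close
    ("FIN-WAIT-2", "rcv FIN"): ("snd ACK", "TIME-WAIT"),
    ("CLOSING", "rcv ACK of FIN"): ("", "TIME-WAIT"),
    ("CLOSE-WAIT", "CLOSE"): ("snd FIN", "LAST-ACK"),
    ("LAST-ACK", "rcv ACK of FIN"): ("", "CLOSED"),
    ("TIME-WAIT", "Timeout=2MSL"): ("delete TCB", "CLOSED"),
}


def run_events(events, state="CLOSED"):
    """Apply events in order; return the final state and a trace of (event, action, state)."""
    trace = []
    for event in events:
        action, state = TRANSITIONS[(state, event)]  # KeyError = event not allowed here
        trace.append((event, action, state))
    return state, trace


client = ["active OPEN", "rcv SYN,ACK", "CLOSE", "rcv ACK of FIN", "rcv FIN", "Timeout=2MSL"]
for step in run_events(client)[1]:
    print(step)
```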

Reliable transmission

- Reliable means that every transmission of data is acknowledged by the receiver
  - TCP sends back an ACK as it receives data packets
    - Notifies the sender of the packet arrival
  - The sender retransmits a packet if the ACK does not arrive within a certain period
- Packet re-ordering
  - TCP reassembles out-of-order packets
- Robust checksum
  - Used for data integrity checks
  - IP provides a 16-bit checksum for the IP header
  - TCP provides a 16-bit checksum covering the TCP header, the TCP data, and a pseudo-header
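Re-ordering and cumulative acknowledgement can be sketched as a receive buffer keyed by sequence number. The sketch makes simplifying assumptions: segments never partially overlap (a retransmission reuses the original boundaries) and sequence numbers do not wrap. `reassemble` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def reassemble(segments: Iterable[Tuple[int, bytes]], initial_seq: int) -> Tuple[bytes, List[int]]:
    """Deliver bytes in order from (seq, data) segments given in arrival order.
    Returns the delivered bytes and the cumulative ACK sent after each arrival."""
    buffered: Dict[int, bytes] = {}  # out-of-order segments waiting for a gap to fill
    next_seq = initial_seq  # next byte the receiver expects
    delivered = bytearray()
    acks: List[int] = []
    for seq, data in segments:
        if seq >= next_seq:  # ignore duplicates of already-delivered data
            buffered.setdefault(seq, data)
        while next_seq in buffered:  # deliver every contiguous segment now available
            chunk = buffered.pop(next_seq)
            delivered += chunk
            next_seq += len(chunk)
        acks.append(next_seq)  # cumulative ACK: next byte expected
    return bytes(delivered), acks


print(reassemble([(100, b"abc"), (106, b"ghi"), (103, b"def"), (103, b"def")], 100))
```

When the segment at 106 arrives before 103, the ACK stays at 103. That repeated value is a duplicate ACK.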

Connection-Oriented

- Connection-oriented means that a virtual connection is established before any user data is transferred
- Negotiation must take place between the communicating nodes before data transfer
  - Estimates the capability of each node
- If the connection cannot be established, the user program is notified
- If the connection is ever interrupted, the user program(s) is notified

Byte Stream

- Stream means that the connection is treated as a stream of bytes
- Data handled by TCP has no structure
  - TCP regards data as a byte stream
  - TCP splits data from the application into multiple packets
- The size of the packets is determined by TCP, not by the application
  - TCP chooses an appropriate packet size for each communication path
- The user application does not need to package data in individual datagrams (as with UDP)

[Figure: the application's data is handed to TCP as one stream and carried in a sequence of TCP packets]
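The byte-stream model can be sketched in a few lines. Application writes are concatenated, so their boundaries disappear, and the stream is then cut into chunks of at most MSS bytes, each tagged with the sequence number of its first byte. `tcp_segments` is a hypothetical helper. A real TCP also sizes segments according to the current window and timers.

```python
from typing import List, Tuple


def tcp_segments(writes: List[bytes], mss: int, isn: int) -> List[Tuple[int, bytes]]:
    """Concatenate application writes into one byte stream and cut it into MSS-sized segments."""
    stream = b"".join(writes)  # write boundaries are not preserved
    # each segment's sequence number is the stream offset of its first byte, plus the ISN
    return [(isn + off, stream[off:off + mss]) for off in range(0, len(stream), mss)]


print(tcp_segments([b"GET /", b" HTTP/1.0\r\n"], mss=6, isn=1000))
```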

Full Duplex

- TCP provides transfer in both directions
  - Data can be transmitted while receiving
  - Use of "piggybacking"
    - Data packets can convey feedback information in the opposite direction
- To the application program these appear as two unrelated data streams, although TCP can piggyback control information (such as an ACK) on user data

[Figure: hosts A and B; a data stream from A to B and a data stream from B to A, with feedback information for the A-to-B stream carried on the B-to-A stream]

TCP Segments

- The chunk of data that TCP asks IP to deliver is called a TCP segment
- Each segment contains:
  - data bytes from the byte stream
  - control information that identifies the data bytes

TCP Segments (Continued)

- TCP is encapsulated in an IP datagram
- The TCP header is normally 20 bytes unless options are present

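TCP Segment Structure

0                               16                             31
+-------------------------------+-------------------------------+
|         source port #         |          dest port #          |
+-------------------------------+-------------------------------+
|                        sequence number                        |
+---------------------------------------------------------------+
|                    acknowledgement number                     |
+---------+---------+-+-+-+-+-+-+-------------------------------+
| head len| not used|U|A|P|R|S|F|        receive window         |
+---------+---------+-+-+-+-+-+-+-------------------------------+
|           checksum            |       urg data pointer        |
+-------------------------------+-------------------------------+
|                   options (variable length)                   |
+---------------------------------------------------------------+
|              application data (variable length)               |
+---------------------------------------------------------------+

- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- Receive window: # bytes the receiver is willing to accept
- Sequence and acknowledgement numbers count bytes of data (not segments!)
- Checksum: Internet checksum (as in UDP)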

Header length

- Counted in 32-bit words
- With a 4-bit field, the maximum header length is 15 × 4 = 60 bytes
- Without options, the normal header length is 20 bytes

Control Bits

- 6 bits (from left to right):
  - URG: Urgent Pointer field significant
  - ACK: Acknowledgment field significant
  - PSH: Push function
    - Data should be sent without waiting for more data, and the receiver should deliver the data to the application as soon as possible
  - RST: Reset the connection
  - SYN: Synchronize sequence numbers
  - FIN: No more data from sender
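The header layout, the 4-bit header length, and the six control bits can be decoded together with Python's standard `struct` module. This is a sketch. `parse_tcp_header` is a hypothetical helper, and the reserved bits are simply ignored.

```python
import struct

TCP_FIXED = struct.Struct("!HHIIBBHHH")  # 20-byte fixed TCP header, network byte order
# control bits from left to right, as on the slide (URG is the most significant of the six)
FLAG_BITS = [(0x20, "URG"), (0x10, "ACK"), (0x08, "PSH"),
             (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN")]


def parse_tcp_header(segment: bytes) -> dict:
    (src, dst, seq, ack, offset_byte, flags_byte,
     window, checksum, urg_ptr) = TCP_FIXED.unpack_from(segment)
    header_len = (offset_byte >> 4) * 4  # 4-bit field in 32-bit words -> bytes (max 60)
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [name for bit, name in FLAG_BITS if flags_byte & bit],
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[TCP_FIXED.size:header_len],  # empty if header_len == 20
        "data": segment[header_len:],
    }


# a SYN carrying one 4-byte option, so the header is 6 words = 24 bytes
syn = TCP_FIXED.pack(50000, 80, 1000, 0, 6 << 4, 0x02, 65535, 0, 0) + bytes([2, 4, 0x05, 0xB4])
print(parse_tcp_header(syn))
```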

Urgent Pointer

- This field communicates the current value of the urgent pointer as a positive offset from the sequence number in this segment
- The urgent pointer points to the sequence number of the octet following the urgent data. This field is interpreted only in segments with the URG control bit set
- Used for TCP's urgent mode
  - Informs the receiver that "urgent data" has been placed in the normal stream of data
  - The receiver stays in "urgent mode" until it has passed the urgent pointer, then goes back to "normal mode"
  - E.g., used for sending an interrupt in Telnet
- Urgent pointer = positive offset from the sequence number field
  - RFC 1122 instead describes it as giving the sequence number of the last byte of the urgent data, one less than RFC 793's "octet following the urgent data"

TCP Connection Management

- Connection establishment is asymmetric
  - One side puts itself in the LISTEN state (server)
  - The other side issues a request for a connection (client)
- Three phases:
  - Connection establishment
  - Data transfer
  - Connection termination

TCP Connection Establishment

- 3-way handshake
  - Sender A sends a SYN segment specifying the port number of the other party, B, the initial sequence number (ISN) that A will use, and the MSS (maximum segment size)
  - B responds with its own SYN segment containing its ISN; B also acknowledges A's SYN by ACKing A's ISN plus one
  - A acknowledges B's SYN by ACKing B's ISN plus one
  - A connection is established by exchanging 3 packets
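The sequence and acknowledgement numbers of the three handshake segments follow directly from the steps above. Each SYN consumes one sequence number, which is where "ISN plus one" comes from, and the arithmetic is modulo 2^32. This is an illustrative sketch. `Segment` and `three_way_handshake` are hypothetical names, and the ISNs are supplied by the caller instead of being chosen the way a real stack would.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    sender: str
    flags: str
    seq: int
    ack: Optional[int] = None
    mss: Optional[int] = None  # MSS option, carried only on SYN segments


def three_way_handshake(isn_a: int, isn_b: int, mss_a: int, mss_b: int) -> List[Segment]:
    syn = Segment("A", "SYN", seq=isn_a, mss=mss_a)
    # B ACKs A's ISN + 1 (the SYN consumed one sequence number)
    syn_ack = Segment("B", "SYN,ACK", seq=isn_b, ack=(syn.seq + 1) % SEQ_MOD, mss=mss_b)
    # A ACKs B's ISN + 1
    ack = Segment("A", "ACK", seq=(isn_a + 1) % SEQ_MOD, ack=(syn_ack.seq + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(1000, 5000, 1460, 1460):
    print(seg)
```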

TCP Connection Establishment (Continued)

- During connection setup, both sides exchange "MSS" information in a TCP option
  - The largest payload size that TCP can send on this connection
  - Usually, the MSS is calculated from the MTU

[Figure: an IP datagram of up to MTU bytes = IP header (20 bytes) + TCP header (20 bytes) + TCP payload (variable); the MSS is the maximum size of the TCP payload]

MSS – Maximum Segment Size

- Maximum segment size
- Does not include the TCP header
- The default MSS is 536 bytes (576-byte IP datagram - 20-byte IP header - 20-byte TCP header)
- In general, the MSS can be set to the outgoing interface's MTU - 40 bytes
  - E.g., for Ethernet, the MSS can be set to 1460 bytes
- Announced during connection establishment
- Different MSS values are possible for the forward and reverse paths
- The path MTU restricts the MSS further

MSS option layout: kind=2 (1 byte) | length=4 (1 byte) | MSS size (2 bytes)
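The MSS arithmetic and the 4-byte option encoding can be checked with a few lines of Python. The sketch assumes IPv4 and TCP headers without options (20 bytes each). `effective_mss` is a simplification: the sender stays within both its own interface's limit and the peer's announced value, and path MTU discovery is ignored. All three helper names are hypothetical.

```python
import struct

IP_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def mss_from_mtu(mtu: int) -> int:
    return mtu - IP_HEADER - TCP_HEADER


def encode_mss_option(mss: int) -> bytes:
    # kind=2, length=4 (length of the whole option), 16-bit MSS value
    return struct.pack("!BBH", 2, 4, mss)


def effective_mss(local_mtu: int, peer_announced_mss: int) -> int:
    # do not exceed what the local interface allows or what the peer announced
    return min(mss_from_mtu(local_mtu), peer_announced_mss)


print(mss_from_mtu(1500), encode_mss_option(1460).hex())
```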

3-Way Handshake for Connection Synchronization


TCP Connection Termination (Continued)

- Half close
  - Each direction of data flow is closed independently
  - So a connection is terminated by exchanging 4 packets
- The 2 communicating nodes play different roles
  - Active close side
    - Sends the first FIN
    - Waits for a while after it sends the last ACK
  - Passive close side
    - Sends an ACK for the FIN and then sends the second FIN
    - Waits for the last ACK

TCP Connection Termination

- A TCP connection is full-duplex
  - Each direction must be shut down independently (graceful close)
- Termination is initiated by issuing a FIN segment
  - The TCP receiving the FIN:
    - Acknowledges receipt of the FIN
    - Informs the application that the other end has terminated that direction of data flow
    - Can still send data
  - The sender of the FIN:
    - Waits for the acknowledgement from the other party
    - If the acknowledgement does not arrive before a timeout, retransmits the FIN
    - After receiving the ACK, continues to receive data
- The other end closes its direction as well by sending a FIN
  - When this FIN is acknowledged, the connection is completely closed
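The four-segment exchange can be written out with its sequence and acknowledgement numbers. Like a SYN, each FIN consumes one sequence number. The sketch assumes A is the active-close side and that the ACK flag is set on both FINs, as is usual on an established connection. `four_way_close` is a hypothetical helper.

```python
from typing import List, Tuple

M = 2 ** 32  # sequence-number space


def four_way_close(a_seq: int, b_seq: int) -> List[Tuple[str, str, int, int]]:
    """(sender, flags, seq, ack) for a graceful close in which A closes first.
    a_seq / b_seq: the next sequence number each side would use."""
    return [
        ("A", "FIN,ACK", a_seq, b_seq),  # A: no more data from me
        ("B", "ACK", b_seq, (a_seq + 1) % M),  # B ACKs A's FIN
        # ... half close: B may still send data to A here ...
        ("B", "FIN,ACK", b_seq, (a_seq + 1) % M),  # B closes its direction
        ("A", "ACK", (a_seq + 1) % M, (b_seq + 1) % M),  # last ACK; A then waits 2MSL in TIME WAIT
    ]


for seg in four_way_close(5000, 9000):
    print(seg)
```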

TCP Connection Termination (Continued)

- Example

Flow Control

- Why is flow control needed at the transport layer?
  - The user of the receiving transport entity cannot keep up with the data flow
  - The receiving transport entity itself cannot keep up with the flow of incoming data
  - Result: buffer overflows in the receiving transport entity
- Flow control at the transport layer is more complex than at the data link layer
  - Delays are longer
    - There will be considerable delay in the communication of flow control information
  - Delays are highly variable
    - It is difficult to use a timeout mechanism for retransmission of lost data

Flow Control (Continued)

- To avoid congestion
  - TCP uses several algorithms, such as TCP Reno, TCP Vegas, FAST TCP, TCP New Reno, and TCP Hybla
  - A network link will be congested if a sender transmits data at too high a rate
  - Usually, network nodes simply drop incoming packets when congestion arises
  - Some network nodes queue up packets for delivery; however, excessive delay can trigger retransmissions, making the congestion problem even worse

Flow Control (Continued)

- To avoid overwhelming slow receivers
  - A sender sending too fast to a slow receiver can also lead to congestion
  - A congested receiver will drop incoming packets, wasting network bandwidth
- Flow control requires buffering at the sender or receiver

TCP Flow Control

- Flow control is done by means of a sliding window protocol

TCP Flow Control (Continued)

- Sliding window protocol
  - In TCP, sending acknowledgements is separated from setting the window size
    - Acknowledgements do not automatically increase the window size, i.e., credit allocation flow control is used
  - Note:
    - The size of the window can decrease, but the right edge of the window must not move leftward
  - The receiver does not have to wait for the window to fill before sending an acknowledgement
- It can be shown that the window size is best chosen as

Window size (bits) = bandwidth (bits/sec) × round-trip time (sec)
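A worked example of the bandwidth × RTT rule, using integers to avoid rounding. The link figures (100 Mbit/s, 50 ms RTT) are assumed only for illustration, and the helper names are hypothetical.

```python
def window_bits(bandwidth_bps: int, rtt_ms: int) -> int:
    """Window size (bits) = bandwidth (bits/sec) x round-trip time (sec)."""
    return bandwidth_bps * rtt_ms // 1000


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> int:
    """With a fixed window, at most one window of data can be sent per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


bits = window_bits(100_000_000, 50)  # 100 Mbit/s link, 50 ms RTT
print(bits, bits // 8)  # window in bits and in bytes
print(max_throughput_bps(65_535, 50))  # throughput cap with a 65,535-byte window
```

The 625,000-byte result is larger than 65,535 bytes, the most the 16-bit receive window field can express. That gap is why the window scale option (RFC 1323) exists. Without scaling, this path tops out at about 10.5 Mbit/s.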

Credit Allocation Flow Control

- Enhances the sliding window protocol with a mechanism that decouples acknowledgements from flow control
- Segments can be acknowledged without granting permission for new transmissions
- Mechanism
  - Set the initial window size of the receiver during connection setup
  - The receiver both acknowledges TPDUs and grants credit by sending (ACK N, Credit M)
    - ACK N: acknowledges all sequence numbers through N-1
    - Credit M: sets the number of credits to M
  - The sender can send from sequence number N to N+M-1
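The sender side of credit allocation can be sketched as a class holding the allowed range [N, N+M). It numbers TPDUs individually, as the slide does. One design choice is an assumption borrowed from the TCP note above: the right edge is never moved left, so a smaller credit cannot take back permission already granted. `CreditSender` is a hypothetical name.

```python
class CreditSender:
    """Sender side of credit-allocation flow control (illustrative only)."""

    def __init__(self, initial_credit: int, first_seq: int = 0) -> None:
        self.lower = first_seq  # N: oldest unacknowledged sequence number
        self.upper = first_seq + initial_credit  # first sequence number NOT yet allowed
        self.next_seq = first_seq

    def can_send(self) -> bool:
        return self.next_seq < self.upper

    def send(self) -> int:
        if not self.can_send():
            raise RuntimeError("no credit")
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, n: int, credit: int) -> None:
        # (ACK N, Credit M): everything through N-1 is acknowledged; N..N+M-1 may be sent
        self.lower = n
        self.upper = max(self.upper, n + credit)  # assumption: right edge never moves left
```

An (ACK 2, Credit 0) acknowledges data without letting the sender transmit anything new. A later (ACK 4, Credit 3) opens sequence numbers 4 to 6.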

TCP Congestion Control

Principles of Congestion Control

Congestion:

- Informally: "too many sources sending too much data too fast for the network to handle"
- Different from flow control!
- Manifestations:
  - Lost packets (buffer overflow at routers)
  - Long delays (queueing in router buffers)
- A top-10 problem!

TCP Congestion Control (Continued)

- The sliding window is imposed by the receiver to avoid buffer overflow at the receiver
  - Problems arise when packets have to travel through routers and slower links
  - A router may run out of space and discard packets
  - Solution: use a congestion window to limit the number of bytes TCP can send
- Congestion control adds another window to the sender's TCP
  - The congestion window is imposed by the sender
  - TCP dynamically adjusts the congestion window size according to the network state
  - The sender can transmit up to the minimum of the congestion window and the advertised flow control window

TCP Congestion Control (Continued)

- Maintains three variables:
  - cwnd: congestion window
  - rcv_win: receiver advertised window
  - ssthresh: threshold size (used to update cwnd)
- For sending, use: win = min(rcv_win, cwnd)
- How does the sender perceive congestion?
  - Loss event = timeout or 3 duplicate ACKs
  - The TCP sender reduces its rate (cwnd) after a loss event
- Three phases:
  - Slow start
  - Congestion avoidance
  - Congestion occurrence

TCP Congestion Control (Continued)

- Slow start
  - Discover congestion quickly
- Mechanism
  - Quickly increase cwnd until the network is congested
    - Get a rough estimate of the optimal value of cwnd
- Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:
  - Set cwnd = 1 (MSS)
  - Each time a segment is acknowledged, increment cwnd by one (cwnd++)
- Slow start is not actually slow
  - cwnd increases exponentially

Slow Start Example

- The congestion window size grows very rapidly
  - cwnd doubles every RTT
  - Done by incrementing cwnd for every ACK received
- Summary: the initial rate is slow but ramps up exponentially fast
- TCP slows down the increase of cwnd when cwnd >= ssthresh
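The per-ACK rule "cwnd++" is what produces doubling per RTT. The simulation below assumes no losses and one ACK per segment, and counts cwnd in MSS units. Once cwnd reaches ssthresh, it switches to the common congestion-avoidance increment of 1/cwnd per ACK, about one MSS per RTT. `slow_start_rounds` is a hypothetical name.

```python
from typing import List


def slow_start_rounds(rounds: int, ssthresh: int) -> List[float]:
    """cwnd (in MSS) at the start of each RTT, assuming no loss and one ACK per segment."""
    cwnd = 1.0
    history: List[float] = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = int(cwnd)  # one ACK comes back for each full segment sent this RTT
        for _ in range(acks):
            if cwnd < ssthresh:
                cwnd += 1  # slow start: +1 MSS per ACK -> doubles per RTT
            else:
                cwnd += 1 / cwnd  # congestion avoidance: about +1 MSS per RTT
    return history


print(slow_start_rounds(6, ssthresh=8))
```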

Jacobson's Fast Retransmit and Fast Recovery Algorithm

- Congestion detection is based on timeouts and/or three duplicate ACKs
- But a duplicate ACK can mean:
  - A lost segment (congestion is severe)
  - A delayed segment (congestion may not be severe)
- Since we are not sure, reduce cwnd just by half instead of all the way to one
  - This is known as fast recovery
- Since the duplicate ACKs suggest a segment may be lost, the suspected segment is retransmitted immediately instead of waiting for a timeout
  - This is called fast retransmit
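The fast-retransmit trigger can be sketched as a counter over incoming cumulative ACKs. A third duplicate of the same ACK number means the receiver is still waiting for that byte, so the segment starting there is resent at once. `DupAckDetector` is a hypothetical name. The threshold of 3 comes from the text above.

```python
from typing import Optional

DUP_ACK_THRESHOLD = 3


class DupAckDetector:
    """Signal a fast retransmit on the third duplicate ACK (illustrative only)."""

    def __init__(self) -> None:
        self.last_ack: Optional[int] = None
        self.dup_count = 0

    def on_ack(self, ack: int) -> Optional[int]:
        """Return the sequence number to retransmit now, or None."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                return ack  # receiver still expects byte `ack`: resend it without waiting for RTO
        else:
            self.last_ack = ack  # new data acknowledged: reset the counter
            self.dup_count = 0
        return None
```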

Fast Recovery (After a Fast Retransmit)

- ssthresh = cwnd/2
- cwnd = ssthresh
  - Instead of setting cwnd to 1, cut cwnd in half (multiplicative decrease)
- For each duplicate ACK arrival:
  - dupack++
  - MaxWindow = min(cwnd + dupack, rcv_win)
  - Each duplicate ACK indicates that a packet has left the network, so we may be able to send more
- On receiving an ACK for new data (beyond the initial duplicate ACK):
  - dupack = 0
  - Exit fast recovery
- But when the Retransmission Timeout (RTO) expires, still set cwnd = 1

Refinement

- After 3 dup ACKs:
  - cwnd is cut in half
  - The window then grows linearly
- But after a timeout event:
  - cwnd is instead set to 1 MSS
  - The window then grows exponentially
  - up to a threshold, then grows linearly

Philosophy:

• 3 dup ACKs indicate that the network is capable of delivering some segments
• A timeout before 3 dup ACKs is "more alarming"

TCP Congestion Control (Continued)

- When cwnd is below ssthresh, the sender is in the slow-start phase and the window grows exponentially
- When cwnd is above ssthresh, the sender is in the congestion-avoidance phase and the window grows linearly
- When a triple duplicate ACK occurs, ssthresh is set to cwnd/2 and cwnd is set to ssthresh
- When a timeout occurs, ssthresh is set to cwnd/2 and cwnd is set to 1 MSS

[Figure: cwnd over time, showing the slow start, congestion avoidance, Fast Recovery 1, and Fast Recovery 2 phases]
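The four rules above can be collected into one small event-driven class, with cwnd and ssthresh in MSS units. This is a simplified sketch of Reno-style behaviour, not a complete implementation. It leaves out the per-duplicate-ACK window inflation during fast recovery and the retransmission itself. `RenoWindow` is a hypothetical name.

```python
class RenoWindow:
    """cwnd/ssthresh update rules from the summary above (simplified, units of MSS)."""

    def __init__(self, ssthresh: float = 64.0) -> None:
        self.cwnd = 1.0
        self.ssthresh = ssthresh

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1  # slow start: exponential growth per RTT
        else:
            self.cwnd += 1 / self.cwnd  # congestion avoidance: linear growth per RTT

    def on_triple_dup_ack(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh  # halve, then keep growing linearly

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0  # back to slow start

    def send_window(self, rcv_win: float) -> float:
        return min(rcv_win, self.cwnd)  # win = min(rcv_win, cwnd)
```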

TCP on Wireless Network

- In case of a timeout, TCP slows down drastically (congestion window reset to 1)
  - This is based on the assumption that the lost packet is due to congestion
- This assumption is wrong if the link is noisy, e.g., wireless!
  - Lost packets do not indicate congestion
  - TCP should retransmit as quickly as possible
  - A design that is good for wired networks (low noise) is bad for wireless networks (noisy)

TCP on Wireless Network (Continued)

- Solution 1: indirect TCP
  - Split the TCP connection into two separate connections (wired part and wireless part)
  - The configuration is similar to a proxy

TCP on Wireless Network (Continued)

- Solution 2: snooping agent
  - Add a snooping agent at the base station
  - Cache every segment transmitted, and note the ACK replies
  - On a timeout, retransmit the segments without the TCP sender knowing
  - Duplicate acknowledgements are discarded; otherwise the sender might misinterpret them as congestion
  - Advantage: transparent to TCP
  - Disadvantage: the sender may still time out and start congestion avoidance; Solution 1 does not have this problem

Tools for TCP

- tcpdump
  - The most common packet capture tool
  - http://www.tcpdump.org/
- ethereal (since renamed Wireshark)
  - A more graphical tool compared to tcpdump
  - http://www.ethereal.com/
- tcptrace
  - Used to analyze the output of various packet capture tools (tcpdump, snoop, …)
- tcpillust
  - Visualizes TCP connection interactions
  - http://web.sfc.wide.ad.jp/~nishida/tcpillust.html


Multipath TCP (MPTCP)

- Aims to allow a Transmission Control Protocol (TCP) connection to use multiple paths to maximize resource usage and increase redundancy
- For multi-homed hosts, links may be added or dropped as the user moves in or out of coverage without disrupting the end-to-end TCP connection
- The redundancy offered by Multipath TCP enables inverse multiplexing of resources, and thus increases TCP throughput up to the sum of all available link-level channels, instead of using a single one as plain TCP requires. Multipath TCP is backward compatible with plain TCP
- RFC 6897

MPTCP Protocol Stack

- Multipath TCP presents the same socket interface as TCP
- This implies that any standard TCP application can be used above Multipath TCP while in fact spreading data across several subflows

Connection Establishment

Establishment of an Additional Subflow

Data Sending and Receiving

- MPTCP can send data over any of the available subflows (paths)
- The regular TCP sequence number ensures that data is received in order over each subflow and allows losses to be detected
- The MPTCP DSS option contains a data sequence number and an acknowledgement number. These allow the receiver to put data arriving over multiple subflows back into the original order, without any corruption
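The two levels of numbering can be illustrated with a receiver that interleaves chunks from several subflows by their data sequence number (DSN). It assumes each subflow has already delivered its own bytes in order using ordinary TCP sequence numbers. The tuples are hypothetical (subflow name, subflow sequence number, DSN, payload) records, not the DSS option's wire format.

```python
from typing import Dict, List, Tuple

# (subflow name, subflow sequence number, data sequence number, payload)
Mapping = Tuple[str, int, int, bytes]


def reassemble_mptcp(arrivals: List[Mapping], initial_dsn: int) -> bytes:
    """Restore connection-level byte order from data arriving on several subflows."""
    pending: Dict[int, bytes] = {}  # chunks waiting for earlier DSNs
    next_dsn = initial_dsn
    out = bytearray()
    for _subflow, _sub_seq, dsn, payload in arrivals:
        pending[dsn] = payload  # the subflow seq only mattered for in-subflow ordering
        while next_dsn in pending:  # append every chunk that is now contiguous
            chunk = pending.pop(next_dsn)
            out += chunk
            next_dsn += len(chunk)
    return bytes(out)


arrivals = [("wifi", 1, 0, b"Hel"), ("lte", 1, 5, b" wor"),
            ("wifi", 4, 3, b"lo"), ("lte", 5, 9, b"ld")]
print(reassemble_mptcp(arrivals, 0))
```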

Use case

- MPTCP on smartphones
  - Smartphones are equipped with Wi-Fi and 3G/4G interfaces, but they typically use only one interface at a time. Still, users expect their TCP connections to survive when their smartphone switches from one wireless network to another
  - MPTCP enables seamless handovers from Wi-Fi to 3G/4G and vice versa


QUIC (Quick UDP Internet Connections)

- Designed by Google to speed up the web
- IETF draft
- Combines the best of UDP and TCP
  - A reliable, multiplexed transport over UDP
  - Always encrypted
  - Reduces latency
  - Runs in user space
  - Open-sourced in Chromium
- Google recently revealed that it has been experimenting with QUIC and that currently half of the requests from Google Chrome to Google servers are sent via QUIC

Protocol stack

[Figure; source: https://www.nanog.org/sites/default/files//meetings/NANOG64/1051/20150603_Rogan_Quic_Next_Generation_v1.pdf]

Zero-RTT connection establishment

TCP head-of-line blocking

- Head-of-line blocking: TCP delivers data strictly in order. If a packet is lost on its way to or from the server, it must be retransmitted, and the TCP connection has to wait (or "block") on that packet before it can pass any later packets to the application, because the order in which TCP packets are processed matters.
- QUIC avoids this by not using TCP. UDP does not depend on the order in which packets are received, and QUIC carries independent streams over it. Packets can still be lost in transit, but a loss only affects the individual resource (e.g., a single CSS/JS file) whose data it carried, not the entire connection.
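The difference can be simulated with two resources sharing one connection. The packets are hypothetical (packet number, stream id, stream offset, data) records, not QUIC's wire format. In the TCP-like case everything after a lost packet waits. In the QUIC-like case only the stream that lost data waits.

```python
from typing import Dict, List, Set, Tuple

# (packet number, stream id, stream offset, data)
Packet = Tuple[int, str, int, bytes]


def deliverable_single_stream(packets: List[Packet], lost: Set[int]) -> List[str]:
    """TCP-like: one ordered byte stream; nothing behind a hole is delivered."""
    delivered: List[str] = []
    for num, stream, _off, _data in sorted(packets):
        if num in lost:
            break  # everything after the hole waits for the retransmission
        delivered.append(stream)
    return delivered


def deliverable_per_stream(packets: List[Packet], lost: Set[int]) -> Dict[str, bytes]:
    """QUIC-like: each stream is ordered independently by its own offsets."""
    streams: Dict[str, Dict[int, bytes]] = {}
    for num, stream, off, data in packets:
        if num not in lost:
            streams.setdefault(stream, {})[off] = data
    out: Dict[str, bytes] = {}
    for stream, chunks in streams.items():
        buf, pos = bytearray(), 0
        while pos in chunks:  # deliver the contiguous prefix of this stream only
            buf += chunks[pos]
            pos += len(chunks[pos])
        out[stream] = bytes(buf)
    return out


packets = [(1, "css", 0, b"body{"), (2, "js", 0, b"let x"),
           (3, "css", 5, b"}"), (4, "js", 5, b"=1;")]
print(deliverable_single_stream(packets, {1}))
print(deliverable_per_stream(packets, {1}))
```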

Forward Error Correction: preventing failure

- Along with a group of packets, the sender transmits enough redundant data (XOR parity) that a single missing packet can be reconstructed without having to retransmit it
- This is essentially RAID 5 at the network level
- The current ratio seems to be around 10 packets: for every 10 UDP packets sent, there is enough data to reconstruct one missing packet. A 10% overhead, if you will
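XOR parity, the RAID 5 idea, can be sketched as follows. This is a generic illustration, not QUIC's wire format. It assumes equal-length packets and at most one loss per group. The helper names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")  # pad shorter input with zeros
    return bytes(x ^ y for x, y in zip(a, b))


def parity(packets: List[bytes]) -> bytes:
    """One extra packet: the XOR of every packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[bytes], parity_packet: bytes) -> bytes:
    """Rebuild the single missing packet: parity XOR all received packets."""
    return reduce(xor_bytes, received, parity_packet)


group = [f"pkt{i:02d}".encode() for i in range(10)]  # a group of 10 packets
p = parity(group)  # 1 parity packet per 10 data packets = 10% overhead
print(recover(group[:3] + group[4:], p))  # packet 3 was lost
```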

Performance

[Figure; source: https://www.nanog.org/sites/default/files//meetings/NANOG64/1051/20150603_Rogan_Quic_Next_Generation_v1.pdf]

Thank you!