1
Chapter 5: Protocols Underlying HTTP
Web Protocols and Practice
2
Topics
PROTOCOLS UNDERLYING HTTP
» Protocol Definition
» Domain Name System
» Application-Layer Protocols
» Internet Protocol
» Transmission Control Protocol
3
Protocol Definition
A protocol defines both the syntax and the semantics of the messages exchanged between senders and receivers.
The protocol suite for the Internet consists of four main layers:
Link layer
» Handles the hardware details of interfacing with the physical communication medium, such as Ethernet, Asynchronous Transfer Mode (ATM), or Synchronous Optical Network (SONET).
4
Protocol Definition
Network layer
» Handles the delivery of individual packets of data through the network. A network-layer protocol is implemented in routers and the end hosts.
Transport layer
» Coordinates the communication between hosts on behalf of the application layer. In practice, a transport-layer protocol is typically implemented in the operating system of the end host.
Application layer
» Handles the details of specific applications. In practice, an application-layer protocol is typically implemented as part of the application software, such as a Web browser or Web server.
5
Protocol Definition
Figure 5.1 illustrates the layering of protocols.
6
Protocol Definition
Application layer: DNS, Telnet, FTP, SMTP, NNTP, HTTP
Transport layer: UDP, TCP
Network layer: IP
Link layer: ATM, SONET, Ethernet
Figure 5.1. Layering of protocols
7
Protocol Definition
The three main protocols involved in the transfer of HTTP messages are:
Internet Protocol (IP)
» Is a network-layer protocol that coordinates the delivery of individual packets from one host to another, based on the IP address of the destination host.
Transmission Control Protocol (TCP)
» Is a transport-layer protocol that coordinates the transmission of IP packets in order to provide the abstraction of a reliable, bidirectional connection between two communicating applications.
» Some applications use the User Datagram Protocol (UDP) instead.
8
Protocol Definition
Domain Name System (DNS)
» Is an application-layer protocol that controls the translation of hostnames into IP addresses, and vice versa.
9
Domain Name System
» Domain Name System Definition
» DNS Resolver
» DNS Architecture
» DNS Protocol
» DNS Queries and the Web
10
Domain Name System Definition
The Domain Name System (DNS) coordinates the translation of hostnames to IP addresses and IP addresses into hostnames.
Machines on the Internet have hostnames because:
» Remembering a hostname is much easier than remembering an IP address.
» The IP address associated with a hostname may change over time.
11
DNS Resolver
A resolver is a software library that is linked with Internet applications.
A DNS resolver performs two main functions:
gethostbyname()
» The function converts a hostname to an IP address.
gethostbyaddr()
» The function converts an IP address to a hostname.
The resolver interacts with one or more DNS servers to perform these functions on behalf of the application.
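The two resolver functions can be tried from Python, whose standard socket module exposes wrappers with the same names. This is a minimal sketch; the helper names hostname_to_address and address_to_hostname are made up for illustration.

```python
import socket
from typing import Optional


def hostname_to_address(hostname: str) -> str:
    """Forward lookup: hostname -> IPv4 address (wraps gethostbyname)."""
    return socket.gethostbyname(hostname)


def address_to_hostname(address: str) -> Optional[str]:
    """Reverse lookup: IPv4 address -> hostname (wraps gethostbyaddr)."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
        return hostname
    except OSError:
        # No reverse mapping is registered, or no DNS server is reachable.
        return None


# A dotted-decimal string is returned unchanged, without any DNS query.
print(hostname_to_address("127.0.0.1"))
```

Calling address_to_hostname("127.0.0.1") usually returns the local machine's loopback name, but the result depends on the host configuration.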
12
DNS Architecture
In the early days, a single master file listed the IP addresses associated with each hostname.
Now DNS is a distributed database that consists of a hierarchical set of name servers, each responsible for a portion of the domain name and address space.
The DNS architecture reflects the hierarchy of hostnames and IP addresses.
(Figure 5.10)
13
DNS Architecture
Figure 5.10. DNS hierarchy (tree diagram): an unnamed root; top-level domains, divided into generic domains (com, edu, org) and country domains (such as uk and zw), plus arpa with in-addr beneath it; second-level domains below them. Example hostnames in the figure: www.west.bar.com, ftp.east.bar.com, user.cam.ac.uk
14
DNS Architecture
The top level includes the three-character generic or organizational domains and two-character country domains.
The top-level domains are handled by a collection of root servers.
The hierarchy of domain names does not correspond to the hierarchical structure of IP addresses.
Efficient mapping of IP addresses to hostnames requires a separate hierarchy based on IP addresses.
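The separate address-based hierarchy lives under in-addr.arpa, where the octets of the IP address appear in reverse order so that the most significant octet sits closest to the root. This is a small sketch of how the lookup name is formed; the function name and the address 12.34.56.57 are chosen only for illustration.

```python
def reverse_lookup_name(ipv4_address: str) -> str:
    """Build the in-addr.arpa name used to map an IPv4 address to a hostname."""
    octets = ipv4_address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {ipv4_address!r}")
    # Reverse the octets: DNS names are most specific first, IP addresses last.
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_lookup_name("12.34.56.57"))  # 57.56.34.12.in-addr.arpa
```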
15
DNS Protocol
The DNS protocol governs communication between a DNS client and a DNS server.
A DNS client sends a query for information (e.g., the IP address associated with a particular hostname) to a DNS server, and the DNS server returns a response with the requested information (e.g., the IP address).
16
DNS Protocol
DNS queries can be recursive or iterative.
» A recursive query requests that the receiving DNS server resolve the entire query itself.
» An iterative query requests that the receiving DNS server respond directly to the DNS client with the IP address of the next DNS server in the DNS hierarchy.
» Root servers handle only iterative queries.
(Figure 5.11)
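On the wire, the client signals which kind of query it wants with a single "recursion desired" bit in the query header. The sketch below builds a query message for an address record, assuming the standard DNS message layout (a 12-byte header followed by one question); the function name and the query ID are illustrative, and nothing is sent over the network.

```python
import struct


def build_query(hostname: str, query_id: int, recursive: bool) -> bytes:
    """Encode a DNS query for the IPv4 address (type A) of hostname."""
    flags = 0x0100 if recursive else 0x0000  # RD (recursion desired) bit
    # Header: ID, flags, then counts of questions, answers, authority, additional.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels, ending with a zero-length label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


recursive_query = build_query("www.west.bar.com", 0x1234, recursive=True)
iterative_query = build_query("www.west.bar.com", 0x1234, recursive=False)
print(recursive_query[2:4].hex(), iterative_query[2:4].hex())
```

The two messages differ only in the flags field, which is why a server can choose to answer iteratively even when recursion was requested.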
17
DNS Protocol
Figure 5.11. DNS resolver and local DNS server (diagram): on the client host, the Web browser calls the DNS resolver (steps 1 and 10); the resolver exchanges a DNS query and a DNS response with the local DNS server, which has a DNS cache, over the local area network (steps 2 and 9); the local DNS server contacts the root server, the top-level domain server, and the second-level domain server (steps 3 to 8).
18
DNS Protocol
Figure 5.11 shows that for a recursive query:
» The resolver is invoked by a system call from the application (step 1).
» Then the resolver sends a DNS query to the local DNS server (step 2).
» Then the resolver waits for the reply (step 9).
» The resolver provides the IP address to the application (step 10).
19
DNS Protocol
Figure 5.11 shows that for an iterative query:
» The resolver is invoked by a system call from the application (step 1).
» Then the resolver sends a DNS query to the local DNS server (step 2).
» The local DNS server sends a query to the root DNS server (step 3).
» The local DNS server learns the names and IP addresses of the DNS servers for the zone at the next level (step 4).
20
DNS Protocol
» Then the local DNS server can send a query to the next DNS server in the chain (steps 5, 6, 7, 8).
» Ultimately, the local DNS server responds to the resolver (step 9).
» The resolver provides the IP address to the application (step 10).
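The way a local DNS server walks down the hierarchy in steps 3 to 8 can be simulated with a toy model. All server names, zone contents, and the address below are hypothetical; real servers exchange DNS messages rather than Python tuples.

```python
from typing import List, Tuple

# Hypothetical zone data: each server either knows the answer or refers the
# caller to the server responsible for the next zone down.
SERVERS = {
    "root": {"referrals": {"com": "com-server"}, "answers": {}},
    "com-server": {"referrals": {"bar.com": "bar-server"}, "answers": {}},
    "bar-server": {"referrals": {}, "answers": {"www.west.bar.com": "12.34.56.57"}},
}


def query(server: str, hostname: str) -> Tuple[str, str]:
    """Answer one iterative query: an address, a referral, or an error."""
    zone = SERVERS[server]
    if hostname in zone["answers"]:
        return ("answer", zone["answers"][hostname])
    for suffix, next_server in zone["referrals"].items():
        if hostname == suffix or hostname.endswith("." + suffix):
            return ("referral", next_server)
    return ("error", "no such name")


def resolve_iteratively(hostname: str) -> Tuple[str, List[str]]:
    """Play the local DNS server: start at the root and follow referrals."""
    server, visited = "root", []
    while True:
        visited.append(server)
        kind, value = query(server, hostname)
        if kind == "answer":
            return value, visited
        if kind == "error":
            raise LookupError(f"{hostname}: {value}")
        server = value  # referral: ask the next server in the chain


address, visited = resolve_iteratively("www.west.bar.com")
print(address, visited)
```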
21
DNS Protocol
DNS servers employ caching:
» To reduce the latency in responding to queries
» To reduce the amount of DNS traffic in the Internet
DNS primarily uses UDP for sending queries and responses, although TCP may also be used.
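A cache in front of a lookup function shows both benefits: repeated queries return immediately and generate no upstream traffic. The sketch below assumes each cached answer expires after a fixed lifetime (real DNS records carry their own time-to-live value); the class name, the 300-second default, and the injectable clock are illustrative choices.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Wrap a lookup function with a cache whose entries expire."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0  # how many lookups actually left the cache

    def resolve(self, hostname: str) -> str:
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no query is sent
        self.upstream_queries += 1
        address = self._lookup(hostname)
        self._cache[hostname] = (address, now + self._ttl)
        return address


# Stand-in lookup function returning a made-up address.
resolver = CachingResolver(lambda hostname: "12.34.56.57")
resolver.resolve("www.west.bar.com")
resolver.resolve("www.west.bar.com")
print(resolver.upstream_queries)  # 1: the second call was served from the cache
```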
22
DNS Queries and the Web
A Web client performs a gethostbyname() query before establishing a transport connection to the Web server. In some cases, the client may not need to perform a DNS lookup:
» Request directed to a proxy
» Request satisfied by the client cache
» Using the result of the previous query
Although the Web client needs to learn the IP address of the Web server, the Web server knows the IP address of the client when receiving a request because the client’s IP address is included in the header of each IP packet.
23
DNS Queries and the Web
The mapping of the Web client’s IP address to a hostname is controlled by the DNS servers at the Web client’s institution.
Mapping the client’s IP address into a hostname often incurs significant delay.
In addition, these DNS queries consume resources at the Web server.
24
Application-Layer Protocols
» Application-Layer Protocols Definition
» Telnet Protocol
» File Transfer Protocol
» Simple Mail Transfer Protocol
» Network News Transfer Protocol
» Properties of Application-Layer Protocols
25
Application-Layer Protocols Definition
Applications execute on end hosts and communicate via application-layer protocols.
An application-layer protocol defines both the syntax and the semantics of the messages exchanged between the end systems.
Four key Internet applications are:
» Telnet
» File transfer
» E-mail
» Network news
26
Telnet Protocol
Telnet permits a user to connect to an account on a remote machine.
A client program running on the user’s machine communicates using the Telnet protocol with a server program running on the remote machine.
The Telnet client program performs two important functions:
» Interacting with the user terminal on the local host
» Exchanging messages with the Telnet server
27
File Transfer Protocol
FTP allows a user to copy files to and from a remote machine.
The client program sends commands to the server program to coordinate the copying of files between the two machines on behalf of the user.
FTP uses separate TCP connections for control and data.
28
Simple Mail Transfer Protocol
SMTP supports the transfer of e-mail.
» SMTP is used to send an e-mail message from a local mail server to a remote mail server.
» SMTP is used to send an e-mail message from the user’s mail agent to the local mail server.
The separation of functionality between the user agent and the mail server is valuable:
» The mail agent provides rich features for a single user.
» The mail server provides reliable service for multiple users.
FTP and SMTP are text oriented and command based.
29
Simple Mail Transfer Protocol
The communication between the two servers starts with a greeting message from the remote mail server. Then the local mail server issues commands to transfer the e-mail message.
A typical exchange involves separate commands to
» Identify the local mail server
» Identify the sender of the e-mail message
» Identify each recipient of the e-mail message
» Send the actual e-mail message
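The four kinds of commands correspond to the SMTP commands HELO, MAIL FROM, RCPT TO, and DATA, followed by QUIT to end the session. The sketch below only builds the command lines a local mail server would send; the function name and all hostnames and addresses are hypothetical, and details such as escaping message lines that begin with a period are left out.

```python
from typing import List


def smtp_commands(local_server: str, sender: str, recipients: List[str],
                  message: str) -> List[str]:
    """List the client-side lines of a simple SMTP exchange, in order."""
    commands = [
        f"HELO {local_server}",      # identify the local mail server
        f"MAIL FROM:<{sender}>",     # identify the sender
    ]
    # One RCPT TO command per recipient.
    commands += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    commands.append("DATA")          # announce the message itself
    commands.append(message + "\r\n.")  # a line holding only "." ends the message
    commands.append("QUIT")
    return commands


for line in smtp_commands("mail.local.example", "alice@local.example",
                          ["bob@remote.example"], "Subject: hello\r\n\r\nHi Bob"):
    print(line)
```

In a real session the client waits for the server's reply after each command before sending the next one.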
30
Simple Mail Transfer Protocol
In contrast to FTP, SMTP uses a single TCP connection for both
» The command/reply exchanges
» The transfer of the e-mail message
In addition to transferring the message between mail servers, delivering an e-mail message requires two additional steps involving the mail agent:
» The transfer of the message to the local mail server
» The reception of the message from the remote mail server
31
Network News Transfer Protocol
NNTP supports the transfer of articles associated with electronic newsgroups.
A user agent uses NNTP to communicate with a local news server, which uses NNTP to communicate with a central repository of news articles.
The key idea is to store the messages in a central database instead of having a separate copy in each subscriber’s mailbox.
32
Network News Transfer Protocol
The database consists of a collection of newsgroups, each associated with an ordered list of messages.
An article includes header lines such as:
» E-mail address of the person who posted the article
» Subject matter of the article
» Date/time when the article was generated
» Number of lines of text in the article
» Unique message identifier for the article
» List of newsgroups receiving the article
33
Network News Transfer Protocol
NNTP coordinates the transfer of messages between the local news server and the central repository.
NNTP may also be used between the user agent and the local news server.
34
Properties of Application-Layer Protocols
Telnet, FTP, SMTP, and NNTP have important similarities and differences, as follows:
Command/reply
» Telnet clients and servers send commands in binary format.
» FTP, SMTP, and NNTP commands are text-based and are sent by the client. The commands have a well-defined, fixed format, and the server responds with a three-digit reply code and an optional text message.
Data types
» Telnet, FTP, SMTP, and NNTP transmit textual data in the standard U.S. 7-bit ASCII format.
» FTP also supports the transfer of data in binary form.
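Because the reply format is fixed, a client can split any single-line reply into its code and text in the same way for FTP, SMTP, and NNTP. This is a minimal sketch with an illustrative function name and sample reply lines; multi-line replies are not handled.

```python
from typing import Tuple


def parse_reply(line: str) -> Tuple[int, str]:
    """Split a server reply into its three-digit code and optional text."""
    line = line.rstrip("\r\n")
    code, text = line[:3], line[4:]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"reply does not start with a three-digit code: {line!r}")
    return int(code), text


print(parse_reply("220 mail.remote.example ready\r\n"))
print(parse_reply("250 OK\r\n"))
```

The first digit of the code tells the client whether the command succeeded, so the text can be ignored by programs and shown only to people.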
35
Properties of Application-Layer Protocols
Transport
» All four protocols rely on a reliable transport protocol, typically TCP.
» Telnet, SMTP, and NNTP use a single TCP connection for transmitting commands/replies and data.
» FTP uses separate connections for control and data.
Directionality
» FTP and NNTP can transfer data in both directions: copying data from the client and retrieving files from the server.
» SMTP is used to transmit e-mail messages from the client to the server.
36
Properties of Application-Layer Protocols
Statefulness
» Under all four protocols, the server retains information about the session with the client.
37
Internet Protocol
The Internet Protocol (IP) is the network-layer protocol underlying the Internet, a collection of interconnected networks spanning the globe.
IP provides a framework for sending individual packets. In traveling from the sending host to the receiving host, a packet traverses a collection of routers that communicate via IP.
(Figure 5.2)
38
Internet Protocol
Figure 5.2. Protocols involved in transferring HTTP messages (diagram): the Web client and the Web server each run HTTP over TCP over IP over an Ethernet interface; the two routers between them run IP over Ethernet and SONET interfaces and are joined by a SONET link. The HTTP message is carried in a TCP segment, which is carried in IP packets across each hop.
39
Internet Protocol
The routers in the Internet treat each packet independently and do not need to retain state across successive packets.
A sequence of IP packets traveling from one host to another may not traverse the same path through the network.
Packets may be lost, corrupted, or delivered out of order.
The model of the Internet is referred to as packet switching.
40
Internet Protocol
Internet hosts are identified by numerical addresses (IP addresses).
An IP address can be divided into a network part and a host part.
Once the packet reaches the destination network, the host portion of the address is used to direct the packet to the appropriate destination machine.
IP addresses are allocated in five classes.
(As discussed before in socket programming)
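Under the five-class scheme, the leading bits of the first octet determine the class, and the class determines where the network part ends. The sketch below assumes that classic classful split (today's Internet mostly uses classless prefixes instead); the function names and sample addresses are illustrative.

```python
from typing import Tuple


def address_class(ipv4_address: str) -> str:
    """Return the class (A to E) of a dotted-decimal IPv4 address."""
    first_octet = int(ipv4_address.split(".")[0])
    if first_octet < 128:
        return "A"   # leading bit 0
    if first_octet < 192:
        return "B"   # leading bits 10
    if first_octet < 224:
        return "C"   # leading bits 110
    if first_octet < 240:
        return "D"   # leading bits 1110 (multicast)
    return "E"       # leading bits 1111 (reserved)


def split_address(ipv4_address: str) -> Tuple[str, str]:
    """Return (network part, host part) for a class A, B, or C address."""
    network_octets = {"A": 1, "B": 2, "C": 3}.get(address_class(ipv4_address))
    if network_octets is None:
        raise ValueError("class D and E addresses have no network/host split")
    octets = ipv4_address.split(".")
    return ".".join(octets[:network_octets]), ".".join(octets[network_octets:])


print(address_class("12.34.56.57"), split_address("12.34.56.57"))
print(address_class("192.168.1.7"), split_address("192.168.1.7"))
```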
41
Internet Protocol
Each IP packet has a header.
The fields of the IP header are set by the operating system on the sending machine and are important for successful communication between the sender and receiver:
» Version number (4 bits)
» Header length (4 bits)
» Type of service (8 bits)
» Total length (16 bits)
» Identification (16 bits)
» IP flags (3 bits)
42
Internet Protocol
» Fragment offset (13 bits)
» Time-to-live (8 bits)
» Protocol (8 bits)
» Header checksum (16 bits)
» Source IP address (32 bits)
» Destination IP address (32 bits)
» IP options (variable length)
(Figure 5.4)
43
Internet Protocol
Figure 5.4. Format of an IP packet (each row is 32 bits wide; the IP header is 20 bytes without options)
Ver# (4 bits) | hdr len (4 bits) | type of service (8 bits) | total length (16 bits)
identification (16 bits) | flags (3 bits) | fragment offset (13 bits)
time to live (8 bits) | protocol (8 bits) | header checksum (16 bits)
source IP address (32 bits)
destination IP address (32 bits)
options (0 or more)
data
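The layout in Figure 5.4 maps directly onto a packed binary structure. The sketch below packs and unpacks a 20-byte header with no options and no fragmentation; the function names, addresses, and default field values are illustrative, and in practice the operating system fills in these fields.

```python
import socket
import struct

IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20 bytes, network byte order


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used for the header checksum."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(source: str, destination: str, payload_length: int,
                      protocol: int = 6, ttl: int = 64) -> bytes:
    """Pack a header without options; protocol 6 means the payload is TCP."""
    fields = [
        (4 << 4) | 5,          # version 4, header length 5 words (20 bytes)
        0,                     # type of service
        20 + payload_length,   # total length: header plus data
        0,                     # identification
        0,                     # flags (3 bits) and fragment offset (13 bits)
        ttl,
        protocol,
        0,                     # checksum placeholder
        socket.inet_aton(source),
        socket.inet_aton(destination),
    ]
    fields[7] = internet_checksum(struct.pack(IPV4_HEADER_FORMAT, *fields))
    return struct.pack(IPV4_HEADER_FORMAT, *fields)


def parse_ipv4_header(header: bytes) -> dict:
    """Unpack the first 20 bytes of a packet into named fields."""
    (ver_len, _tos, total_length, _ident, flags_offset, ttl, protocol,
     _checksum, source, destination) = struct.unpack(IPV4_HEADER_FORMAT, header[:20])
    return {
        "version": ver_len >> 4,
        "header_length_bytes": (ver_len & 0x0F) * 4,
        "total_length": total_length,
        "flags": flags_offset >> 13,
        "fragment_offset": flags_offset & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "source": socket.inet_ntoa(source),
        "destination": socket.inet_ntoa(destination),
    }


header = build_ipv4_header("12.34.56.57", "192.168.1.7", payload_length=100)
print(parse_ipv4_header(header))
```

A receiver can validate the header by running internet_checksum over all 20 bytes, which yields zero when the header is intact.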
44
Transmission Control Protocol
The following topics will be discussed:
» Transmission Control Protocol Definition
» Opening and Closing a TCP Connection
» Sliding-Window Flow Control
» Retransmission of Lost Packets
45
Transmission Control Protocol Definition
The Transmission Control Protocol (TCP) coordinates the transmission of data between a pair of applications.
Applications communicate by reading from and writing to a socket that presents data as an ordered, reliable stream of bytes.
The TCP sender divides data into segments and transmits each segment in an IP packet along with a TCP header.
46
Transmission Control Protocol Definition
The TCP header includes information necessary to coordinate the ordered, reliable delivery of segments.
The sending and receiving applications should be allowed to assume that they communicate over a channel that provides an ordered, reliable byte stream. IP does not provide this service. Instead, this abstraction is provided by TCP.
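The byte-stream abstraction is visible in the socket interface: applications write and read bytes, and TCP takes care of segments, ordering, and retransmission underneath. The sketch below runs a one-shot echo server and a client on the loopback address of the same machine; the function name and the sample message are illustrative.

```python
import socket
import threading


def run_echo_demo(message: bytes = b"GET / HTTP/1.0\r\n\r\n") -> bytes:
    """Send message over a local TCP connection and return what is echoed back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once() -> None:
        connection, _peer = listener.accept()
        with connection:
            while True:
                chunk = connection.recv(4096)
                if not chunk:         # the client has finished sending
                    break
                connection.sendall(chunk)

    server = threading.Thread(target=serve_once, daemon=True)
    server.start()

    received = b""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)   # signal the end of our byte stream
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            received += chunk
    server.join(timeout=5)
    listener.close()
    return received


print(run_echo_demo())
```

Neither side sees segment boundaries: recv may return the data in differently sized pieces than it was sent in, which is why both loops read until the stream ends.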
47
Opening and Closing a TCP Connection
The SYN, ACK, FIN, and RST flags in the TCP header are used in opening and closing a TCP connection.
Establishing a TCP connection between two applications, A and B, involves a three-way handshake.
» SYN from A to B
» SYN-ACK from B to A
» ACK from A to B
(Figure 5.5)
48
Opening and Closing a TCP Connection
Terminating a TCP connection between two applications, A and B, involves a four-way handshake.
» FIN from B to A
» ACK from A to B
» FIN from A to B
» ACK from B to A
(Figure 5.5)
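Each handshake segment also carries sequence and acknowledgement numbers, which is how the two sides agree on where each byte stream starts. The sketch below lists the three opening segments for given initial sequence numbers; the Segment type, the function name, and the sample numbers are illustrative, and the rule that a SYN consumes one sequence number comes from the TCP specification rather than from the slides.

```python
from typing import List, NamedTuple, Optional

SEQUENCE_SPACE = 2 ** 32  # sequence numbers wrap around at 32 bits


class Segment(NamedTuple):
    sender: str
    flags: str
    seq: int
    ack: Optional[int]


def three_way_handshake(isn_a: int, isn_b: int) -> List[Segment]:
    """Return the segments that open a connection from A to B."""
    next_a = (isn_a + 1) % SEQUENCE_SPACE  # the SYN uses up one sequence number
    next_b = (isn_b + 1) % SEQUENCE_SPACE
    return [
        Segment("A", "SYN", isn_a, None),
        Segment("B", "SYN-ACK", isn_b, next_a),  # acknowledges A's SYN
        Segment("A", "ACK", next_a, next_b),     # acknowledges B's SYN
    ]


for segment in three_way_handshake(isn_a=100, isn_b=300):
    print(segment)
```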
49
Opening and Closing a TCP Connection
Figure 5.5. Timeline of a TCP connection between A and B (diagram): SYN, SYN-ACK, and ACK segments open the connection; DATA and ACK segments are exchanged; FIN and ACK segments close it.
50
Sliding-Window Flow Control
The TCP sender limits the transmission of data for two reasons:
» The sender should not transmit more data than the receiver can store in its buffers.
» The sender should not transmit data more quickly than the network can handle.
Each TCP sender limits the number of unacknowledged bytes in the network, using sliding-window flow control.
51
Sliding-Window Flow Control
To avoid overflow of the buffer at the receiver, packets from B to A include the receiver window in the TCP header.
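The sender's bookkeeping reduces to one subtraction: the window minus the bytes already sent but not yet acknowledged. The sketch below assumes byte counters named last_byte_sent and last_byte_acked, and an optional congestion window as the network-side limit; these names are illustrative and do not come from the slides.

```python
from typing import Optional


def sendable_bytes(receiver_window: int, last_byte_sent: int, last_byte_acked: int,
                   congestion_window: Optional[int] = None) -> int:
    """Return how many more bytes the sender may transmit right now."""
    window = receiver_window
    if congestion_window is not None:
        # Respect both limits: the receiver's buffer and the network's capacity.
        window = min(window, congestion_window)
    unacknowledged = last_byte_sent - last_byte_acked
    return max(0, window - unacknowledged)


print(sendable_bytes(receiver_window=4096, last_byte_sent=3000, last_byte_acked=1000))
```

As acknowledgements arrive, last_byte_acked advances and the window slides forward, allowing new data to be sent.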
52
Retransmission of Lost Packets
The retransmission of lost packets plays a crucial role in how TCP provides reliable delivery of a stream of bytes.
The sender infers that a packet has been lost in two ways:
» A retransmission timeout (RTO)
» Duplicate acknowledgements
53
Retransmission of Lost Packets
Selecting an appropriate value for the RTO is a delicate process:
» Setting RTO too low results in a false alarm, and the sender unnecessarily transmits a packet that was not actually lost.
» Setting RTO too high postpones the detection of a lost packet, resulting in unnecessary delay in retransmitting the packet.
The right value for RTO depends on:
» The distance between the sender and receiver
» The network congestion
54
Retransmission of Lost Packets
The time between transmission of a packet and receipt of the acknowledgement is called Round Trip Time (RTT).
The RTO is set to the average RTT plus an additive factor.
In some cases, the sender can infer that a packet has been lost without waiting for the retransmission timer to expire.
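One common way to compute "average RTT plus an additive factor" keeps a smoothed RTT and a smoothed measure of how much the samples vary, and adds four times the variation to the average. The sketch below uses the smoothing constants from the standard TCP retransmission-timer rules (1/8 and 1/4), which are not given in the slides; the class name is illustrative, and the minimum RTO and clock granularity that real implementations apply are left out.

```python
from typing import Optional


class RtoEstimator:
    """Track a smoothed RTT and its variation to derive the RTO."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variation

    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, rtt_sample: float) -> float:
        """Fold in one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None or self.rttvar is None:
            # First measurement: start from the sample itself.
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            deviation = abs(self.srtt - rtt_sample)
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * deviation
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        return self.srtt + 4 * self.rttvar


estimator = RtoEstimator()
for sample in (0.100, 0.100, 0.300, 0.100):
    print(round(estimator.update(sample), 4))
```

Steady samples shrink the additive factor and pull the RTO toward the RTT, while a sudden spike widens it, which addresses both failure modes of a badly chosen timeout.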