HTTP HyperText Transfer Protocol

Miguel Leitão, 2012


HTTP

• HTTP is the protocol that supports communication between Web browsers and Web servers.

• From the RFC: “HTTP is an application-level protocol with the lightness and speed necessary for distributed, hypermedia information systems.”

• HTTP communication generally takes place over a TCP connection, but the protocol itself does not depend on a specific transport layer.


HTTP Transaction

[Figure: sequence diagram between Client Browser and Web Server: a TCP connect, followed by the HTTP transaction]


Request - Response

• HTTP has a simple structure:

– client sends a request

– server returns a reply.

• HTTP can support multiple request-reply exchanges over a single TCP connection.


Well Known Address

• A “Web Server” is an HTTP server.

• The “well known” TCP port for HTTP servers is port 80.

• Other ports can be used as well...


HTTP Versions

• The original version is known as “HTTP Version 0.9”

– HTTP/0.9 was used for many years.

• Starting with HTTP 1.0, the version number is part of every request.

• HTTP is still changing...


HTTP 1.x Request

• Lines of text (ASCII).

• Lines end with CRLF “\r\n”

• First line is called “Request-Line”

Request-Line

Headers . . .

Content...


Request Line

Method URI HTTP-Version \r\n

• The request line contains 3 tokens (words).

• Single space characters (“ ”) separate the tokens.

• A bare newline (\n) is accepted by many servers, but the protocol requires CRLF.
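The Request-Line rules above can be sketched in a few lines of Python. This is only an illustration: the function name parse_request_line is hypothetical, and treating a two-token line as HTTP/0.9 follows what this deck says about requests without a version number.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a Request-Line into (method, uri, version)."""
    # Accept the required CRLF terminator as well as a bare LF.
    tokens = line.rstrip("\r\n").split(" ")
    if len(tokens) == 2:
        # No version token: assume an HTTP/0.9 style request.
        method, uri = tokens
        return method, uri, "HTTP/0.9"
    if len(tokens) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, uri, version = tokens
    return method, uri, version


print(parse_request_line("GET /index.html HTTP/1.0\r\n"))
```

A stricter parser would reject anything that does not end in CRLF; the tolerant version mirrors what many servers do in practice.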


Request Method

The Request Method can be:

GET, HEAD, PUT, POST, DELETE, TRACE, OPTIONS

Future expansion is supported.


Methods

• GET: retrieve information identified by the URI.

• HEAD: retrieve meta-information about the URI.

• POST: send information to a URI and retrieve result.

GET, HEAD and POST are supported everywhere.


Methods (other)

• PUT: Store information in location named by URI.

• DELETE: remove entity identified by URI.

• TRACE: used to trace HTTP forwarding through proxies, tunnels, etc.

• OPTIONS: used to determine the capabilities of the server, or characteristics of a named resource.


URI: Uniform Resource Identifier

• URIs are defined in RFC 2396 (since obsoleted by RFC 3986).

• Absolute URI: scheme://hostname[:port]/path

http://www.cs.rpi.edu:80/blah/foo

• Relative URI: /path (no server mentioned)

/blah/foo

/absolute/path/to/resource.txt

relative/path/to/resource.txt
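To make the absolute/relative distinction concrete, here is a small sketch using Python's standard urllib.parse.urlsplit. The helper name describe_uri and the shape of the returned dictionary are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Break a URI into the parts named on the slide."""
    parts = urlsplit(uri)
    return {
        # An absolute URI carries both a scheme and a server name.
        "absolute": bool(parts.scheme and parts.netloc),
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


print(describe_uri("http://www.cs.rpi.edu:80/blah/foo"))
print(describe_uri("/blah/foo"))
```

For the relative form, host and port come back as None, which matches the "no server mentioned" case.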


URI Usage

• When dealing with an HTTP 1.1 server, only a path is used (no scheme or hostname).

– HTTP 1.1 servers are required to be capable of handling an absolute URI, but there are still some out there that won’t…

• When dealing with a proxy HTTP server, an absolute URI is used.

– The client has to tell the proxy where to get the document!


HTTP Version Number

“HTTP/1.0” or “HTTP/1.1”

HTTP 0.9 did not include a version number in a request line.

If a server gets a request line with no HTTP version number, it assumes 0.9.


The Header Lines

• After the Request-Line come a number (possibly zero) of HTTP headers.

• Each header line contains an attribute name followed by a “:” followed by the attribute value.


Headers

• Request Headers provide information to the server about the client:

– what kind of client

– what kind of content will be accepted

– who is making the request

– Web site

• There can be 0 headers (HTTP 1.1 requests, however, must carry a Host header)


Example HTTP Headers

Accept: text/html

From: neytmann@cybersurg.com

User-Agent: Mozilla/4.0

Referer: http://foo.com/blah


End of the Headers

• Each header ends with a CRLF

• The end of the header section is marked with a blank line.

– just CRLF

• For GET and HEAD requests, the end of the headers is the end of the request!
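The role of the blank line can be sketched as follows. This is a simplified illustration, not a complete HTTP parser: the function name parse_request is hypothetical, and folded or repeated headers are not handled.

```python
def parse_request(raw: bytes):
    """Return (request_line, headers, body) from a raw HTTP/1.x request."""
    # The blank line (CRLF CRLF) separates the header section from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    request_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


raw = b"GET /blah HTTP/1.0\r\nAccept: text/html\r\nUser-Agent: Mozilla/4.0\r\n\r\n"
print(parse_request(raw))
```

For a GET or HEAD request the returned body is empty, because the request ends with the blank line.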


Post

• A POST request includes some content (some data) after the headers (after the blank line).

• There is no format for the data (just raw bytes).

• A POST request must include a Content-Length line in the headers:

– Content-Length: 267
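A minimal sketch of building a POST request so that Content-Length always matches the content. The function name build_post, the host name and the Content-Type value are assumptions chosen for illustration; they do not come from the slides.

```python
def build_post(path: str, body: bytes, host: str) -> bytes:
    """Assemble a POST request whose Content-Length equals len(body)."""
    header_lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",  # hypothetical host, required by HTTP/1.1
        "Content-Type: application/x-www-form-urlencoded",  # assumed body format
        f"Content-Length: {len(body)}",  # number of bytes after the blank line
    ]
    # CRLF ends each line; an extra CRLF gives the blank line before the content.
    return "\r\n".join(header_lines).encode("ascii") + b"\r\n\r\n" + body


request = build_post("/form.cgi", b"item=test1&grade=99", "www.example.org")
print(request.decode("ascii"))
```

Computing the length from the bytes actually sent avoids the common mistake of a hand-written header that is off by one.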


Example GET Request

GET /~hollingd/testanswers.html HTTP/1.0

Accept: */*

User-Agent: Internet Explorer

From: cheater@cheaters.org

Referer: http://foo.com/

There is a blank line here!


Example POST Request

POST /~hollingd/changegrade.cgi HTTP/1.1

Accept: */*

User-Agent: SecretAgent V2.3

Content-length: 36

Referer: http://monte.cs.rpi.edu/blah

stuid=6660182722&item=test1&grade=99


HTTP Response

• ASCII Status Line

• Headers Section

• Content can be anything (not just text)

– typically an HTML document or some kind of image.

Status-Line

Headers . . .

Content...


Response Status Line

HTTP-Version Status-Code Message

• Status Code is 3 digit number (for computers)

• Message is text (for humans)

Status Codes

1xx Informational

2xx Success

3xx Redirection

4xx Client Error

5xx Server Error

Examples

HTTP/1.0 200 OK

HTTP/1.0 301 Moved Permanently

HTTP/1.0 400 Bad Request

HTTP/1.0 500 Internal Server Error
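The status line format and the code classes can be sketched like this. The names parse_status_line and STATUS_CLASSES are hypothetical; the class labels are the ones listed on the slide.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Return (version, code, message, class) for a Status-Line."""
    # Split on the first two spaces only: the message may contain spaces.
    version, code, message = line.rstrip("\r\n").split(" ", 2)
    status = int(code)
    # The first digit of the 3 digit code selects the class.
    return version, status, message, STATUS_CLASSES.get(status // 100, "Unknown")


print(parse_status_line("HTTP/1.0 301 Moved Permanently\r\n"))
```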


Response Headers

Provide information about the returned entity (document).

– what kind of document

– how big the document is

– how the document is encoded

– when the document was last modified

Example

Date: Wed, 30 Jan 2002 12:48:17 EST

Server: Apache/1.17

Content-Type: text/html

Content-Length: 1756

Content-Encoding: gzip


Content

• Content can be anything (sequence of

raw bytes).

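• A Content-Length header is needed for any response that includes content, unless the server marks the end of the content by closing the connection or, in HTTP 1.1, by using chunked transfer encoding.

• A Content-Type header should also be sent.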


Try it with telnet

> telnet www.dee.isep.ipp.pt 80

GET / HTTP/1.0

HTTP/1.0 200 OK

Server: Apache

...


Single Request/Reply

• The client sends a complete request.

• The server sends back the entire reply.

• The server closes its socket.

• If the client needs another document it must open a new connection.


Persistent Connections

• HTTP 1.1 supports persistent connections (this is supposed to be the default).

• Multiple requests can be handled.

• Most servers seem to close the connection after the first response…


Virtual Hosts

• HTTP 1.1 can use virtual hosts.

– Allows multiple hosts to share a single server.

– Each host has a different name.

– The name of the destination host is given as part of the page request.


HTTP 1.1 Head Request

$ telnet linuxzoo.net 80

HEAD / HTTP/1.1

Host: tiger.net

HTTP/1.1 200 OK

Date: Mon, 01 Nov 2008 15:06:44 GMT

Server: Apache/2.0.46 (Red Hat)

Last-Modified: Fri, 29 Oct 2008 14:47:22 GMT

ETag: "4981dd-920-22ea7280"

Accept-Ranges: bytes

Content-Length: 2336

Content-Type: text/html; charset=UTF-8


HTTP Proxy Server

[Figure: Browser, Proxy and Server exchanging HTTP messages]


HTTPS

[Figure: protocol stack, from top to bottom: HTTPS, SSL, TCP]


HTTPS Transaction

[Figure: sequence diagram between Client Browser and Web Server: TCP connect, then SSL connect, then the HTTPS GET transaction]


Typical HTTP use

• A Web page is a set of many items.

• Each item is downloaded separately.

• Items from the same server are downloaded sequentially.


Domain Sharding

Use of multiple domains to increase the number of resources downloaded simultaneously for a particular website.


Domain Sharding

Pros

• Several resources are downloaded in parallel

• Faster page load time

Cons

• Increased DNS lookup times

• Requires website modifications

• Increased TCP overhead


SPDY

• Google, 2009-2015

• Multiplexed Stream Support

SPDY can send many streams concurrently over a single TCP connection without serializing requests. This makes SPDY as efficient as HTTP while using only a single connection.

• Request Prioritization

A client can request as many items as it wants from the server. The server returns the higher-priority content first.

• HTTP Header Compression

HTTP headers are compressed, leading to fewer bytes transmitted.

• Server Initiated Streams (aka "Server Push")

SPDY allows either the client or the server to initiate a stream once the client has established a connection.

• Server Hint

The server often knows a client will need a resource. It can inform the client about resources it would otherwise discover much later.


HTTP/2

• First major update since HTTP/1.1

• Binary, instead of textual.

• Fully multiplexed

– Allows sending multiple requests in parallel over a single TCP connection.

• Uses HPACK header compression to reduce overhead.

• Allows servers to PUSH responses to clients.

• Uses the new ALPN extension, which allows for faster encrypted connections.

• Domain sharding and asset concatenation are no longer needed.


Binary Framing

© 2013 Ilya Grigorik. Published by O'Reilly Media, Inc.


Connection


Frame Header


Frame Types

• DATA transports HTTP message bodies

• HEADERS transports header fields for a stream

• PRIORITY communicates sender-advised priority of a stream

• RST_STREAM signals termination of a stream

• SETTINGS communicates configuration parameters for the connection

• PUSH_PROMISE signals a promise to serve the referenced resource

• PING is used to check the round-trip time and the "live" state

• GOAWAY orders the peer to stop creating streams

• WINDOW_UPDATE is used to implement stream and connection flow control

• CONTINUATION used to continue a sequence of header block fragments
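As an illustration of binary framing, the sketch below decodes the fixed 9-byte frame header and maps the type code to the names listed above. The layout (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier) and the type codes are taken from RFC 7540, not from the slides; the function name parse_frame_header is hypothetical.

```python
import struct

# Frame type codes as assigned in RFC 7540.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header: bytes):
    """Return (payload_length, type_name, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    # The 24-bit length is read as one byte plus one 16-bit word.
    length_hi, length_lo, frame_type, flags, stream_id = struct.unpack("!BHBBI", header)
    length = (length_hi << 16) | length_lo
    stream_id &= 0x7FFFFFFF  # drop the reserved top bit
    name = FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:02x})")
    return length, name, flags, stream_id


print(parse_frame_header(b"\x00\x00\x0d\x01\x05\x00\x00\x00\x01"))
```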


HPACK


Push


HTTP/2 Upgrade

A client supporting HTTP/1.1 and HTTP/2 wants to make a request without prior knowledge about HTTP/2 support on the server.

=> The client must use the HTTP Upgrade mechanism:

• starts an HTTP/1.1 request.

• includes an Upgrade header field with the "h2c" token.

• includes one HTTP2-Settings header field.

=> The server can

• accept the upgrade and produce an HTTP/2 reply.

• ignore the Upgrade header and produce an HTTP/1.1 reply.

[RFC7230]


HTTP/2 Upgrade

Initial HTTP/1.1 request with HTTP/2 upgrade header (HTTP2-Settings carries the Base64 URL encoding of the HTTP/2 SETTINGS payload):

GET /page HTTP/1.1
Host: server.example.com
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c
HTTP2-Settings: (SETTINGS payload)

Server declines upgrade, returns response via HTTP/1.1:

HTTP/1.1 200 OK
Content-length: 243
Content-type: text/html

(... HTTP/1.1 response ...)

(or) Server accepts HTTP/2 upgrade, switches to new framing:

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: h2c

(... HTTP/2 response ...)
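The HTTP2-Settings value can be produced with a short sketch. It assumes the SETTINGS payload format from RFC 7540 (each setting is a 16-bit identifier followed by a 32-bit value) and base64url encoding with the trailing "=" padding removed; the function name and the two example settings are illustrative only.

```python
import base64
import struct


def encode_http2_settings(settings: dict[int, int]) -> str:
    """Encode {identifier: value} pairs as an HTTP2-Settings header value."""
    # Each setting occupies 6 bytes: 16-bit identifier, 32-bit value.
    payload = b"".join(
        struct.pack("!HI", ident, value) for ident, value in settings.items()
    )
    # URL-safe Base64, with any trailing '=' characters omitted.
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")


# 0x3 = SETTINGS_MAX_CONCURRENT_STREAMS, 0x4 = SETTINGS_INITIAL_WINDOW_SIZE
print(encode_http2_settings({0x3: 100, 0x4: 65535}))
```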


HEADERS frame in Wireshark


Apache

• Very well known.

• Respected HTTP server.

• Used commercially.

• Freely available from http://www.apache.org

• Plenty of plugins.

• Relatively easy and flexible to configure.

• Fast and Reliable.

• Supports HTTP/2


Multi-thread server

• Most servers follow a

– Forking model

– Threaded model

    • needs special OS support

    • uses fewer resources

• Apache is built as a hybrid multi-process multi-threaded server.

– Keeps multiple child processes available.

– Each child process runs many threads.

– Each thread processes a request.


Apache Forking Model

[Figure: an HTTP request arrives at a MUX, which allocates an idle child from a pool of child processes; the child gets the data from disk and returns the response]


Forking Configuration

Most important options:

Parameter             Initial Value
StartServers          8
MinSpareServers       5
MaxSpareServers       20
MaxClients            150
MaxRequestsPerChild   1000

Most servers use default values…


Important Files

• /etc/init.d/httpd – the server control script

• /etc/httpd/conf/httpd.conf – the main config file.

• /var/log/httpd/access_log

• /var/log/httpd/error_log

The main configuration file is only reread on a server reload or restart.


Reload or Restart

Restart shuts down then starts the server…

• If configuration file contains errors, start up can fail.

With a Reload,

• Apache checks the configuration file

– if it contains no errors, it is used.

– If it has errors, Apache keeps running the old configuration.

• This allows a server to be reconfigured with no downtime.

The error log can be checked for help

• /var/log/httpd/error_log

• /var/log/messages (syslog)


Virtual Hosts

• Sharing a single IP address among multiple hostnames is well supported in Apache.

• A Virtual Host is defined in the config file in a <VirtualHost> block.

• Each block holds a list of hostnames it can handle.

• The first host found in the file is always considered the default, so if no VirtualHost section matches, the first block is used.
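The name-based selection described above can be modelled in a few lines. This is a simplified sketch, not Apache's actual algorithm (which also considers the IP address and port of each block); the function name select_vhost and the second host entry are made up for illustration.

```python
def select_vhost(vhosts, host_header: str) -> str:
    """Pick a document root; vhosts is a list of (hostnames, docroot) in file order."""
    # Drop an optional ":port" suffix and compare names case-insensitively.
    name = host_header.split(":")[0].lower()
    for hostnames, docroot in vhosts:
        if name in hostnames:
            return docroot
    # No block matched: the first block in the file acts as the default.
    return vhosts[0][1]


vhosts = [
    (["tele.isep.ipp.pt", "www.tele.org", "tele.isep.pt"], "/home/tele/public_html"),
    (["other.example.org"], "/srv/other"),  # hypothetical second host
]
print(select_vhost(vhosts, "www.tele.org:80"))
print(select_vhost(vhosts, "unknown.example.net"))
```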


VirtualHost config

<VirtualHost *:80>

ServerAdmin prof@tele.isep.ipp.pt

DocumentRoot /home/tele/public_html

ServerName tele.isep.ipp.pt

ServerAlias www.tele.org tele.isep.pt

ErrorLog logs/tele-error_log

CustomLog logs/tele-access_log combined

</VirtualHost>


Personal Web pages

Typical environment:

• Apache runs on a server used by many users.

• Each user has his own directory in /home.

• Each user wants to build his own web pages.

Apache allows personal Web pages in the user's home directory, under a dedicated subdirectory:

• public_html

• WWW


public_html access

• URLs of the form

– http://our.webserver.net/~JohnSmith/file.html

• Refer to

– /home/JohnSmith/public_html/file.html

• This feature can be activated in httpd.conf:

UserDir public_html
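The URL-to-file mapping can be sketched as below. It is only an illustration of the idea, under the assumption that home directories live in /home; the function name userdir_path is hypothetical, and a real server also has to guard against ".." in the path.

```python
import posixpath


def userdir_path(url_path: str, userdir: str = "public_html", home: str = "/home"):
    """Map /~user/rest to home/user/userdir/rest, or return None if not a user URL."""
    if not url_path.startswith("/~"):
        return None
    # Everything up to the next "/" is the user name.
    user, _, rest = url_path[2:].partition("/")
    if not user:
        return None
    return posixpath.join(home, user, userdir, rest)


print(userdir_path("/~JohnSmith/file.html"))
```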


URL Rewriting

• mod_rewrite is a module in Apache.

• Allows changing URLs dynamically.

• Can be useful to:

– Change the URL of aliases in a domain so that they always give the correct name.

– Support directories and files being moved without breaking bookmarked URLs.

– Provide a variety of proxying methods.


Methods

• mod_rewrite has many directives:

– RewriteCond – an IF statement

– RewriteRule – an action (do it) statement.

– …

• Can be placed in several Apache configuration files:

– in VirtualHost areas of httpd.conf.

– In .htaccess at specific directories

– …

• To work, the area must also have:

RewriteEngine on


RewriteRule

Basic format:

RewriteRule URL-reg-exp New-URL

Example:

If /old.txt was moved to /new.txt

RewriteRule /old.txt /new.txt


Regular Expressions

• Text comparison uses regular expressions.

• Text matching:

. Any single Character

[chars] One of the characters in chars

[^chars] None of the characters in chars

Text1|Text2 Either “Text1” or “Text2”

^ Beginning of the URL

$ End of the URL

\ Escaping


Quantifiers and Grouping

Quantifiers:

? 0 or 1 of the preceding text

* 0 or more of the preceding text

+ 1 or more of the preceding text

{n} n occurrences of the preceding text

Grouping

(text) Marks a text group:

- Can limit an alternative.

- Can be back referenced as $n


Back References

$n refers to the nth group from the URL match.

Example:

– rewrite any URL ending in .txt to .html:

RewriteRule (.*)\.txt $1.html


More complex example

Rewrite URLs in all directories …/demo/ to use directories /exp/ in the same position:

RewriteRule ^(.*)/demo/(.*)$ $1/exp/$2
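To experiment with patterns and back references outside Apache, the behaviour of a single RewriteRule can be approximated in Python. This is a simplified model (no flags, no per-directory handling): the function name apply_rewrite_rule is hypothetical, and $n is translated to Python's \g<n> group syntax.

```python
import re


def apply_rewrite_rule(pattern: str, substitution: str, url: str) -> str:
    """Return the rewritten URL, or the original URL if the pattern does not match."""
    match = re.search(pattern, url)
    if match is None:
        return url
    # Translate mod_rewrite style $1, $2, ... into Python's \g<1>, \g<2>, ...
    template = re.sub(r"\$(\d)", r"\\g<\1>", substitution)
    # The substitution replaces the whole URL, not just the matched part.
    return match.expand(template)


print(apply_rewrite_rule(r"^(.*)/demo/(.*)$", "$1/exp/$2", "/site/demo/page.html"))
print(apply_rewrite_rule(r"(.*)\.txt", "$1.html", "/notes/readme.txt"))
```

Because (.*) is greedy, a URL containing /demo/ more than once only has its last occurrence replaced.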


Additional Flags

• At the end of the line, the RewriteRule can have several Flags.

• Flags are listed in [brackets], e.g. [F,G] for flags F and G.

• These change or enhance the behaviour of the match.


Options:

• R or R=code

– Sends the browser the new URL as an external REDIRECTION. The code selects the type of redirection, such as 301 or 302.

• F

– Send back FORBIDDEN.

• G

– Send back GONE

• P

– Proxy: Forward the request


Options Cont…

• L

– Last: do not look at any more rules.

• C

– Chain: if the pattern matches, do the next rule; otherwise skip the remaining rules in the chain.

• NC

– case insensitive.

• There are many more options….


Complex example

• If the URL has /work/ in it, rewrite /work/ to /home/.

• In addition, if the URL did have /work/ in it, replace “hello.txt” with “bye.txt”.

RewriteRule ^(.*)/work/(.*)$ $1/home/$2 [C]

RewriteRule ^(.*)hello\.txt$ $1bye.txt [L]


RewriteCond

• This directive performs tests (conditions).

• If the test matches, then the next test is checked.

• If all tests match, then the RewriteRule which follows the tests is performed.

• If any Cond does not match, processing skips to after the Rule(s) in this block.


RewriteCond

Basic Form:

RewriteCond TestString ConditionString

• Compares the value of TestString to the ConditionString.

• ConditionString can be a regular expression.

• TestString can include variables and file tests.


Variables:

• Some variables are available:

• REMOTE_ADDR

• REMOTE_HOST

• HTTP_HOST

• REQUEST_URI ( /index.html )

• REQUEST_FILENAME ( /home/mike/www/… )

• …

• Vars can be used as %{REMOTE_ADDR}


Flags

• RewriteCond can take 2 flags

– NC – case insensitive

– OR – or the Conds together.

• Normally all Conds have to be true before the Rule is done.

• With OR the Rule is done if ANY Cond is true.


Example 1

If 10.20.0.5 tries to view /electro/index.html, redirect the page reference to /electro/bye.html.

RewriteCond %{REMOTE_ADDR} ^10\.20\.0\.5$

RewriteRule ^/electro/index.html$ /electro/bye.html [L]
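The interaction between RewriteCond and RewriteRule in this example can be modelled as follows. It is a rough sketch with a hypothetical function name: all conditions are ANDed (no [OR] flag), the new URL is a fixed string without back references, and variables are passed in as a plain dictionary.

```python
import re


def rewrite_with_conds(conds, rule, variables, url):
    """conds: list of (variable_name, regex); rule: (url_pattern, new_url)."""
    pattern, new_url = rule
    # Apache tests the rule pattern first and the conditions afterwards.
    if re.search(pattern, url) is None:
        return url
    # Every condition must match before the rule is applied.
    for name, regex in conds:
        if re.search(regex, variables.get(name, "")) is None:
            return url
    return new_url


conds = [("REMOTE_ADDR", r"^10\.20\.0\.5$")]
rule = (r"^/electro/index.html$", "/electro/bye.html")
print(rewrite_with_conds(conds, rule, {"REMOTE_ADDR": "10.20.0.5"}, "/electro/index.html"))
```

The anchors and escaped dots in the condition matter: without them, an address such as 10.20.0.50 would also match.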


Example 2

Rewrite:

• isep.org,

• www.isep.org,

• www.isep.org.pt.

to isep.org.

RewriteEngine on

RewriteCond %{HTTP_HOST} !^isep\.org$

RewriteRule ^(.*)$ http://isep.org$1 [L,R]


Example 3

Rewrite *.isep.org to isep.org,

and *.isep.org.pt to isep.org.pt.

RewriteEngine on

RewriteCond %{HTTP_HOST} ^.+\.isep\.org$

RewriteRule ^(.*)$ http://isep.org$1 [L,R]

RewriteCond %{HTTP_HOST} ^.+\.isep\.org\.pt$

RewriteRule ^(.*)$ http://isep.org.pt$1 [L,R]


Documentation:

• RFC 1945 (HTTP 1.0)

• RFC 2616 (HTTP 1.1)

• Apache HTTP Server Version 2.4 Documentation
