7/28/2019 MIW Chapter 2
Modeling the Internet and the Web
School of Information and Computer Science
University of California, Irvine
Basic WWW Technologies
2.1 Web Documents.
2.2 Resource Identifiers: URI, URL, and URN.
2.3 Protocols.
2.4 Log Files.
2.5 Search Engines.
-
What Is the World Wide Web?
The World Wide Web (web) is a network of information resources. The web relies on three mechanisms to make these resources readily available to the widest possible audience:
1. A uniform naming scheme for locating resources on the web (e.g., URIs).
2. Protocols, for access to named resources over the web (e.g., HTTP).
3. Hypertext, for easy navigation among resources (e.g., HTML).
-
Internet vs. Web
Internet:
The Internet is the more general term
Includes the physical aspect of the underlying networks and mechanisms such as email, FTP, and HTTP
Web:
Associated with information stored on the Internet
Can also refer to a broader class of networks, e.g., the web of English literature
Both the Internet and the web are networks
-
Essential Components of WWW
Resources:
Conceptual mappings to concrete or abstract entities, which do not change in the short term
Example: the ICS website (web pages and other kinds of files)
Resource identifiers (hyperlinks):
Strings of characters that represent generalized addresses and may contain instructions for accessing the identified resource
Example: http://www.ics.uci.edu is used to identify the ICS homepage
Transfer protocols:
Conventions that regulate the communication between a browser (web user agent) and a server
-
Standard Generalized Markup Language (SGML)
Based on GML (Generalized Markup Language), developed by IBM in the 1960s
An international standard (ISO 8879:1986) that defines how descriptive markup should be embedded in a document
Gave birth to the Extensible Markup Language (XML), a W3C Recommendation in 1998
-
SGML Components
SGML documents have three parts:
Declaration: specifies which characters and delimiters may appear in the application
DTD / style sheet: defines the syntax of markup constructs
Document instance: the actual text (with the tags) of the document
More information can be found at: http://www.w3.org/markup/SGML
-
DTD Example One
<!ELEMENT UL - - (LI)+>
ELEMENT is a keyword that introduces a new element type, here the unordered list (UL)
The two hyphens indicate that both the start tag and the end tag for this element type are required
The content model (LI)+ says that the content between the two tags consists of one or more list items (LI)
-
DTD Example Two
<!ELEMENT IMG - O EMPTY>
The element type being declared is IMG
The hyphen and the following "O" indicate that the end tag can be omitted
Together with the content model "EMPTY", this is strengthened to the rule that the end tag must be omitted (no closing tag)
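The two DTD examples can be decoded mechanically. The following is a minimal Python sketch, assuming the declarations have the form used in the HTML 4 DTD, <!ELEMENT UL - - (LI)+> and <!ELEMENT IMG - O EMPTY>. The function name parse_element_decl and the returned field names are invented for illustration, and the pattern covers only this simple shape, not full SGML syntax.

```python
import re

# Matches only the simple shape: <!ELEMENT NAME start-flag end-flag content-model>
_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+([-O])\s+([-O])\s+(.+?)\s*>")


def parse_element_decl(decl):
    """Split a simple SGML element declaration into its parts (illustrative only)."""
    m = _DECL.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"unsupported declaration: {decl!r}")
    name, start_flag, end_flag, content = m.groups()
    return {
        "name": name,
        # "-" means the tag is required, "O" means it may be omitted
        "start_tag_required": start_flag == "-",
        "end_tag_required": end_flag == "-",
        "content_model": content,
        # An EMPTY element has no content, so its end tag must be omitted
        "end_tag_forbidden": content == "EMPTY",
    }


ul = parse_element_decl("<!ELEMENT UL - - (LI)+>")
img = parse_element_decl("<!ELEMENT IMG - O EMPTY>")
print(ul)
print(img)
```

For UL both tags come out as required, while for IMG the end tag is optional by its flag and forbidden by its EMPTY content model.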
-
HTML Background
HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA.
The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML.
HTML standards are organized by the W3C: http://www.w3.org/MarkUp/
-
HTML Functionalities
HTML gives authors the means to:
Publish online documents with headings, text, tables, lists, photos, etc.
Include spreadsheets, video clips, sound clips, and other applications directly in their documents
Link information via hypertext links, at the click of a button
Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
-
HTML Versions
HTML 4.01 is a revision of the HTML 4.0 Recommendation.
HTML 4.01 Specification: http://www.w3.org/TR/1999/REC-html401-19991224/html40.txt
HTML 4.0 was first released as a W3C Recommendation on 18 December 1997.
HTML 3.2 was W3C's first Recommendation for HTML and represented the consensus on HTML features for 1996.
HTML 2.0 (RFC 1866, http://www.rfc-editor.org/rfc/rfc1866.txt) was developed by the IETF's HTML Working Group, which set the standard for core HTML features based upon current practice in 1994.
-
Sample Webpage
-
Sample Webpage HTML Structure
<HTML>
<HEAD>
<TITLE>The title of the webpage</TITLE>
</HEAD>
<BODY>
<P>Body of the webpage
</BODY>
</HTML>
-
HTML Structure
An HTML document is divided into a head section (here, between <HEAD> and </HEAD>) and a body (here, between <BODY> and </BODY>)
The title of the document appears in the head (along with other information about the document)
The content of the document appears in the body. The body in this example contains just one paragraph, marked up with <P>
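The head/body split can be observed programmatically. The following is a rough Python sketch using the standard library's html.parser. The SAMPLE page and the StructureParser class are hypothetical and written for this illustration; the page mirrors the structure described here (a title in the head, one paragraph in the body).

```python
from html.parser import HTMLParser

# Hypothetical page: a head holding the title, a body holding one paragraph.
SAMPLE = """<html>
<head><title>The title of the webpage</title></head>
<body><p>Body of the webpage</p></body>
</html>"""


class StructureParser(HTMLParser):
    """Collects the text found inside <title> and inside <body>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.title = ""
        self.body_text = ""

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to (and including) the matching open tag
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self.open_tags:
            self.title += data
        elif "body" in self.open_tags:
            self.body_text += data


parser = StructureParser()
parser.feed(SAMPLE)
parser.close()
title = parser.title.strip()
body_text = parser.body_text.strip()
print(title)
print(body_text)
```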
-
Resource Identifiers
URI: Uniform Resource Identifiers
URL: Uniform Resource Locators
URN: Uniform Resource Names
-
Introduction to URIs
Every resource available on the Web has an address that may be encoded by a URI
URIs typically consist of three pieces:
The naming scheme of the mechanism used to access the resource (e.g., HTTP, FTP)
The name of the machine hosting the resource
The name of the resource itself, given as a path
-
URI Example
http://www.w3.org/TR
There is a document available via the HTTP protocol
Residing on the machine hosting www.w3.org
Accessible via the path "/TR"
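The same three-piece reading can be reproduced in code. The following is a small Python sketch using urllib.parse.urlsplit from the standard library; the variable names are chosen here for illustration.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.w3.org/TR")

scheme = parts.scheme    # naming scheme of the access mechanism
host = parts.hostname    # machine hosting the resource
path = parts.path        # name of the resource, given as a path

print(scheme, host, path)
```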
-
Protocols
Describe how messages are encoded and
exchanged
Different Layering Architectures
ISO OSI 7-Layer Architecture
TCP/IP 4-Layer Architecture
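As a rough orientation, the two architectures can be laid side by side. The Python sketch below encodes the correspondence commonly given in textbooks. The layer names, especially on the TCP/IP side, vary between sources, so the names and the mapping here are assumptions for illustration rather than part of either standard.

```python
# The seven OSI layers, bottom to top.
OSI_LAYERS = [
    "physical", "data link", "network", "transport",
    "session", "presentation", "application",
]

# Commonly taught correspondence (assumed naming), bottom to top.
TCPIP_TO_OSI = {
    "link": ["physical", "data link"],
    "network": ["network"],
    "transport": ["transport"],
    "application": ["session", "presentation", "application"],
}

# Flatten the mapping to check that every OSI layer is covered exactly once.
covered = [osi for layers in TCPIP_TO_OSI.values() for osi in layers]
print(len(OSI_LAYERS), len(TCPIP_TO_OSI), covered == OSI_LAYERS)
```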
-
ISO OSI Layering Architecture
-
ISO's Design Principles
A layer should be created where a different level of abstraction is needed
Each layer should perform a well-defined function
The layer boundaries should be chosen to minimize information flow across the interfaces
The number of layers should be large enough that distinct functions need not be thrown together in the same layer, and small enough that the architecture does not become unwieldy
-
TCP/IP Layering Architecture
A simplified model that provides an end-to-end reliable connection
The network layer:
Hosts drop packets into this layer, and the layer routes them towards the destination
Only promises "try my best" (best-effort delivery)
The transport layer:
Provides a reliable byte-oriented stream
-
Hypertext Transfer Protocol (HTTP)
An application-layer protocol used to carry WWW traffic between a browser and a server
Carried over TCP, a connection-oriented transport-layer protocol supported by the Internet
HTTP communication is established via a TCP connection to the server, by default on port 80
-
GET Method in HTTP
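To make the GET method concrete, here is a minimal Python sketch of what a browser sends over the TCP connection. The helper name build_get_request is hypothetical, and the host and path are borrowed from the URI example (www.w3.org and /TR). The sketch only builds the request bytes and does not touch the network.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request (illustrative sketch)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource path, protocol version
        f"Host: {host}",         # the Host header is mandatory in HTTP/1.1
        "Connection: close",     # ask the server to close the TCP connection afterwards
        "",                      # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.w3.org", "/TR")
print(request.decode("ascii"))
```

Sending these bytes would mean opening a TCP connection to the server's port 80 (for example with socket.create_connection) and reading back the status line, headers, and body of the response.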
-
Domain Name System
DNS (Domain Name System): mapping from domain names to IP addresses
IPv4:
IPv4 was initially deployed on January 1st, 1983 and is still the most commonly used version
32-bit address, written as a string of 4 decimal numbers separated by dots, ranging from 0.0.0.0 to 255.255.255.255
IPv6:
Revision of IPv4 with 128-bit addresses
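The address sizes can be checked with Python's standard ipaddress module. In this sketch, the helper dotted_to_int is hypothetical and written only to show how four decimal numbers pack into 32 bits; 192.0.2.1 and ::1 are arbitrary example addresses, not taken from the slides.

```python
import ipaddress


def dotted_to_int(dotted):
    """Pack a dotted-quad IPv4 string into one 32-bit integer (illustrative)."""
    value = 0
    for part in dotted.split("."):
        value = (value << 8) | int(part)  # each decimal number fills 8 bits
    return value


lowest = ipaddress.ip_address("0.0.0.0")
highest = ipaddress.ip_address("255.255.255.255")
v6 = ipaddress.ip_address("::1")  # IPv6 loopback, used only as an example

v4_bits = highest.max_prefixlen  # address width in bits for IPv4
v6_bits = v6.max_prefixlen       # address width in bits for IPv6
v4_range = (int(lowest), int(highest))
example = dotted_to_int("192.0.2.1")

print(v4_bits, v6_bits, v4_range, example)
```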
-
Top Level Domains (TLD)
Top-level domain names include .com, .edu, .gov and the ISO 3166 country codes
There are three types of top-level domains:
Generic domains were created for use by the Internet public
Country code domains were created to be used by individual countries
The .arpa (Address and Routing Parameter Area) domain is designated to be used exclusively for Internet-infrastructure purposes
-
Registrars
Domain names ending with .aero, .biz, .com, .coop, .info, .museum, .name, .net, .org, or .pro can be registered through many different companies (known as "registrars") that compete with one another
InterNIC: http://internic.net
Registrars Directory: http://www.internic.net/regist.html
-
Server Log Files
Server Transfer Log: transactions between a browser and a server are logged
IP address and the time of the request
Method of the request (GET, HEAD, POST)
Status code, a response from the server
Size in bytes of the transaction
Referrer Log: where the request originated
Agent Log: the browser software (or spider) making the request
Error Log: requests that resulted in errors (e.g., 404)
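The transfer-log fields listed above can be pulled out of a log line with a regular expression. The Python sketch below assumes the widely used Common Log Format; the slides do not name a specific format, so the pattern, the sample line, and the function name parse_log_line are all illustrative assumptions.

```python
import re

# Assumed layout (Common Log Format):
# ip ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the fields of one transfer-log line as a dict, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" size means no bytes were returned
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Made-up sample line for demonstration only.
SAMPLE_LINE = '192.0.2.1 - - [28/Jul/2019:10:15:32 -0700] "GET /index.html HTTP/1.1" 200 5120'
entry = parse_log_line(SAMPLE_LINE)
print(entry)
```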
-
Server Log Analysis
Most and least visited web pages
Entry and exit pages
Referrals from other sites or search engines
Which keywords were searched
How many clicks/page views a page received
Error reports, such as broken links
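A few of these statistics can be computed with simple counting. The following Python sketch works on a handful of made-up, already-parsed log entries; the tuple layout, paths, and referrer values are hypothetical and serve only to show the idea.

```python
from collections import Counter

# Hypothetical parsed log entries: (requested path, status code, referrer)
ENTRIES = [
    ("/index.html", 200, "-"),
    ("/research.html", 200, "/index.html"),
    ("/index.html", 200, "http://search.example/?q=ics"),
    ("/old-page.html", 404, "/index.html"),
    ("/index.html", 200, "-"),
]

# Page views: successful requests per page
page_views = Counter(path for path, status, _ in ENTRIES if status == 200)
# Error report: requested pages that were not found (broken links)
broken_links = Counter(path for path, status, _ in ENTRIES if status == 404)
# Referrals from other sites or search engines
external_referrals = Counter(ref for _, _, ref in ENTRIES if ref.startswith("http"))

most_visited = page_views.most_common(1)[0]
print(most_visited, dict(broken_links), dict(external_referrals))
```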
-
Search Engines
According to a Pew Internet Project report (2002), search engines are the most popular way to locate information online
About 33 million U.S. Internet users query search engines on a typical day
More than 80% of Internet users have used search engines
Search engines are measured by coverage and recency
http://www.pewinternet.org/reports/pdfs/PIP_Search_Engine_Data.pdf