1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

Post on 19-Dec-2015


Page 1: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

1

Dr Alexiei Dingli

Introduction to Web Science

Web 1.0

Page 2: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

2

• Packet switching network

• IP Addressing

• Internet Applications

• The WWW and markup

• Searching the WWW

• Intelligent Agents

• Internet Governance

Introducing Web 1.0

Page 3: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

3

• Local area network (LAN)

– Network of computers located close together

• Wide area networks (WANs)

– Networks of computers connected over greater distances

• Circuit

– Combination of telephone lines and closed switches that connect them to each other

Packet-Switched Networks (1)

Page 4: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

4

• Circuit switching is used in telephone communication

• The Internet uses packet switching

• Packet switching needs computers called ‘routers’ and programs called ‘routing algorithms’

Packet-Switched Networks (2)

Page 5: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

5

Packet-Switched Networks (3)

• Information is divided into packets

• It is passed from node to node

• It is recomposed as one chunk on the destination server

Page 6: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

6

Page 7: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

7

• Routing computers

– Computers that decide how best to forward packets

• Routing algorithms

– Rules contained in programs on router computers that determine the best path on which to send packets

– Programs apply their routing algorithms to information they have stored in routing tables

Routing Packets
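
The routing-table idea can be sketched in a few lines of Python. This is only an illustration: the slide does not say which algorithm routers use, so the sketch assumes a longest-prefix-match lookup, and the table entries and router names (`router-a`, `router-b`, `gateway-default`) are hypothetical.

```python
import ipaddress

# Hypothetical routing table: network prefix -> next hop (names are made up).
ROUTING_TABLE = {
    "0.0.0.0/0": "gateway-default",
    "10.0.0.0/8": "router-a",
    "10.1.0.0/16": "router-b",
}


def next_hop(destination: str) -> str:
    """Pick the next hop whose prefix matches the destination most specifically."""
    addr = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in ROUTING_TABLE
        if addr in ipaddress.ip_network(prefix)
    ]
    # Longest prefix wins; the default route (/0) always matches as a fallback.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTING_TABLE[str(best)]
```

For example, `next_hop("10.1.2.3")` matches all three entries, and the most specific one (`10.1.0.0/16`) decides where the packet goes.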

Page 8: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

8

• Communications protocol suite

– Packet switched protocol

  • No end-to-end connection is required

  • Each message broken down into small pieces called packets

  • Packets possibly routed to destination over different paths

– Transmission Control Protocol (TCP)

  • Breaks messages into packets

  • Numbers packets in order

  • Reorders packets at the destination

– Internet Protocol (IP)

  • Routes packets to the proper destination

TCP/IP
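
A toy sketch of the "break, number, reorder" behaviour described for TCP. It is not real TCP (no acknowledgements, retransmission or checksums); the function names and the 8-character packet size are assumptions chosen for illustration.

```python
import random


def to_packets(message: str, size: int = 8) -> list[tuple[int, str]]:
    """Split a message into (sequence_number, payload) packets."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[tuple[int, str]]) -> str:
    """Reorder packets by sequence number and join their payloads."""
    return "".join(payload for _, payload in sorted(packets))


message = "Packets may take different paths to the destination."
packets = to_packets(message)
random.shuffle(packets)  # simulate packets arriving out of order
restored = reassemble(packets)
```

Because every packet carries its sequence number, the order of arrival does not matter: `restored` equals `message` however the list was shuffled.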

Page 9: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

9

Open Systems Interconnections Model

OSI Model layers, shown with example protocols from the TCP/IP suite (from the highest to the lowest):

7 Application     HTTP, SMTP, FTP, Telnet, SSH, Whois, etc. (these span layers 5 to 7)

6 Presentation

5 Session

4 Transport     TCP, UDP

3 Network     IP

2 Data Link     Ethernet

1 Physical     Wire, Radio, Fibre Optic

Page 10: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

10

• Internet addresses are based on a 32-bit number called an IP address

• IP addresses are written as four numbers, each between 0 and 255, separated by periods

• An address such as 126.204.89.56 uniquely identifies a computer connected to the Internet

• IP Subnetting conceptually divides a large network into smaller sub-networks

IP Address
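
As a small illustration of "a 32-bit number written as four parts", the sketch below converts the slide's example address to an integer, both with Python's standard `ipaddress` module and by hand. The variable names are illustrative only.

```python
import ipaddress

addr = ipaddress.IPv4Address("126.204.89.56")

# The library view: the address is just an integer below 2**32.
as_int = int(addr)

# The manual view: each of the four numbers fills 8 of the 32 bits.
octets = [int(part) for part in str(addr).split(".")]
manual = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
```

Both routes give the same integer, and `ipaddress.IPv4Address(as_int)` turns it back into the dotted form.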

Page 11: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

11

IP Classes (1)

Page 12: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

12

IP Classes (2)

Class       Leading Bits     Number of Networks     Addresses Per Network

Class A     0                126                    16,777,214

Class B     10               16,384                 65,534

Class C     110              2,097,152              254
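
The figures in the table follow from the bit layout of each class. The sketch below is a hedged illustration: the function name `ip_class` and the two dictionaries are hypothetical, and the "minus 2" terms reflect the usual reserved values (network and broadcast addresses per network, and the reserved Class A networks 0 and 127).

```python
def ip_class(address: str) -> str:
    """Classify an IPv4 address by the leading bits of its first number."""
    first = int(address.split(".")[0])
    if first >> 7 == 0b0:
        return "A"
    if first >> 6 == 0b10:
        return "B"
    if first >> 5 == 0b110:
        return "C"
    return "other (D/E)"


# Class A: 7 network bits, 24 host bits; B: 14 and 16; C: 21 and 8.
networks = {"A": 2 ** 7 - 2, "B": 2 ** 14, "C": 2 ** 21}
hosts_per_network = {"A": 2 ** 24 - 2, "B": 2 ** 16 - 2, "C": 2 ** 8 - 2}
```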

Page 13: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

13

Subnetting

Page 14: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

14

• Explosion in size of IP routing tables.

• Every time more address space was needed, the administrator would have to apply for a new block of addresses.

• Any changes to the internal structure of a company's network would potentially affect devices and sites outside the organization.

• Keeping track of all those different Class C networks would be a bit of a headache in its own right.

Without subnetting …

Page 15: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

15

• Better Match to Physical Network Structure

• Flexibility

• Invisibility To Public Internet

• No Need To Request New IP Addresses

• No Routing Table Entry Proliferation

Benefits of Subnetting

Alexiei Dingli
• Better Match to Physical Network Structure: Hosts can be grouped into subnets that reflect the way they are actually structured in the organization's physical network.

• Flexibility: The number of subnets and number of hosts per subnet can be customized for each organization. Each can decide on its own subnet structure and change it as required.

• Invisibility To Public Internet: Subnetting was implemented so that the internal division of a network into subnets is visible only within the organization; to the rest of the Internet the organization is still just one big, flat, “network”. This also means that any changes made to the internal structure are not visible outside the organization.

• No Need To Request New IP Addresses: Organizations don't have to constantly requisition more IP addresses, as they would in the workaround of using multiple small Class C blocks.

• No Routing Table Entry Proliferation: Since the subnet structure exists only within the organization, routers outside that organization know nothing about it. The organization still maintains a single (or perhaps a few) routing table entries for all of its devices. Only routers inside the organization need to worry about routing between subnets.
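
A minimal sketch of subnetting with Python's standard `ipaddress` module. The network `172.16.0.0/16` and the choice of /24 subnets are assumptions made for the example; an organization could pick any split that matches its physical layout.

```python
import ipaddress

# One block as seen from the public Internet (example private range).
org = ipaddress.ip_network("172.16.0.0/16")

# Internally, the organization divides it into smaller subnets.
subnets = list(org.subnets(new_prefix=24))

first = subnets[0]
usable_hosts = first.num_addresses - 2  # minus network and broadcast addresses
```

Outside routers still need only the single `172.16.0.0/16` entry; the 256 internal /24 subnets, each with 254 usable host addresses, are visible only inside the organization.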
Page 16: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

16

• Network Layer

• Developed in 1994

• Will replace the IPv4 standard

– Limits on network addresses will eventually lead to exhaustion of available addresses (by 2023)

– IPv4 supports only 4,294,967,296 addresses (32 bits)

• Improvements include

– Providing future cell phones and mobile devices their own unique & permanent addresses

– Supports about 3.4 × 10^38 addresses (128 bits)

IPv6 (or IP Next Generation)
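
The two address-space sizes quoted on this slide come straight from the address widths. A short check, with `2001:db8::1` used only as a documentation-style example address:

```python
import ipaddress

ipv4_total = 2 ** 32    # 32-bit addresses
ipv6_total = 2 ** 128   # 128-bit addresses

# How many IPv4-sized address spaces fit into the IPv6 space.
ratio = ipv6_total // ipv4_total  # equals 2 ** 96

example_v6 = ipaddress.ip_address("2001:db8::1")
```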

Page 17: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

17

• A Uniform Resource Locator (URL) consists of names and abbreviations that are much easier to remember than IP addresses

• The HTTP protocol defines how an Internet resource is accessed

• An address such as www.microsoft.com is called a domain name

• Domain Name System (DNS)

– A database of Internet names

– DNS Servers convert Internet names to IP addresses

– Top level domains

Domain Names
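
To make the name-to-address step concrete, here is a toy resolver. The `TOY_DNS` table, its single entry and both function names are hypothetical stand-ins; a real lookup would ask a DNS server instead (in Python, for instance, through `socket.gethostbyname`).

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the DNS database: name -> IP address.
TOY_DNS = {"www.example.com": "192.0.2.10"}


def resolve(url: str) -> str:
    """Extract the domain name from a URL and look up its IP address."""
    host = urlsplit(url).hostname
    return TOY_DNS[host]


def top_level_domain(url: str) -> str:
    """Return the last label of the domain name, e.g. 'com'."""
    return urlsplit(url).hostname.rsplit(".", 1)[-1]
```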

Page 18: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

18

Top-Level Domain Names

• Internet Corporation for Assigned Names and Numbers (ICANN)

– Responsible for managing domain names and coordinating them with IP address registrars

Page 19: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

19

• The web was not an ‘open’ place

• Only one company, Network Solutions, sold .com, .net and .org domains

• Price of 100 dollars and a two year minimum

• Back then, there was a big chance you would be able to buy a dictionary word as .com

• In 2000, it lost its monopoly position and domain prices dropped over 95%

• Since then innovation halted and Network Solutions became one of thousands of anonymous domain registrars

Domain Name case study

Page 20: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

20

• E-Mail

• File transfers

• Instant messaging (IM)

• Newsgroups

• Streaming audio and video

• Internet telephony

• World Wide Web (WWW)

Internet Applications

Page 21: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

21

• Most popular and widely used Internet application

• 30 billion e-mails sent every day

– Spam – junk e-mail messages

– Spam costs corporate America $9 billion per year

• Every e-mail message contains a header that describes the source and destination of the message

• E-mail messages are text, but may have attachments of many types of digital data

– Viruses often transmitted via e-mail

E-Mail
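
A small sketch of the header-plus-text-plus-attachment structure, using Python's standard `email` package. The addresses, subject and attachment bytes are made up for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()

# Header fields describe the source and destination of the message.
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.org"
msg["Subject"] = "Lecture notes"

# The body is text.
msg.set_content("See the attached slides.")

# Attachments can carry other kinds of digital data (placeholder bytes here).
msg.add_attachment(
    b"%PDF-1.4 placeholder",
    maintype="application",
    subtype="pdf",
    filename="slides.pdf",
)

attachment_names = [part.get_filename() for part in msg.iter_attachments()]
```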

Page 22: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

22

• E-mail sent across the Internet is managed and stored by mail servers

• Simple Mail Transfer Protocol (SMTP) is the standard for sending mail to the server

• Post Office Protocol (POP) is the standard for retrieving mail from the server

• The Internet Message Access Protocol (IMAP), formerly expanded as Interactive Mail Access Protocol, is a newer e-mail protocol

SMTP, POP, and IMAP (1)

Page 23: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

23

SMTP, POP, and IMAP (2)

Page 24: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

24

• Use complex email addresses rather than name and surname combination

– Why? Bots? Name Directories?

• Control exposure of email address

– How? JavaScript? JPEG?

• Use multiple email addresses for different purposes

– In what occasions?

• Use content-filtering software

– Black list spam filter

– White list spam filter

– Challenge response using graphical challenges?

Controlling Spam
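
The three filtering ideas in the last bullet can be combined in a toy classifier. Everything here is an assumption for illustration: the list contents, the function name, and the policy of challenging unknown senders.

```python
# Hypothetical lists; a real filter would maintain these from user feedback.
BLACKLIST_DOMAINS = {"spam.example"}
WHITELIST_SENDERS = {"friend@example.org"}


def classify(sender: str) -> str:
    """Return 'accept', 'reject' or 'challenge' for a sender address."""
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]
    if sender in WHITELIST_SENDERS:
        return "accept"      # white list: known good senders pass
    if domain in BLACKLIST_DOMAINS:
        return "reject"      # black list: known spam sources are dropped
    return "challenge"       # unknown senders must answer a challenge first
```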

Page 25: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

25

• Hotmail (1995)

• First place to get a free email address, disconnected from an ISP

• 4 years later, 30 million people worldwide were exchanging @hotmail email addresses

• Bought by Microsoft in 1998 for just 400 million dollars

• 2007 the end of Hotmail

– Transformation to “Live” mail to become an integrated part of Microsoft’s “Live” family

E-Mail Case Study

Page 26: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

26

• File transfer protocol (FTP)

– Protocol providing for transmission of a file between an Internet server and a user’s computer

• Peer-to-peer (P2P) file sharing

– Share data from one computer to another

– Every user can be a server

– Examples: Napster, Kazaa, Gnutella, BitTorrent

– With P2P, every user on the network can make data available to every other user on the network

File Transfers

Page 27: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

27

• Allows user to create a private chat session with another user

• IM started with AOL

• IM sneaking into corporate networks

• Many Web-based companies use IM technology for customer service

– eBay

Instant Messaging

Page 28: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

28

Page 29: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

29

• ICQ is a play on the phrase “I seek you”

• In 1996 it was the first easy-to-use instant messenger program where you could add friends to your list and see if they were online

• Back then it was revolutionary for the masses and it became the ‘application’ everybody had installed

• Acquired by AOL in June 1998 for a whopping $287 million

• Eventually the program gained so many additional features that it became heavy and disorganized

• Competition from AOL IM, Yahoo IM, and MSN Messenger increased, and friends on your ICQ list left the application, eventually resulting in mass abandonment of the network

ICQ case study

Page 30: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

30

• Online, bulletin board discussion forums

• Users post and read messages

• More than 100,000 newsgroups

• Millions of newsgroup readers

• Important information resource, especially for technical issues and products

• Newsgroup messages distributed using open standard

– Many are uncensored

Usenet Newsgroups

Page 31: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

31

• Creating and sending audio and video files

– Sports

  • Basketball at sports.yahoo.com

  • Major league baseball

– News

  • Fox News

  • CNN radio

– Business

  • ZDNet

– Education

  • Warriors of the Net

Streaming Audio and Video

Page 32: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

32

• Voice-over Internet Protocol (VoIP)

• Use your computer like a telephone

• Software connects computers via the Internet and transmits voice data

• Savings come from eliminating toll charges between locations

Internet Telephony

Page 33: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

33

Internet TV

Page 34: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

34

• Collection of hyperlinked computer files on the Internet

• Client-server application

– Web servers

– Web browsers as clients

• WWW standards

– Hypertext markup language (HTML)

  • Current standard for writing Web pages

  • Tags in HTML instruct the client browser how to format and display the Web page content

– Hypertext transfer protocol (HTTP)

  • Establishes a connection between Web server and client

– Extensible markup language (XML)

  • A meta-markup language

  • Gives meaning to the data enclosed within XML tags

The World Wide Web

Page 35: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

35

• Create your own free homepage on the web

• In 1997 it was the fifth most popular website, with over 500,000 homepages created

• Yahoo bought Geocities two years later for $3.57 billion and started to actively commercialize the homepages with various types of advertising, which proved to be their death sentence

• As ‘real’ web hosting became affordable for anybody, the need for free homepages in this form vanished

Website case study

Page 36: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

36

• SGML is a rich meta language that is useful for defining markup languages

• HTML is particularly useful for displaying Web pages

• XML defines data structures for electronic commerce (and much more …)

Overview of Markup Languages

Page 37: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

37

Development of Markup Languages

http://www.w3.org/

Page 38: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

38

• The ISO adopted SGML standard in 1986

• SGML is nonproprietary and platform-independent

• SGML supports user-defined tags and architecture to complement the required richness of documents

Standard Generalized Markup Language

Page 39: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

39

• XML is a descendant of SGML

• XML allows designers to easily describe and deliver structured data from any application in a standard, consistent way

• XML can be embedded within an HTML document

• XML allows you to create your own customized markup language.

Extensible Markup Language

Page 40: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

40

• Tag – a piece of markup

– An opening tag <name>

– A closing tag </name>

• Element – well formed usage of tags

– <name>Alexiei</name>

• Attribute – properties

– <name length="7">Alexiei</name>

• Rules to keep XML well formed

1. Can be nested but not overlapping

2. Case sensitivity

3. Quoted attributes

4. Required end tag

• Short hand

– <abc></abc> is equivalent to <abc/>

Learn XML in a slide

Page 41: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

41

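<book>E-Commerce</booK>   (not well formed: closing tag differs in case)

<book pages=100>E-Commerce</book>   (not well formed: attribute value is not quoted)

<book pages="100"><title>E-Commerce</book></title>   (not well formed: tags overlap)

<book pages="100"><title>E-Commerce</title></book>   (well formed)

<book pages="100"><title>E-Commerce</title><author><name>Gary</name><surname>Schneider</surname></author></book>   (well formed)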

Some XML examples
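
The well-formedness rules can be checked mechanically. The sketch below feeds the five examples (typed with straight quotes) to Python's standard XML parser; the helper name `is_well_formed` is hypothetical. The first three break the case, quoting and nesting rules respectively, so only the last two should parse.

```python
import xml.etree.ElementTree as ET

EXAMPLES = [
    '<book>E-Commerce</booK>',
    '<book pages=100>E-Commerce</book>',
    '<book pages="100"><title>E-Commerce</book></title>',
    '<book pages="100"><title>E-Commerce</title></book>',
    '<book pages="100"><title>E-Commerce</title><author>'
    '<name>Gary</name><surname>Schneider</surname></author></book>',
]


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = [is_well_formed(example) for example in EXAMPLES]
```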

Page 43: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

43

Processing a Request for an XML Page

• Why go through all this hassle?

• How would you go about displaying HTML on a

– PC

– Handheld

– Mobile

Page 44: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

44

• Tim Berners-Lee invented HTML

• HTML is a document production language that includes a set of tags that define the format and style of a document

• HTML is based on SGML

• HTML is an application of SGML: its tags are defined by an SGML Document Type Definition (DTD)

Hypertext Markup Language

Page 45: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

45

• An HTML document contains both document content and tags

• The tags are the HTML codes inserted in a document to specify the format on screen

• Each tag is enclosed in brackets (< >)

• Most tags are two-sided – opening and closing tags

• Well formed tags, bots, meta tags?? Why are they important?

HTML Tags

Page 46: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

46

• Hyperlinks are bits of text that connect the current document to:

– Another location in the same document

– Another document on the same host machine

– Another document on the Internet

– Can they link to a toaster at home?

• Hyperlinks are created using the HTML anchor tag

• Two popular link structures:

– Linear hyperlink structure

– Hierarchical hyperlink structure

HTML Links
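
To show what the anchor tag looks like in practice, the sketch below pulls the link targets out of a small page with Python's standard `html.parser`. The sample page and the class name `LinkCollector` are made up; the three links correspond to the three kinds of target listed above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor (<a>) tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


page = (
    '<p><a href="#top">Same document</a> '
    '<a href="notes.html">Same host</a> '
    '<a href="http://www.w3.org/">Elsewhere on the Internet</a></p>'
)

collector = LinkCollector()
collector.feed(page)
```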

Page 47: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

47

• HTML version 1.0 was introduced in 1991

• HTML 2.0 was released in Sept. 1995

• HTML 3.2 was introduced in 1997

• HTML 4.0 was released by W3C in Dec 1997

• HTML 4.01 was released in Dec 1999

• XHTML 1.0 became a W3C recommendation in Jan 2000

HTML Version History

Page 48: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

48

• Low-end editors display HTML code on the screen and allow you to insert HTML tag pairs by clicking selected buttons

• High-end editors are Web site builder programs; they provide a rich environment that displays the Web page, not the HTML code

• Microsoft FrontPage and Macromedia Dreamweaver are examples of Web site builders

HTML Editors (1)

Page 49: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

49

HTML Editors (2)

Page 50: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

50

• HTML and XML only display and exchange data

• No interactivity; no processing of data

• Scripting languages

– Provide basic interactivity

  • Rollovers

  • Crawling text

– JavaScript

– VBScript

• Full-featured Web programming

– Java

– Client side scripting or browser side scripting

– Applets

– J2EE

• Common Gateway Interface (CGI)

– Allows passing of data between a static HTML page and a computer program

Static versus Dynamic Pages

Page 51: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

51

• Most data on the Internet is part of the WWW

• Search engines – large databases that index WWW content

• Building the search engine database

– Submit a site to the search engine administrator for listing

– Spiders

  • Metatags

– Google

– Yahoo

Searching the WWW

Page 52: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

52

• A search engine is a special kind of Web page software that finds other Web pages that match a word or phrase you entered

• A Web directory is a listing of hyperlinks to Web pages that is organized into hierarchical categories, e.g. http://directory.google.com/

• Search engines contain three major parts: spider, index, and utility

Search Engines
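
A toy version of the index and the search utility, assuming the spider has already fetched some pages. The page names and texts in `PAGES` are invented, and the inverted index shown here is one common way to build such an index, not necessarily what any particular search engine does.

```python
from collections import defaultdict

# Pretend output of a spider: page address -> page text (made-up data).
PAGES = {
    "a.html": "packet switching on the internet",
    "b.html": "markup languages on the web",
    "c.html": "searching the web with spiders",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Index step: map every word to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Utility step: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
```

With this data, `search(index, "the web")` returns the two pages that contain both words.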

Page 53: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

53

Popular Search Engines

Page 54: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

54

Spiders and Crawlers

Page 55: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

55

Indexing

Page 56: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

56

• Search engine AltaVista was the Google of the last millennium

• First real effort to index the World Wide Web

• One of the few search engines that actually came up with good search results

• Had a hard time fighting spam listings in their results

• While spam grew rapidly in AltaVista's results, some company named Google found a way to prioritize web pages more intelligently, and thus keep spam out better

Search Engine case study

Page 57: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

57

• PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value

• Google interprets a link from page A to page B as a vote, by page A, for page B

• But Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote

• Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

Case Study: Google’s PageRank
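
The "votes weighted by the voter's importance" idea can be sketched as a simple iteration. This is a simplified textbook version, not Google's production algorithm; the damping factor 0.85, the iteration count and the four-page link graph are all assumptions made for the example.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Iteratively share each page's rank among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A link is a vote; the vote is worth more if the voter ranks high.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without links spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
```

Page C receives votes from three pages and ends up ranked highest, while page D, which nobody links to, ends up lowest.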

Page 58: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

58

• An intelligent agent is a program that runs in the background on behalf of a person or entity and performs functions such as

– information gathering

– information filtering

– mediation

• What agents can you think of?

Intelligent Agents

Page 59: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

59

• Search Agents

– Improve your information retrieval on the Internet

– Used to find pages on the Web easily and quickly

  • Meta Agents, Specialised (MP3), etc

• Web Agents

– Improve browsing experience

  • Automate form filling, off-line browsing, etc

• Monitoring Agents

– Monitor web sites or specific themes

– Used to get automatic alerts about the latest news

Intelligent Agents (2)

Page 60: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

60

• Virtual Assistants

– Artificial life

– Characters, plants, animals or people living on your desktop

• Shop Bots

– Allow users to compare prices on the Internet

– Find the best price for books, CDs, movies, etc.

• Webmastering Agents

– Make it easy to manage a Web site and make it more effective

– Monitor broken links, content gathering etc.

Intelligent Agents (3)

Page 61: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

61

• Other agents …

– Development agents

  • Used to develop other agents

– Games agents

  • Used in games

Intelligent Agents (4)

Page 62: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

62

Ms Dewey not your ordinary search agent!

Page 63: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

63

• Internet Engineering Task Force (IETF)

– Works in groups to develop standards

• Internet Engineering Steering Group (IESG)

– Approves or disapproves standards developed by the IETF

• Internet Architecture Board (IAB)

– The oversight authority for the standards development process

• World Wide Web Consortium (W3C)

– Promotes the WWW and develops new web technologies and standards

Internet Governance

Page 64: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

64

• We’re all very familiar with Web 1.0

• But what makes Web 2.0?

• Next lecture …

Conclusion

Page 65: 1 Dr Alexiei Dingli Introduction to Web Science Web 1.0

65

Questions?