web publishing architecture

Web Publishing Architecture Look at the various components of Web publishing, many of which are common to most Web applications. HTML Document Publishing CGI Scripting Applications Content Management Systems

Upload: janae

Post on 06-Jan-2016




TRANSCRIPT

Page 1: Web Publishing Architecture

Web Publishing Architecture

Look at the various components of Web publishing, many of which are common to most Web applications.

HTML Document Publishing CGI Scripting Applications Content Management Systems

Page 2: Web Publishing Architecture

The Web Browser is…

• A program available everywhere.
• A generalized information interface.
• A client that connects to distributed servers.
• A single point of control over the Web, fought over by Microsoft and Netscape.

Page 3: Web Publishing Architecture

The Web, Circa 1993

Page 4: Web Publishing Architecture

Key Challenges Were on the Client

• How to present information in a Web browser.

Page 5: Web Publishing Architecture

Developed by Pei Wei in 1992, Viola was an application toolkit, built on top of the X Window System. Its www browser was a sample application, integrating styled text and graphics.

Page 6: Web Publishing Architecture

In this example, the Viola browser embedded another application and its controls.

Page 7: Web Publishing Architecture

World Wide Web Wizards Workshop (July 1993)

• Early attempt to forge a common development agenda.
• Tension between slow-moving standards development and seat-of-the-pants innovation.

Page 8: Web Publishing Architecture

HTML: Hypertext Markup Language

• A simple SGML vocabulary or tagset.
• Controls content and layout of presentation.
• Human-readable data format.

Page 9: Web Publishing Architecture

The Web, Circa 1995 Publication Models

Page 10: Web Publishing Architecture

Key Challenges Were on the Server

• Publishing Becomes a Server-side Application
• Apache, mod_perl and Perl.
• Didn’t Much Depend On Client-Side Capabilities
• Development of Custom Content Management Systems
• Manage the publishing process

Page 11: Web Publishing Architecture

The Web Server…

Page 12: Web Publishing Architecture

HyperText Transfer Protocol (HTTP)

• HTTP is a Request/Response Protocol.
• "HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system." Tim Berners-Lee, 1992, Basic HTTP
• Achieves a loose coupling of client and servers.
• References: HTTP 1.1 Spec

Page 13: Web Publishing Architecture

Anatomy of a Request

Browser locates server (oreilly.com) and makes a connection to port number 80 (in a typical configuration) on that machine.

Page 14: Web Publishing Architecture

Full Request

GET /index.html HTTP/1.1
Host: localhost
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/xbm, */*
Accept-Language: en
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)

Page 15: Web Publishing Architecture

Server

• Returns status of request:
  HTTP/1.1 200 OK
• Sends header info followed by a blank line:
  Content-type: text/html
  Content-length: 3896
• Sends document or data from a CGI program.
• Objects embedded in document such as images generate new requests to the server.
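The request and response formats shown on these slides can be sketched in a few lines of Python. This is a minimal illustration, not a real HTTP client: the function names `build_request` and `parse_response_head` are hypothetical, the sample response body is made up, and details such as chunked encoding or repeated headers are ignored.

```python
def build_request(path, host, headers=None):
    """Serialize a GET request: request line, headers, then a blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response_head(raw):
    """Split a response into (status code, reason, headers dict, body)."""
    # The blank line separates the header block from the document itself.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_request("/index.html", "localhost", {"Accept-Language": "en"})

# A made-up response in the shape the slide describes.
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 12\r\n"
    "\r\n"
    "<p>Hello</p>"
)
code, reason, headers, body = parse_response_head(response)
```

Because every message carries its own status line and headers, the client and server need to agree on little beyond this text format, which is one way to read the "loose coupling" point made earlier.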

Page 16: Web Publishing Architecture

The Apache Web Server

• The Apache Group, an Open Source software project, has developed the leading Web server, with over 50% of all servers.
• Web servers are fairly stable technology.
• Reference: Apache.org, Netcraft survey, Apache: The Definitive Guide

Page 17: Web Publishing Architecture

Apache Directories

Have you set up a Web server?

• /usr/local/apache is the Unix/Linux installation directory.
• /htdocs is the directory for HTML files.
• /cgi-bin is for scripts.
• /conf is the configuration directory, where the file httpd.conf lives.

Page 18: Web Publishing Architecture

Configuring a Web Server

The site administrator usually takes care of the following server configuration issues by editing httpd.conf:

• Document and content type mapping
• Authentication and Access Control
• Logging
• Virtual Servers

Page 19: Web Publishing Architecture

URL Management

Decisions about URLs:

• Relative vs. absolute links on the site.
• Permanent addressing vs. current addressing: /98/09/21/document.html vs. today.html
• What are you going to do when things change?
• URLs can be brittle.

Page 20: Web Publishing Architecture

Authentication

• Authentication is asking a user to provide identification, usually a user name and password.
• Basic Authentication uses the .htaccess file.
• More sophisticated applications will manage this information in a user database.

Apache section
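As a rough illustration of what travels over the wire with Basic Authentication, the sketch below builds and decodes the `Authorization` header value. The helper names are hypothetical. Note that Base64 is an encoding, not encryption, so the credentials are only as private as the connection carrying them.

```python
import base64


def basic_auth_header(user, password):
    """Return the Authorization header value a browser would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header_value):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the user name from the password.
    user, _, password = decoded.partition(":")
    return user, password
```

A server would compare the decoded pair against its password file or user database before serving the protected document.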

Page 21: Web Publishing Architecture

Logs

Found in logs directory: access.log

Log entry tells you:

IP number – Date/Time – Request – Status code – Bytes sent

152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087
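The log entry above can be pulled apart with a regular expression. The following is a minimal sketch that assumes the Common Log Format layout of the sample line; the field names in the pattern and the function name `parse_log_line` are illustrative choices, not part of any log standard.

```python
import re
from datetime import datetime

# host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


entry = parse_log_line(
    '152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087'
)
```

Reporting tools build on this kind of parsing: once each line is a record, counting hits per page or per status code is a matter of grouping.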

Page 22: Web Publishing Architecture

Logs Processing

Some of the tasks surrounding logs:

• Log rotation (day, week, month)
• Log compression (files grow large)
• Log file parsing and reporting
• Reverse DNS lookup

References: Lincoln Stein, Yahoo's list of tools, Marketwave's Hitlist Examples

Page 23: Web Publishing Architecture

Server Hardware and OS

• Server farms or hosting services are set up to manage the hardware, the OS and the network for 24/7 operation.
• Properly configured PCs can be powerful enough to handle sizable load, obviating the need for more expensive servers from Sun.
• Small dedicated Web server devices, such as the Cobalt server, come with embedded Linux and Web administration.

Page 24: Web Publishing Architecture

Web Publishing HTML Authoring Systems Server Side Includes CGI Applications Templates

Page 25: Web Publishing Architecture

Authoring Systems

• Debate over whether to show or hide HTML from authors.
• Page Creation Tools
• HTML Editors
  • Homesite; BBEdit.
• Web Site Authoring Systems
  • FrontPage; GoLive; NetObjects; Dreamweaver
• Market share estimate of authoring tools. (Security Space)

Page 26: Web Publishing Architecture

Server Side Includes

• Insert dynamic information such as date or time.
• Include a file shared by a set of documents.
• One way to create a consistent page layout across the site.
• Example: Use a server-side include to put common information for a page header or footer in a separate file and source it from all documents.
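The header and footer example can be mimicked in Python to show what the server does when it meets an include directive. This sketch assumes the Apache-style `<!--#include file="..." -->` syntax and, for simplicity, looks fragments up in a dictionary rather than reading files from disk; the function name and sample fragments are hypothetical.

```python
import re

# Matches directives such as <!--#include file="header.html" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(html, fragments):
    """Replace each include directive with the text of the named fragment."""
    def substitute(match):
        name = match.group(1)
        return fragments.get(name, f"[missing include: {name}]")
    return INCLUDE.sub(substitute, html)


# Shared fragments that every page on the site would pull in.
fragments = {
    "header.html": "<h1>Site Name</h1>",
    "footer.html": "<p>Contact us</p>",
}
page = (
    '<!--#include file="header.html" -->\n'
    "<p>Article text.</p>\n"
    '<!--#include file="footer.html" -->'
)
expanded = expand_includes(page, fragments)
```

Changing `header.html` once then changes every page that includes it, which is the consistency benefit the slide describes.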

Page 27: Web Publishing Architecture

CGI Applications

Common Gateway Interface

• A web server passes control to an application, which generates a dynamic HTML document and returns it to the server.
• Forms-based Input and Interaction
• Session management
• Transactions
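A CGI program receives the request through environment variables and writes a header block, a blank line, and the document to standard output. The sketch below assumes a GET request whose form data arrives in `QUERY_STRING`; the `name` form field and the function name `handle_request` are made up for illustration.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a full CGI response: header, a blank line, then the HTML body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # Escape user input before placing it in the page.
    body = f"<html><body><h1>Hello, {escape(name)}!</h1></body></html>"
    return f"Content-Type: text/html\r\n\r\n{body}"


if __name__ == "__main__":
    # The web server sets the environment and captures whatever is printed.
    sys.stdout.write(handle_request(os.environ))
```

Installed in the server's cgi-bin directory, a script like this would be the target of a form's ACTION attribute.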

Page 28: Web Publishing Architecture

Scripting

• Perl became the favored scripting language for Web applications.
• CGI modules in Perl and Python provide a higher-level interface for the programmer and hide the low-level details.
• Script installed in server's cgi-bin directory.
• HTML document containing form references the CGI script.

Page 29: Web Publishing Architecture

Sample Perl CGI script

Page 30: Web Publishing Architecture

Stateless Transactions

• HTTP is a stateless protocol. Each interaction is independent of the others.
• Maintaining state or session tracking is necessary for a number of applications such as shopping carts.
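One common way to layer state on top of a stateless protocol is a session cookie: the server hands the browser an opaque identifier and keeps the matching state on its side. The sketch below is a simplified illustration with an in-memory dictionary; the cookie name `session_id` and the function `get_session` are assumptions, and a real site would add expiry and persistent storage.

```python
import uuid
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> per-visitor state (in memory, for illustration only)


def get_session(cookie_header):
    """Return (session_id, state), creating a new session when none is known."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    session_id = morsel.value if morsel else None
    if session_id not in SESSIONS:
        # Unknown or missing id: start a fresh session with an empty cart.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"cart": []}
    return session_id, SESSIONS[session_id]


# First request: no cookie, so the server issues a new session id.
sid, state = get_session(None)
state["cart"].append("book")
set_cookie = f"Set-Cookie: session_id={sid}; Path=/"

# Second request: the browser sends the cookie back and the cart is found again.
sid_again, state_again = get_session(f"session_id={sid}")
```

Each request is still independent at the protocol level; the continuity lives entirely in the identifier the browser repeats.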

Page 31: Web Publishing Architecture

Application Servers

Page 32: Web Publishing Architecture

Web Application Stack

                    Open Source  Sun      Microsoft   IBM        Macromedia
OS                  Linux        Solaris  Windows     Linux      Windows
Web Server          Apache       Apache   IIS         Apache     Apache
Application Server  PHP          JSP      ASP         Websphere  Cold Fusion
DB                  MySQL        Oracle   SQL Server  DB2        SQL Server

Page 33: Web Publishing Architecture

Characteristics

• Embed programming code inside of HTML documents.
• Languages like PHP, Cold Fusion and ASP can be viewed as extensions to HTML.
• One consideration is whether there’s clean separation between code and documents.

Page 34: Web Publishing Architecture

Cold Fusion

• Cold Fusion from Allaire/Macromedia is a Windows NT/2000 application.
• Server is configured so that files ending in .cfm are passed to the Cold Fusion application server.

Page 35: Web Publishing Architecture

Cold Fusion and HTML file

<H2>New Form</H2>
<FORM ACTION="searchquery.cfm" METHOD="Post">
Last Name: <Input Type="text" Name="LastName">
<Input Type="Submit" Value="Search">
</FORM>

Page 36: Web Publishing Architecture

Application file (.cfm)

<CFQUERY Name="EmployeeList" Datasource="Examples">
Select * From Employees
WHERE LastName = '#LastName#'
</CFQUERY>
<body>
<H2>Results</H2>
<CFOUTPUT>
<P>The search for #Form.LastName# returned the following:
</CFOUTPUT>
<CFOUTPUT QUERY="EmployeeList">
<HR>
#FirstName# #LastName# (Phone: #PhoneNumber#) <BR>
</CFOUTPUT>

Page 37: Web Publishing Architecture

Database Servers

• Flat-file database, dbm files
• Free
  • MySQL and Postgres
• Mid-range
  • MS Access and SQL Server
• Commercial High-end
  • Oracle 8i, Sybase, IBM’s DB2

Page 38: Web Publishing Architecture

Database Woes

• Generating pages dynamically can impact a site’s performance and administration.
• Many applications find ways of generating static pages and caching them.
• Should documents be stored in the database?

Page 39: Web Publishing Architecture

Databases

• The standard application interfaces to the database are through SQL and/or ODBC.
• SQL can be used to create or modify data records in the database as well as to select sets of data from it.

Page 40: Web Publishing Architecture

SQL

• Example: SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = 'DALE DOUGHERTY'
• Languages such as Perl, Python and Java all provide fairly standard interfaces for accessing databases.
• The earlier Cold Fusion example simply embeds a SQL statement in an HTML document. The CF application passes the query to the database server, which processes the request and returns the data to the application, which passes it back to the web server.
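To make the "fairly standard interfaces" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table contents are placeholder data invented for the example, and `find_employee` is a hypothetical helper. The query uses a `?` placeholder rather than pasting the form value into the SQL text, which avoids the injection risk of building queries by string substitution.

```python
import sqlite3

# An in-memory database stands in for the site's database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, addr TEXT)")
conn.executemany(
    "INSERT INTO employees (name, addr) VALUES (?, ?)",
    [("DALE DOUGHERTY", "123 Example St"), ("PAT EXAMPLE", "456 Sample Ave")],
)


def find_employee(connection, name):
    """Select name and address for one employee using a bound parameter."""
    cursor = connection.execute(
        "SELECT name, addr FROM employees WHERE name = ?", (name,)
    )
    return cursor.fetchall()


rows = find_employee(conn, "DALE DOUGHERTY")
```

The same pattern of connect, execute with parameters, and fetch rows appears in the database interfaces of the other languages the slide mentions.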

Page 41: Web Publishing Architecture

Application Server Issues

• What degree of technical expertise is required to build applications?
• How portable is the application? How much does it tie you to one OS or Web server or language?
• Is the server API proprietary or standardized?

Page 42: Web Publishing Architecture

Application Service Provider (ASP)

• A Web site is increasingly put together as a set of components that could be software or services sourced from different sites.
• ASPs are providers of services rather than software. They take away the burden of owning and maintaining software.

Page 43: Web Publishing Architecture

Content Management

• A specialized application server
• A system for managing the production, development and delivery of content by a team of producers.

Page 44: Web Publishing Architecture

CMS Features

• Manages "metadata" to build collections of documents and create different views.
• Generates content from database.
• Provides for staging of content; replication.
• Administrative interface to manage scheduling and workflow.
• Manages interactions with customers and keeps track of vital information.
• Allows for distribution of information in multiple formats.

Page 45: Web Publishing Architecture

Implementing Layouts in CMS

Which Layout Strategy Will You Use?

• Server Side Includes (SSI)
• Style sheets (CSS)
  • Table layout vs block positioning
• Templates
• XSLT (transformation of XML into HTML)
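Of the layout strategies listed, templates are the easiest to sketch: the page layout lives in one place and content is poured into named slots. The example below uses Python's standard `string.Template`; the slot names `$title` and `$body` and the `render` helper are illustrative assumptions, not the mechanism of any particular CMS.

```python
from string import Template

# One shared layout; $title and $body are the slots filled per document.
PAGE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body>\n<h1>$title</h1>\n$body\n</body></html>"
)


def render(title, body_html):
    """Fill the shared layout with one document's title and body."""
    return PAGE.substitute(title=title, body=body_html)


html = render(
    "Web Publishing Architecture",
    "<p>Publishing is a server-side application.</p>",
)
```

Compared with server-side includes, the direction is reversed: instead of each document pulling in shared fragments, a shared layout pulls in each document.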

Page 46: Web Publishing Architecture

CS (Community Server)

• Content Management System written using Apache, Perl, MySQL
• Used for O’Reilly Network, XML.com and Perl.com.
• Demo

Page 47: Web Publishing Architecture

Other CMS

• Vignette
  • Expensive, commercial CMS system
• Ars Digita
  • Java-based platform.
• Zope
  • Python-based

Page 48: Web Publishing Architecture

Advantages of CMS

• A cost-effective way to manage information and users.
• A consistent administrative interface for building and managing complex Web sites.
• A robust development platform that provides common publishing functionality and allows customization.

Page 49: Web Publishing Architecture

Other Major Components

• Advertising Server
• Search Engine
• Conferencing System

Page 50: Web Publishing Architecture

Ad Server

Software or Service?

• The ad server provides for the dynamic rotation of advertising banners on a site, and the collection of data to track impressions and click-throughs.
• Ad traffic administrator sets up campaigns to run on the server.
• Advertisers use the server to get real-time reporting on how an ad is doing.

Page 51: Web Publishing Architecture

Search Engine

• Search engine provides a full-text index of a site or a collection of sites.
• Webmaster needs to configure indexer to run at certain intervals, either to regenerate complete index or simply to update it.
• References: Atomz

Page 52: Web Publishing Architecture

Conferencing and Chat Systems

Sites use conferencing and chat systems to create community and increase user involvement.

• Conferencing or Bulletin Board Systems
• Chat
• Instant Messaging
• Polls and Surveys

Page 53: Web Publishing Architecture

Mailing List Software

• Email remains the dominant form of communication on the Web. The ability to capture email addresses and send regular email to users is very valuable.
• Majordomo, LISTSERV, Lyris

Page 54: Web Publishing Architecture

Flow

• Weblogs
  • Commentary; Directing Attention to Interesting Items on the Web
  • Personal Writing Space
  • Tools
    • Manila from Userland
    • Others such as Blogger
• RSS
  • Rich Site Summary
  • Headlines
  • Enhance to send more metadata
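An RSS channel is a small XML document listing headlines and links, which is what makes it easy for other sites and tools to consume. The sketch below parses a made-up, simplified feed with Python's standard `xml.etree.ElementTree`; the feed contents and the `headlines` function are invented for illustration, and real feeds carry more elements (descriptions, dates and other metadata).

```python
import xml.etree.ElementTree as ET

# A simplified, invented feed: one channel with two headline items.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def headlines(feed_xml):
    """Return a list of (title, link) pairs, one per item in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


items = headlines(SAMPLE_FEED)
```

An aggregator repeats this step across many channels and then sorts or filters the combined list.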

Page 55: Web Publishing Architecture

Example: Meerkat

An Open Wire Service

• An RSS aggregator
• A guide to technical information produced by RSS channels.
• Information is sorted by channel and technology.
• Can be customized and personalized.

Page 56: Web Publishing Architecture

Summary

• Publishing is a server-side application.
• Most functionality is controlled by the application server.
• Content management systems provide a standard set of capabilities but most CMS applications require a high degree of customization.
• Software choices are often dictated by hardware and OS selection, although they don’t need to be.