Web Publishing Architecture

A look at the various components of Web publishing, many of which are common to most Web applications:
• HTML Document Publishing
• CGI Scripting
• Applications
• Content Management Systems

Posted on 19-Dec-2015



The Web Browser is…
• A program available everywhere.
• A generalized information interface.
• A client that connects to distributed servers.
• A single point of control over the Web, fought over by Microsoft and Netscape.

The Web, Circa 1993

Key Challenges Were on the Client
• How to present information in a Web browser.

Developed by Pei Wei in 1992, Viola was an application toolkit built on top of the X Window System. Its WWW browser was a sample application that integrated styled text and graphics.

In this example, the Viola browser embedded another application and its controls.

World Wide Web Wizards Workshop (July 1993)
• An early attempt to forge a common development agenda.
• Tension between slow-moving standards development and seat-of-the-pants innovation.

HTML: Hypertext Markup Language
• A simple SGML vocabulary, or tagset.
• Controls the content and layout of a presentation.
• A human-readable data format.

The Web, Circa 1995

Publication Models

Key Challenges Were on the Server
• Publishing Becomes a Server-side Application
  • Apache, mod_perl and Perl.
  • Didn't Much Depend On Client-Side Capabilities
• Development of Custom Content Management Systems
  • Manage the publishing process

The Web Server…

HyperText Transfer Protocol (HTTP)
• HTTP is a request/response protocol.

"HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system." Tim Berners-Lee, 1992, Basic HTTP

• Achieves a loose coupling of clients and servers.

References: HTTP 1.1 Spec

Anatomy of a Request

The browser locates the server (oreilly.com) and makes a connection to port 80 (in a typical configuration) on that machine.

Full Request

GET /index.html HTTP/1.1
Host: localhost
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/xbm, */*
Accept-Language: en
Connection: Keep-Alive
User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)
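On the wire, the request is plain text. It consists of a request line, one header per line, and an empty line that ends the headers. The Python sketch below assembles such a request. `build_request` is a hypothetical helper, and only a few of the example's headers are reproduced.

```python
def build_request(method, path, headers):
    """Assemble a raw HTTP/1.1 request with no body."""
    # Request line, e.g. "GET /index.html HTTP/1.1"
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # HTTP lines end in CRLF; an extra CRLF (empty line) ends the headers
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/index.html", {
    "Host": "localhost",
    "Accept-Language": "en",
    "Connection": "Keep-Alive",
})
print(request.decode("ascii"))
```

A browser writes bytes like these to its TCP connection on port 80. The Host header is what lets a single server answer for several site names.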

Server
• Returns the status of the request, for example: HTTP/1.1 200 OK
• Sends header info followed by a blank line, for example:
  Content-type: text/html
  Content-length: 3896
• Sends the document, or data from a CGI program.
• Objects embedded in the document, such as images, generate new requests to the server.
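The response has the same shape as the request: a status line, headers, an empty line, and then the document. Here is a minimal Python sketch that splits a raw response into those parts. `parse_response` is a hypothetical name, and the 13-byte body is made up for the example; it does not match the slide's 3896-byte document.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status code, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF)
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK"; the code is the second field
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case
        headers[name.strip().lower()] = value.strip()
    return code, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")
status, headers, body = parse_response(raw)
print(status, headers["content-type"], len(body))
```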

The Apache Web Server

The Apache Group, an Open Source software project, has developed the leading Web server, which runs on over 50% of all servers.

Web servers are a fairly stable technology.

References: Apache.org, Netcraft survey, Apache: The Definitive Guide

Apache Directories

Have you set up a Web server?
• /usr/local/apache is the Unix/Linux installation directory.
• /htdocs is the directory for HTML files.
• /cgi-bin is for scripts.
• /conf is the configuration directory, where the file httpd.conf lives.

Configuring a Web Server

The site administrator usually takes care of the following server configuration issues by editing httpd.conf:
• Document and content type mapping
• Authentication and access control
• Logging
• Virtual servers
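Virtual servers let one server host several sites. The server picks a document root from the Host header of each request. The Python sketch below shows only that lookup. The host names and directories are hypothetical. In Apache itself the mapping is written as <VirtualHost> sections in httpd.conf, not as code.

```python
import posixpath

# Hypothetical Host header -> document root table
VIRTUAL_HOSTS = {
    "www.example.com": "/usr/local/apache/htdocs/example",
    "www.example.org": "/usr/local/apache/htdocs/example-org",
}
DEFAULT_ROOT = "/usr/local/apache/htdocs"


def resolve(host_header, url_path):
    """Map a request's Host header and URL path to a file path on disk."""
    # Drop an optional ":port" suffix; host names are case-insensitive
    host = host_header.split(":", 1)[0].lower()
    root = VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
    # Normalize the path so "/../" cannot climb out of the document root
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    return root + clean


print(resolve("www.example.com:80", "/index.html"))
```

Unknown hosts fall back to the default document root, which is how a server's "main" site typically behaves.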

URL Management

Decisions about URLs:
• Relative vs. absolute links on the site.
• Permanent addressing vs. current addressing: /98/09/21/document.html vs. today.html

What are you going to do when things change?
• URLs can be brittle.
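One way to live with both addressing styles is to give every document a permanent, date-based URL. A "current" name such as today.html then becomes an alias that is redirected to the permanent address. A table of moved URLs keeps old links working. This Python sketch assumes that scheme; the function names and the redirect table are hypothetical.

```python
from datetime import date

# Hypothetical table of old URLs that have moved, kept so links don't break
MOVED = {"/old/document.html": "/98/09/21/document.html"}


def permanent_url(published, name="document.html"):
    """Date-based permanent address, e.g. /98/09/21/document.html."""
    return published.strftime("/%y/%m/%d/") + name


def resolve(path, today):
    """Return the URL a request should be redirected to (or the path itself)."""
    if path == "/today.html":
        # The "current" address is an alias for today's permanent URL
        return permanent_url(today)
    return MOVED.get(path, path)


print(resolve("/today.html", date(1998, 9, 21)))
```

Two-digit years mirror the slide's example. A four-digit `%Y` would avoid ambiguity after a century rollover.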

Authentication

Authentication is asking a user to provide identification, usually a user name and password.

Basic Authentication can be set up with an .htaccess file. More sophisticated applications manage this information in a user database.
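With Basic Authentication, the browser sends the user name and password in the Authorization header as the Base64 encoding of `user:password`. Base64 is an encoding, not encryption, so anyone who can read the traffic can recover the password. The Python sketch below shows both sides. The helper names and credentials are hypothetical, and a real server would check the decoded pair against its password file or user database.

```python
import base64
from typing import Optional, Tuple


def basic_auth_header(user, password):
    """Value the browser sends in the Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def parse_basic_auth(header) -> Optional[Tuple[str, str]]:
    """Recover (user, password) on the server side, or None if malformed."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


header = basic_auth_header("dale", "secret")
print(header)                    # Basic ZGFsZTpzZWNyZXQ=
print(parse_basic_auth(header))  # ('dale', 'secret')
```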

Apache section

Logs

Found in the logs directory: access.log

A log entry tells you:

IP number – Date/Time – Request – Status – Bytes sent

152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087
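The entry above is in Apache's Common Log Format: host, identity, user, time, request, status, and bytes sent. A "-" marks a missing field. The regular expression and `parse_line` helper below are a sketch written for this format. They are not taken from any particular log-analysis tool.

```python
import re
from datetime import datetime
from typing import Optional

# host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line) -> Optional[dict]:
    """Parse one Common Log Format line into a dict, or None if it doesn't match."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # "-" means no bytes were sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


entry = parse_line('152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087')
print(entry["host"], entry["request"], entry["status"], entry["size"])
```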

Logs Processing

Some of the tasks surrounding logs:
• Log rotation (day, week, month)
• Log compression (files grow large)
• Log file parsing and reporting
• Reverse DNS lookup

References: Lincoln Stein, Yahoo's list of tools, Marketwave's Hitlist Examples
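Rotation and compression can be sketched as three steps: move the live log aside under a dated name, start a fresh empty log, and gzip the old one. This Python sketch shows only the file handling, and `rotate_log` is a hypothetical helper. In practice Apache keeps its log file open, so real setups use Apache's rotatelogs program or tell the server to reopen its logs after the rename.

```python
import gzip
import os
import shutil
import tempfile
from datetime import date


def rotate_log(path, day):
    """Move the current log aside, start a new empty one, and gzip the old one."""
    rotated = f"{path}.{day.isoformat()}"
    os.replace(path, rotated)          # rename the live log
    open(path, "w").close()            # fresh, empty log for new entries
    with open(rotated, "rb") as src, gzip.open(rotated + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)   # compress: log files grow large
    os.remove(rotated)
    return rotated + ".gz"


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "access.log")
    with open(log, "w") as f:
        f.write('152.163.201.137 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087\n')
    archive = rotate_log(log, date(2001, 9, 20))
    with gzip.open(archive, "rt") as f:
        restored = f.read()
    new_log_size = os.path.getsize(log)
    archive_name = os.path.basename(archive)

print(archive_name, new_log_size)
```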

Server Hardware and OS

Server farms or hosting services are set up to manage the hardware, the OS and the network for 24/7 operation.

Properly configured PCs can be powerful enough to handle a sizable load, obviating the need for more expensive servers from Sun.

Small dedicated Web server devices are also available, such as the Cobalt server with embedded Linux and Web-based administration.

Web Publishing
• HTML Authoring Systems
• Server Side Includes
• CGI Applications
• Templates

Authoring Systems

Debate over whether to show or hide HTML from authors.

Page Creation Tools
• HTML Editors: HomeSite; BBEdit
• Web Site Authoring Systems: FrontPage; GoLive; NetObjects; Dreamweaver

Market share estimate of authoring tools (Security Space)

Server Side Includes
• Insert dynamic information such as the date or time.
• Include a file shared by a set of documents.

One way to create a consistent page layout across the site.

Example: Use a server-side include to put common information for a page header or footer in a separate file, and source it from all documents.
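In Apache, the directives look like `<!--#include virtual="/footer.html" -->` and `<!--#echo var="DATE_LOCAL" -->`. The server replaces them as it sends the page. The Python sketch below expands just those two forms, which is enough to show the header/footer idea. The file table, page text and date format are hypothetical. The sketch is not Apache's mod_include.

```python
import re
from datetime import datetime

# Matches e.g. <!--#include virtual="/footer.html" --> and <!--#echo var="DATE_LOCAL" -->
DIRECTIVE = re.compile(r'<!--#(include|echo)\s+(?:virtual|var)="([^"]*)"\s*-->')
# Apache's default text when a directive cannot be processed
ERROR = "[an error occurred while processing this directive]"


def expand_ssi(page, files, now):
    """Expand a small subset of SSI: include virtual=... and echo var=DATE_LOCAL."""
    def replace(match):
        command, value = match.groups()
        if command == "include":
            # Shared fragment such as a common header or footer
            return files.get(value, ERROR)
        if value == "DATE_LOCAL":
            return now.strftime("%d-%b-%Y")  # date format chosen for the sketch
        return ERROR
    return DIRECTIVE.sub(replace, page)


files = {"/footer.html": "<hr><p>Site footer</p>"}  # hypothetical shared file
page = ('<h1>News</h1>\n<p>Updated <!--#echo var="DATE_LOCAL" --></p>\n'
        '<!--#include virtual="/footer.html" -->')
out = expand_ssi(page, files, datetime(2001, 9, 20))
print(out)
```

Editing footer.html once changes the footer on every page that includes it, which is where the consistent layout comes from.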

CGI Applications: Common Gateway Interface

A Web server passes control to an application, which generates a dynamic HTML document and returns it to the server.
• Forms-based input and interaction
• Session management
• Transactions
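Under CGI, the server hands request data to the program through environment variables such as REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH. POSTed form data arrives on standard input. The server then reads the program's standard output: headers, a blank line, and the document. The Python sketch below uses only the standard library. The `handle` function and the `LastName` form field are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle(environ, body=b""):
    """Build a complete CGI response: headers, blank line, then HTML."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        query = body.decode("utf-8")          # form data sent on stdin
    else:
        query = environ.get("QUERY_STRING", "")
    fields = parse_qs(query)
    name = fields.get("LastName", [""])[0]
    # Escape user input before echoing it back into the page
    page = f"<html><body><p>You searched for {html.escape(name)}</p></body></html>"
    # A blank line separates the headers from the document
    return "Content-type: text/html\n\n" + page


if __name__ == "__main__":
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.buffer.read(length) if length else b""
    sys.stdout.write(handle(os.environ, body))
```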

Scripting

Perl became the favored scripting language for Web applications.

CGI modules in Perl and Python provide a higher-level interface for the programmer and hide the low-level details.
• The script is installed in the server's cgi-bin directory.
• An HTML document containing a form references the CGI script.

Sample Perl CGI script

Stateless Transactions

HTTP is a stateless protocol. Each interaction is independent of the others.

Maintaining state, or session tracking, is necessary for a number of applications, such as shopping carts.
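A common way to add state on top of HTTP is to keep the session data on the server and give the browser only a random session id in a cookie. The browser returns the cookie with every request. This Python sketch uses `http.cookies.SimpleCookie` from the standard library. The cookie name `sid`, the in-memory `SESSIONS` store and the `get_session` helper are hypothetical; a real site would use a database or shared store.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> data; in memory only for this sketch


def get_session(cookie_header):
    """Return (session id, session data, Set-Cookie header or None)."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    sid = cookie["sid"].value if "sid" in cookie else None
    if sid in SESSIONS:
        return sid, SESSIONS[sid], None        # returning visitor
    # First visit (or unknown id): start a session and tell the browser its id
    sid = secrets.token_hex(16)
    SESSIONS[sid] = {"cart": []}
    out = SimpleCookie()
    out["sid"] = sid
    out["sid"]["path"] = "/"
    return sid, SESSIONS[sid], out.output(header="Set-Cookie:")


sid, session, set_cookie = get_session(None)
session["cart"].append("item-42")
# The next request carries the cookie back, so the cart is still there
sid2, session2, set_cookie2 = get_session(f"sid={sid}")
print(set_cookie, session2)
```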

Application Servers

Web Application Stack

Vendor        OS        Web Server   Application Server   Database
Open Source   Linux     Apache       PHP                  MySQL
Sun           Solaris   Apache       JSP                  Oracle
Microsoft     Windows   IIS          ASP                  SQL Server
IBM           Linux     Apache       WebSphere            DB2
Macromedia    Windows   Apache       Cold Fusion          SQL Server

Characteristics
• Embed programming code inside HTML documents.
• Languages like PHP, Cold Fusion and ASP can be viewed as extensions to HTML.
• One consideration is whether there is a clean separation between code and documents.

Cold Fusion

Cold Fusion, from Allaire/Macromedia, is a Windows NT/2000 application.

The Web server is configured so that files ending in .cfm are passed to the Cold Fusion application server.
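The .cfm rule is one case of a general pattern. The server looks at the file extension and either serves the file as-is or hands the request to an application server. The Python sketch below captures only that decision. The handler labels are illustrative and are not Apache or Cold Fusion configuration syntax.

```python
from pathlib import PurePosixPath

# Illustrative extension -> handler table
HANDLERS = {
    ".cfm": "coldfusion",   # passed to the Cold Fusion application server
    ".php": "php",
    ".cgi": "cgi-script",
}


def choose_handler(url_path):
    """Pick a handler from the file extension; everything else is static."""
    suffix = PurePosixPath(url_path).suffix.lower()
    return HANDLERS.get(suffix, "static")


print(choose_handler("/searchquery.cfm"), choose_handler("/index.html"))
```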

Cold Fusion and HTML

HTML file with the search form:

<H2>New Form</H2>
<FORM ACTION="searchquery.cfm" METHOD="Post">
Last Name: <Input Type="text" Name="LastName">
<Input Type="Submit" Value="Search">
</FORM>

Application file (searchquery.cfm):

<CFQUERY Name="EmployeeList" Datasource="Examples">
SELECT * FROM Employees
WHERE LastName = <CFQUERYPARAM Value="#Form.LastName#" CFSQLType="CF_SQL_VARCHAR">
</CFQUERY>
<body>
<H2>Results</H2>
<CFOUTPUT><P>The search for #HTMLEditFormat(Form.LastName)# returned the following:</P></CFOUTPUT>
<CFOUTPUT QUERY="EmployeeList">
<HR>#FirstName# #LastName# (Phone: #PhoneNumber#)<BR>
</CFOUTPUT>
</body>

Database Servers
• Flat-file databases, dbm files
• Free: MySQL and Postgres
• Mid-range: MS Access and SQL Server
• Commercial high-end: Oracle 8i, Sybase, IBM's DB2
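At the low end, a dbm file is a persistent key/value store with no SQL, only lookups by key. Python's standard `dbm` module opens whichever dbm implementation is available. The file name, keys and values below are made up for the sketch.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pages")
    # "c": open for reading and writing, creating the database if needed
    with dbm.open(path, "c") as db:
        db["/index.html"] = "Home page"      # str keys/values are stored as bytes
        db["/about.html"] = "About us"
    # Reopen read-only to show the data persisted on disk
    with dbm.open(path, "r") as db:
        title = db["/index.html"].decode("utf-8")
        has_about = "/about.html" in db

print(title, has_about)
```

Lookups by key are fast, but anything like "all pages by one author" needs a full scan, which is where the SQL databases listed above come in.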

Database Woes
• Generating pages dynamically can impact a site's performance and administration.
• Many applications find ways of generating static pages and caching them.
• Should documents be stored in the database?
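A simple form of "generate static pages and cache them" is to write each generated page to disk. The cached file is served until it is older than a fixed time-to-live, and only then is the page regenerated. In this Python sketch the 300-second TTL, the `get_page` helper and the `render` callback are assumptions. The optional `now` argument exists only to make the example repeatable.

```python
import os
import tempfile
import time

CACHE_TTL = 300  # seconds a generated page stays fresh (assumed value)


def get_page(cache_dir, name, render, now=None):
    """Serve a cached static copy, regenerating it only when it is stale."""
    now = time.time() if now is None else now
    path = os.path.join(cache_dir, name)
    if os.path.exists(path) and now - os.path.getmtime(path) < CACHE_TTL:
        with open(path, encoding="utf-8") as f:
            return f.read()                  # cheap: no database work
    page = render()                          # expensive: query + build page
    with open(path, "w", encoding="utf-8") as f:
        f.write(page)
    os.utime(path, (now, now))               # record when it was generated
    return page


calls = []


def render():
    calls.append(1)
    return f"<p>version {len(calls)}</p>"


with tempfile.TemporaryDirectory() as tmp:
    first = get_page(tmp, "news.html", render, now=1000.0)
    second = get_page(tmp, "news.html", render, now=1100.0)  # still fresh
    third = get_page(tmp, "news.html", render, now=1400.0)   # stale: rebuilt
```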

Databases

The standard application interfaces to the database are SQL and/or ODBC.

SQL can be used to create or modify data records in the database as well as to select sets of data from it.

SQL Example: SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = 'DALE DOUGHERTY'

Languages such as Perl, Python and Java all provide fairly standard interfaces for accessing databases (Perl's DBI, Python's DB-API, Java's JDBC).
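Python's DB-API looks much the same whichever database driver is underneath. The sketch below uses the standard-library `sqlite3` module, with an in-memory SQLite database standing in for MySQL or Oracle. The table contents are placeholders. Note that the placeholder style (`?` here) varies between drivers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for a database server
cur = conn.cursor()

# SQL creates and modifies records...
cur.execute("CREATE TABLE EMPLOYEES (NAME TEXT, ADDR TEXT)")
cur.executemany(
    "INSERT INTO EMPLOYEES (NAME, ADDR) VALUES (?, ?)",
    [("DALE DOUGHERTY", "address 1"), ("JANE DOE", "address 2")],
)
cur.execute("UPDATE EMPLOYEES SET ADDR = ? WHERE NAME = ?", ("address 3", "JANE DOE"))
conn.commit()

# ...and selects sets of data. Placeholders keep form input out of the SQL text.
cur.execute("SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = ?", ("DALE DOUGHERTY",))
rows = cur.fetchall()
conn.close()
print(rows)
```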

The earlier Cold Fusion example simply embeds an SQL statement in an HTML document. The Cold Fusion application server passes the query to the database server, which processes the request and returns the data to the application server, which in turn passes the generated page back to the Web server.

Application Server Issues
• What degree of technical expertise is required to build applications?
• How portable is the application? How much does it tie you to one OS, Web server or language?
• Is the server API proprietary or standardized?

Application Service Provider (ASP)

A Web site is increasingly put together as a set of components that could be software or services sourced from different sites.

ASPs are providers of services rather than software. They take away the burden of owning and maintaining software.

Content Management
• A specialized application server.
• A system for managing the production, development and delivery of content by a team of producers.

CMS Features
• Manages "metadata" to build collections of documents and create different views.
• Generates content from a database.
• Provides for staging of content and replication.
• Provides an administrative interface to manage scheduling and workflow.
• Manages interactions with customers and keeps track of vital information.
• Allows for distribution of information in multiple formats.
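Metadata is what lets one set of documents appear in several collections. The same articles can be grouped by topic in one view and by author in another, with staged (unpublished) items left out. The documents, fields and `build_view` helper in this Python sketch are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents with CMS-managed metadata
DOCS = [
    {"title": "Intro to RSS", "topic": "xml", "author": "kim", "status": "published"},
    {"title": "XSLT Basics", "topic": "xml", "author": "lee", "status": "published"},
    {"title": "mod_perl Tips", "topic": "perl", "author": "kim", "status": "published"},
    {"title": "Perl Draft", "topic": "perl", "author": "lee", "status": "staged"},
]


def build_view(docs, key, status="published"):
    """Group documents with the given status by one metadata field."""
    view = defaultdict(list)
    for doc in docs:
        if doc["status"] == status:     # staged content stays out of the view
            view[doc[key]].append(doc["title"])
    return dict(view)


by_topic = build_view(DOCS, "topic")
by_author = build_view(DOCS, "author")
print(by_topic)
print(by_author)
```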

Implementing Layouts in CMS

Which Layout Strategy Will You Use?
• Server Side Includes (SSI)
• Style sheets (CSS)
  • Table layout vs. block positioning
• Templates
• XSLT (transformation of XML into HTML)
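With templates, the site-wide layout lives in one place and each page supplies only its own pieces. This Python sketch uses the standard library's `string.Template` as a stand-in for a CMS template engine. Python's standard library has no XSLT processor, so the XSLT option is not shown. The layout markup and `render_page` helper are hypothetical.

```python
import html
from string import Template

# One site-wide layout; pages fill in only the placeholders
LAYOUT = Template("""<html>
<head><title>$title</title></head>
<body>
$header
<h1>$title</h1>
$body
</body>
</html>""")


def render_page(title, body, header='<div class="nav">Home | Articles</div>'):
    """Wrap page content in the common layout (title is HTML-escaped)."""
    return LAYOUT.substitute(title=html.escape(title), body=body, header=header)


page = render_page("News & Notes", "<p>Hello</p>")
print(page)
```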

CS (Community Server) Content Management System
• Written using Apache, Perl and MySQL.
• Used for the O'Reilly Network, XML.com and Perl.com.
• Demo

Other CMS
• Vignette: an expensive, commercial CMS.
• ArsDigita: a Java-based platform.
• Zope: Python-based.

Advantages of CMS
• A cost-effective way to manage information and users.
• A consistent administrative interface for building and managing complex Web sites.
• A robust development platform that provides common publishing functionality and allows customization.

Other Major Components
• Advertising Server
• Search Engine
• Conferencing System

Ad Server: Software or Service?

The ad server provides for the dynamic rotation of advertising banners on a site, and the collection of data to track impressions and click-throughs.

The ad traffic administrator sets up campaigns to run on the server.

Advertisers use the server to get real-time reporting on how an ad is doing.
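The two jobs described above are rotating banners and tracking impressions and click-throughs. Both can be sketched with a weighted random choice and a pair of counters. The campaign weights, banner ids and simulated click pattern in this Python sketch are hypothetical, and the click-through rate is computed as clicks divided by impressions.

```python
import random
from collections import Counter

# Hypothetical campaign: banner id -> rotation weight set by the traffic administrator
CAMPAIGN = {"banner-a": 3, "banner-b": 1}
impressions = Counter()
clicks = Counter()


def serve_banner(rng=random):
    """Choose a banner by weighted rotation and count the impression."""
    banners = list(CAMPAIGN)
    choice = rng.choices(banners, weights=[CAMPAIGN[b] for b in banners])[0]
    impressions[choice] += 1
    return choice


def record_click(banner):
    """Count a click-through on a banner."""
    clicks[banner] += 1


def report():
    """Per-banner (impressions, clicks, click-through rate) for advertisers."""
    return {
        b: (impressions[b], clicks[b], clicks[b] / impressions[b] if impressions[b] else 0.0)
        for b in CAMPAIGN
    }


rng = random.Random(1)  # seeded so the run is repeatable
for i in range(1000):
    shown = serve_banner(rng)
    if i % 50 == 0:  # pretend every 50th visitor clicks
        record_click(shown)
print(report())
```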

Search Engine

A search engine provides a full-text index of a site or a collection of sites.

The webmaster needs to configure the indexer to run at certain intervals, either to regenerate the complete index or simply to update it.

References: Atomz
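A full-text index is usually an inverted index, a map from each word to the pages that contain it. That structure supports both a complete rebuild and an incremental update of individual pages. The `SiteIndex` class and sample pages in this Python sketch are hypothetical. Tokenizing on lower-case letters and digits is a deliberate simplification.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class SiteIndex:
    """Tiny inverted index: word -> set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.pages = {}  # url -> words last indexed, so updates can drop stale entries

    def update(self, url, text):
        """Incrementally (re)index one page."""
        for word in self.pages.get(url, ()):
            self.postings[word].discard(url)
        words = set(tokenize(text))
        self.pages[url] = words
        for word in words:
            self.postings[word].add(url)

    def rebuild(self, site):
        """Regenerate the complete index from {url: text}."""
        self.postings.clear()
        self.pages.clear()
        for url, text in site.items():
            self.update(url, text)

    def search(self, query):
        """URLs that contain every word of the query."""
        sets = [self.postings.get(w, set()) for w in tokenize(query)]
        return set.intersection(*sets) if sets else set()


index = SiteIndex()
index.rebuild({"/perl.html": "Perl and mod_perl on Apache",
               "/xml.html": "XML and RSS headlines"})
index.update("/xml.html", "XSLT transforms XML into HTML")
print(index.search("apache perl"), index.search("xml"))
```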

Conferencing and Chat Systems

Sites use conferencing and chat systems to create community and increase user involvement.
• Conferencing or Bulletin Board Systems
• Chat
• Instant Messaging
• Polls and Surveys

Mailing List Software

Email remains the dominant form of communication among Web users. The ability to capture email addresses and send regular email to users is very valuable.

Majordomo, LISTSERV, Lyris

Flow: Weblogs
• Commentary; directing attention to interesting items on the Web.
• Personal writing space.

Tools
• Manila from UserLand
• Others such as Blogger

RSS: Rich Site Summary
• Headlines
• Enhanced to send more metadata

Example: Meerkat, An Open Wire Service
• An RSS aggregator.
• A guide to technical information produced by RSS channels.
• Information is sorted by channel and technology.
• Can be customized and personalized.
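An aggregator like the one described fetches each channel's RSS document, pulls out the item headlines and links, and sorts them by channel. It can also filter them for a reader's interests. This Python sketch uses `xml.etree.ElementTree` from the standard library and assumes the RSS 0.91/2.0 layout, where `<item>` elements sit inside `<channel>`. RSS 1.0 (RDF) feeds are not handled. The feed content and helper names are made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def headlines(rss_xml):
    """Return (channel title, [(item title, link), ...]) from one RSS document."""
    channel = ET.fromstring(rss_xml).find("channel")
    items = [(item.findtext("title", ""), item.findtext("link", ""))
             for item in channel.iter("item")]
    return channel.findtext("title", ""), items


def aggregate(feeds, keyword=None):
    """Sort headlines by channel, optionally keeping only those matching a keyword."""
    by_channel = defaultdict(list)
    for rss_xml in feeds:
        name, items = headlines(rss_xml)
        for title, link in items:
            if keyword is None or keyword.lower() in title.lower():
                by_channel[name].append((title, link))
    return dict(by_channel)


FEED = """<rss version="0.91"><channel>
<title>Example Tech News</title>
<link>http://www.example.com/</link>
<description>Sample channel</description>
<item><title>Apache tips</title><link>http://www.example.com/1</link></item>
<item><title>Perl modules roundup</title><link>http://www.example.com/2</link></item>
</channel></rss>"""

print(aggregate([FEED], keyword="perl"))
```

The keyword filter stands in for the personalization mentioned above. A real aggregator would also fetch feeds on a schedule and remember which items it has already seen.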

Summary
• Publishing is a server-side application.
• Most functionality is controlled by the application server.
• Content management systems provide a standard set of capabilities, but most CMS applications require a high degree of customization.
• Software choices are often dictated by hardware and OS selection, although they don't need to be.