CS/INFO 330 Enterprise Architectures
Mirek [email protected]
(Some of the slides are courtesy of Gustavo Alonso, Fabio Casati, Harumi Kuno, Vijay Machiraju and Ethan Cerami)
CS/INFO 330 2
The Big Picture
[Figure: system overview. Components shown: WWW site visitor, the Web, public web server, business transaction server, main-memory cache, DBMS, data warehouse, application server, intranet/VPN, internal user, internal web server.]
CS/INFO 330 3
Overview
• Enterprise architectures
• Internet concepts
  – URIs
  – HTTP Protocol
• The presentation tier
  – HTML
  – HTML forms
  – JavaScript, style sheets
  – Cookies
CS/INFO 330 4
Layers and Tiers
• Client
  – Any user or program that wants to perform an operation over the system
  – Interacts with the system through the presentation layer
• Application logic
  – Determines what the system actually does
  – Enforces business rules and establishes business processes
  – Can take many forms: programs, constraints, business processes…
• Resource manager
  – Deals with the organization (storage, indexing, and retrieval) of the data necessary to support the application logic
  – Typically a database, but can also be a text retrieval system or other data management system providing querying capabilities and persistence
[Figure: layer diagram. Labels shown: client, presentation layer, application logic (business rules, business objects, business processes), resource manager (database, persistent storage), server.]
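The three layers can be illustrated with a small sketch. Everything in it is hypothetical (the account data, the overdraft rule, and the class names are made up for illustration); it only shows how each layer talks to the one below it.

```javascript
// Resource manager: owns storage and retrieval (here, an in-memory Map).
class ResourceManager {
  constructor() {
    this.accounts = new Map([["alice", 100], ["bob", 20]]);
  }
  read(id) { return this.accounts.get(id); }
  write(id, balance) { this.accounts.set(id, balance); }
}

// Application logic: enforces a business rule (no overdrafts).
class ApplicationLogic {
  constructor(resourceManager) { this.rm = resourceManager; }
  withdraw(id, amount) {
    const balance = this.rm.read(id);
    if (balance === undefined) throw new Error("unknown account");
    if (amount > balance) throw new Error("insufficient funds");
    this.rm.write(id, balance - amount);
    return balance - amount;
  }
}

// Presentation layer: turns results into something a client can display.
class PresentationLayer {
  constructor(logic) { this.logic = logic; }
  handleWithdraw(id, amount) {
    try {
      return `New balance for ${id}: ${this.logic.withdraw(id, amount)}`;
    } catch (err) {
      return `Request rejected: ${err.message}`;
    }
  }
}

const ui = new PresentationLayer(new ApplicationLogic(new ResourceManager()));
console.log(ui.handleWithdraw("alice", 30));
console.log(ui.handleWithdraw("bob", 50));
```

The client only ever sees the presentation layer; the business rule lives in one place, and the storage could be swapped for a real database without touching the other two classes.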
CS/INFO 330 5
A Game of Boxes and Arrows
• Box = system part
• Arrow = connection between two system parts
• More boxes => more modular system
  – More opportunities for distribution and parallelism
  – Allows encapsulation, component-based design, reuse
• More boxes => more arrows
  – More sessions (connections) to be maintained
  – More coordination necessary
  – System more complex to monitor and manage
• More boxes => more context switches and intermediate steps to go through before one gets to the data
  – Performance suffers considerably
System designers try to balance the flexibility of modular design with the performance demands of real applications. Once a layer is established, it tends to migrate down and merge with lower layers.
There is no problem in system design that cannot be solved by adding a level of indirection. There is no performance problem that cannot be solved by removing a level of indirection.
CS/INFO 330 6
Top-Down Design
[Figure: top-down design and the resulting top-down architecture, with presentation layers PL-A, PL-B, PL-C; application logic AL-A, AL-B, AL-C, AL-D; and resource managers RM-1, RM-2.]
- Start with high-level goals of problem
- Proceed to define everything required to achieve these goals
- Emphasizes final system goals
- Can be tailored to address functional (supported operations) and non-functional (performance, availability) issues
- But: difficult to do when integrating legacy systems
CS/INFO 330 7
Top-Down Design
[Figure: top-down design of an information system, spanning client, presentation layer, application logic layer, and resource management layer.]
1. define access channels and client platforms
2. define presentation formats and protocols for the selected clients and protocols
3. define the functionality necessary to deliver the contents and formats needed at the presentation layer
4. define the data sources and data organization needed to implement the application logic
CS/INFO 330 8
Bottom-Up Design
• Many of the basic components already exist and cannot be easily replaced
  – Stand-alone systems which need to be integrated into new systems
• Components do not necessarily cease to work as stand-alone components
  – Often old applications continue running at the same time as new applications
• Approach has a wide application
• Much of the work and products in this area are related to middleware
  – Intermediate layer
  – Provides a common interface
  – Bridges heterogeneity
  – Copes with distribution
[Figure: a new application and a legacy application on top of legacy systems.]
CS/INFO 330 9
Bottom-Up Design
[Figure: bottom-up design and the resulting bottom-up architecture. Presentation layers PL-A, PL-B, PL-C and application logic AL-A, AL-B, AL-C, AL-D are built on wrappers around legacy applications and legacy systems.]
CS/INFO 330 10
Bottom-Up Design
[Figure: bottom-up design of an information system, spanning client, presentation layer, application logic layer, and resource management layer.]
1. define access channels and client platforms
2. examine existing resources and the functionality they offer
3. wrap existing resources and integrate their functionality into a consistent interface
4. adapt the output of the application logic so that it can be used with the required access channels and client protocols
CS/INFO 330 11
One Tier: Fully Centralized
• Presentation layer, application logic and resource manager built as a monolithic entity
• Access through dumb terminals
• Was the typical architecture of mainframes, offering several advantages:
  – No forced context switches in control flow (everything happens within the system)
  – All is centralized, managing and controlling resources is easier
  – Design can be highly optimized by blurring the separation between layers
[Figure: a single server.]
CS/INFO 330 12
Two Tier: Client/Server
• As computers became more powerful, it was possible to move the presentation layer to the client.
• Several advantages
  – Frees up resources for application logic
  – Can tailor presentation layer without increasing system complexity
    • Independent development and maintenance
  – Introduces concept of API (Application Program Interface)
    • Interface to invoke the system from the outside
    • Allows designers to think about federating the systems into a single system
• Resource manager only sees one client: the application logic
  – No context switches or calls between components of lower two layers
[Figure: server.]
CS/INFO 330 13
APIs in Client/Server
• Introduced notion of a service
• Introduced notion of an interface
  – How client can invoke a given service
• Many standardization efforts due to need for common APIs
[Figure: a server whose API is made up of several service interfaces, one per service, on top of the resource management layer.]
CS/INFO 330 14
Technical Aspects Of Two Tier
• Advantages compared to Single Tier
  – Take advantage of client capacity to off-load work to clients
  – Work within the server takes place within one scope (almost as in 1 tier)
  – Server design still tightly coupled and can be optimized by ignoring presentation issues
  – Still relatively easy to manage and control from a software engineering point of view
• Weaknesses
  – Connection management to clients
  – Clients are “tied” to the system (no standard presentation layer)
  – Connecting to two systems, a client needs two presentation layers
  – No failure or load encapsulation
    • If the server fails, nobody can work
  – Load created by one client will directly affect work of others
    • All compete for the same resources
CS/INFO 330 15
The Main Limitation of Client/Server
• Accessing multiple servers:
  – Underlying systems do not know about each other
  – No common business logic
  – Client is the point of integration (increasingly fat clients)
  – Responsibility of dealing with heterogeneous systems shifted to client
  – Client becomes responsible for knowing where things are, how to get to them, and how to ensure consistency
• Very inefficient
  – Software design, portability, code reuse, performance (since client capacity is limited)
• These issues cannot be solved with 2-tier
[Figure: Server A and Server B.]
CS/INFO 330 16
Three Tier: Middleware
• Three layers fully separated
  – Better for application integration, flexibility, and portability of application logic
• The layers are also typically distributed, taking advantage of the complete modularity of the design
CS/INFO 330 17
Middleware
• Middleware is just a level of indirection between clients and other system layers
• Introduces additional layer of business logic encompassing all underlying systems
• By doing this, a middleware system:
  – simplifies the design of the clients by reducing the number of interfaces,
  – provides transparent access to the underlying systems,
  – acts as the platform for inter-system functionality and high level application logic, and
  – takes care of locating resources, accessing them, and gathering results
[Figure: clients, middleware (global application logic), and below it the local application logic and local resource managers of Server A and Server B.]
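As a rough sketch of the idea, the middleware below offers clients a single call and hides two back-end systems with different interfaces. The two servers, their method names, and the data are invented for illustration.

```javascript
// Two hypothetical back-end systems with different interfaces.
const serverA = {
  lookupStock: (item) => ({ item, units: item === "book" ? 3 : 0 }),
};
const serverB = {
  priceOf: (item) => (item === "book" ? 12.5 : null),
};

// Middleware: one interface for all clients. It locates the right
// systems, calls them, and gathers the results.
const middleware = {
  quote(item) {
    const stock = serverA.lookupStock(item);
    const price = serverB.priceOf(item);
    return { item, available: stock.units > 0 && price !== null, price };
  },
};

console.log(middleware.quote("book"));
console.log(middleware.quote("pen"));
```

A client written against `middleware.quote` needs no knowledge of Server A or Server B, which is the "level of indirection" the slide refers to.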
CS/INFO 330 18
Technical Aspects of Middleware
• Benefits of middleware layer
  – Reduces number of necessary interfaces
    • Clients see only one system (the middleware)
    • Local applications see only one system (the middleware)
  – Centralizes control
  – Makes necessary functionality widely available to all clients
  – Makes it possible to implement functionality that otherwise would be very difficult to provide
  – First step towards dealing with application heterogeneity (some forms of it)
• Middleware layer weaknesses
  – Another indirection level
  – Complex software
CS/INFO 330 19
Three-Tier Middleware-Based System
[Figure: labels shown: external clients, internal clients, middleware system (middleware with user logic, control, connecting logic, and wrappers), 2-tier systems, resource managers.]
CS/INFO 330 20
N-Tier Architectures
• Appear in two settings
  – Connecting several three-tier systems to each other, or having 1-, 2-, and 3-tier systems in the resource management layer
  – Adding connectivity through the Internet
• Web server treated as additional tier (more complex than most presentation layers)
• Addition of Web layer led to the notion of “application servers”
  – Middleware platforms supporting access through the Web
[Figure: labels shown: client with Web browser; information system consisting of presentation layer (Web server, HTML filter), middleware, application logic layer, and resource management layer.]
CS/INFO 330 21
N-tier In Reality
[Figure: labels shown: Internet, firewall, Web server cluster, LANs and gateways, internal clients, middleware application logic (twice), resource management layer, database server, file server, application, wrappers and gateways, additional resource management layers.]
Problems:
• High complexity
• Too much middleware
• Redundant functionality
CS/INFO 330 22
Blocking or Synchronous Interaction
• Traditionally, information systems use blocking calls
  – Synchronous interaction
  – Both parties have to be “on-line”
• Caller makes request
• Receiver gets request, processes it and sends response
• Caller receives response
• Caller must wait until response comes back
Disadvantages due to synchronization:
  – Connection overhead
  – Higher probability of failures
  – Difficult to identify and react to failures
  – It is not really practical for complex interactions
[Figure: the client calls, the server receives and answers, the client gets the response; the client has idle time while it waits.]
CS/INFO 330 23
Overhead of Synchronism
• Need to maintain a session between caller and receiver
  – Expensive
  – Limit on how many sessions can be active at the same time
• For this reason, client/server systems often resort to connection pooling to optimize resource utilization
  – Have a pool of open connections
  – Allocate connections as needed
• Synchronous interaction requires a context for each call and a context management system for all incoming calls
[Figure: two request() / receive-process-return / do-with-answer exchanges. The first is marked with its session duration; in the second the context is lost and needs to be restarted.]
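Connection pooling as described above can be sketched in a few lines. This is a minimal illustration, not how any particular middleware product implements it; the `ConnectionPool` class and the fake connections (plain objects with an id) are assumptions made for the example.

```javascript
class ConnectionPool {
  // Opens `size` connections up front using the supplied factory.
  constructor(size, openConnection) {
    this.idle = [];
    for (let i = 0; i < size; i++) this.idle.push(openConnection(i));
    this.waiting = []; // callers waiting for a free connection
  }
  // Resolves with a connection as soon as one is free.
  acquire() {
    if (this.idle.length > 0) return Promise.resolve(this.idle.pop());
    return new Promise((resolve) => this.waiting.push(resolve));
  }
  // Hands the connection to the next waiter, or returns it to the pool.
  release(conn) {
    const next = this.waiting.shift();
    if (next) next(conn);
    else this.idle.push(conn);
  }
}

async function demo() {
  let opened = 0;
  const pool = new ConnectionPool(2, (id) => { opened++; return { id }; });
  const served = [];
  const call = async (name) => {
    const conn = await pool.acquire();
    served.push(`${name} used connection ${conn.id}`);
    await new Promise((r) => setTimeout(r, 5)); // simulated blocking work
    pool.release(conn);
  };
  await Promise.all(["a", "b", "c", "d"].map(call));
  return { opened, served };
}

demo().then((result) => console.log(result));
```

Four calls are served while only two connections are ever opened; callers that find the pool empty wait until a connection is released.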
CS/INFO 330 24
Failures In Synchronous Calls
• If client or server fails, the context is lost
  – If the failure occurred before 1, nothing has happened
  – If the failure occurs after 1 but before 2 (receiver crashes), then the request is lost
  – If the failure happens after 2 but before 3, side effects may cause inconsistencies
  – If the failure occurs after 3 but before 4, the response is lost but the action has been performed (try again?)
• Who is responsible for finding out what happened?
• Finding out when the failure took place is not easy
  – Chain of invocations: failure can occur anywhere along the chain
[Figure: two call timelines with numbered points 1 to 4 along the request() / receive-process-return / do-with-answer path. In the second, the caller times out and tries again, so the receiver runs receive-process-return a second time (points 2’ and 3’).]
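The "try again?" case (failure after 3 but before 4) can be made concrete with a small simulation. The server, the charge counter, and the request IDs are hypothetical; remembering request IDs is one common remedy and is an addition for illustration, not something the slides prescribe.

```javascript
// Server with a side effect: each processed request charges the account once.
function makeServer({ deduplicate }) {
  const seen = new Map(); // requestId -> response already produced
  let charges = 0;
  return {
    handle(requestId) {
      // With deduplication, a repeated request gets the earlier response.
      if (deduplicate && seen.has(requestId)) return seen.get(requestId);
      charges += 1; // receive and process: the action is performed
      const response = `charged #${charges}`;
      seen.set(requestId, response);
      return response; // return
    },
    get charges() { return charges; },
  };
}

// Client whose first response is lost in transit: it times out and retries.
function callWithRetry(server, requestId) {
  server.handle(requestId);        // response lost between points 3 and 4
  return server.handle(requestId); // try again after the timeout
}

const naive = makeServer({ deduplicate: false });
callWithRetry(naive, "req-1");
console.log("naive server charges:", naive.charges);

const careful = makeServer({ deduplicate: true });
callWithRetry(careful, "req-1");
console.log("deduplicating server charges:", careful.charges);
```

The naive server performs the action twice because it cannot tell a retry from a new request; the second server recognizes the repeated ID and replays its earlier answer.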
CS/INFO 330 25
Two Solutions
ENHANCED SUPPORT
• Client/Server systems and middleware platforms provide a number of mechanisms to deal with the problems created by synchronous interaction
  – Transactional interaction
  – Service replication and load balancing
ASYNCHRONOUS INTERACTION
• Example: email
• Caller sends message
• Message gets stored somewhere until receiver reads it and sends response
• Response is sent in a similar manner
• Asynchronous interaction can take place in two forms:
  – Non-blocking invocation
  – Persistent queues
CS/INFO 330 26
Message Queuing
• Reliable queuing is an excellent complement to synchronous interactions
  – Modular design
    • Code for making a request can be in a different module (even a different machine) than code for dealing with the response
  – Easier to design sophisticated distribution modes
  – Helps to handle communication sessions in more abstract way
  – More natural way to implement complex interactions between heterogeneous systems
[Figure: request() calls go into a queue, the receiver runs receive-process-return, and answers come back through a second queue to the do-with-answer code.]
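A minimal sketch of queued interaction follows. The two arrays stand in for persistent queues (a real system would store messages durably), and the function names mirror the labels in the figure; the payloads are made up.

```javascript
// In-memory stand-ins for persistent queues.
const requestQueue = [];
const responseQueue = [];

// Caller module: enqueue the request and return immediately.
function request(id, payload) {
  requestQueue.push({ id, payload });
}

// Receiver module: may run later, or on another machine.
function receiveProcessReturn() {
  while (requestQueue.length > 0) {
    const { id, payload } = requestQueue.shift();
    responseQueue.push({ id, answer: payload.toUpperCase() });
  }
}

// A separate module deals with answers whenever they arrive.
function doWithAnswers() {
  const answers = [];
  while (responseQueue.length > 0) answers.push(responseQueue.shift());
  return answers;
}

request(1, "order book");
request(2, "order pen");
receiveProcessReturn();
console.log(doWithAnswers());
```

The caller never waits for the receiver, and the code that issues requests is separate from the code that consumes answers; the `id` field is what lets an answer be matched to its request.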
CS/INFO 330 27
Overview
• Enterprise architectures
• Internet concepts
  – URIs
  – The HTTP Protocol
• The presentation tier
  – HTML
  – HTML forms
  – JavaScript, style sheets
  – Cookies
CS/INFO 330 28
Internet Concepts
• URIs
• The HTTP Protocol
  – HTTP Overview
  – Example HTTP Session
  – HTTP 1.0 v. 1.1
  – Live Demo via HTTP Tracer Plus
  – Structure of Client Requests/Server Responses
CS/INFO 330 29
Uniform Resource Identifiers
• Uniform naming schema to identify resources on the Internet
• A resource can be anything:
  – Index.html
  – mysong.mp3
  – picture.jpg
• Example URIs:
  http://www.cs.wisc.edu/~dbbook/index.html
  mailto:[email protected]
CS/INFO 330 30
Structure of URIs
http://www.cs.wisc.edu/~dbbook/index.html
• URI has three parts:
  – Naming schema (http)
  – Name of the host computer (www.cs.wisc.edu)
  – Name of the resource (~dbbook/index.html)
• URLs (Uniform Resource Locators) are a subset of URIs
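The three parts can be pulled out of the example URI with the standard `URL` class, which is available in browsers and in Node. The helper name `splitUri` is made up for this sketch.

```javascript
// Splits an http-style URI into the three parts named on the slide.
function splitUri(text) {
  const uri = new URL(text);
  return {
    scheme: uri.protocol.replace(":", ""),     // naming schema
    host: uri.hostname,                         // name of the host computer
    resource: uri.pathname.replace(/^\//, ""),  // name of the resource
  };
}

console.log(splitUri("http://www.cs.wisc.edu/~dbbook/index.html"));
```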
CS/INFO 330 31
HTTP Overview
• HTTP: HyperText Transfer Protocol
• Developed by Tim Berners-Lee, 1990
• Client/Server Architecture:
  – Client requests a document
    • Example clients: Firefox, IE, Safari
  – Server returns the document
    • Example servers: Apache, IIS (MSFT’s Internet Information Server)
CS/INFO 330 32
Watch HTTP
• Telnet:
  – telnet www.yahoo.com 80
  – GET /
  – Hit enter twice
• See your requests:
  – http://www.schroepl.net/cgi-bin/http_trace.pl
• Many products for tracing HTTP traffic
  – Search for http tracer or similar
CS/INFO 330 33
Example HTTP Session
• Client sends request, Server sends response
• Client requests the following URL: http://www.cs.cornell.edu:80/
• Anatomy of the Request:
  – http:// HyperText Transfer Protocol
    • Other options: ftp, mailto
  – www.cs.cornell.edu: host name
  – :80: Port Number
    • 80 is reserved for HTTP
    • Ports can range from 1 to 65,535
  – / Root document
CS/INFO 330 34
The Client Request
Actual Browser Request:
GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.cs.cornell.edu
Connection: Keep-Alive
CS/INFO 330 35
Anatomy of the Client Request
• GET / HTTP/1.1
  – Requests the root / document
  – Specifies HTTP version 1.1
  – HTTP Versions: 1.0 and 1.1 (more on this later)
• Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
  – Indicates what type of media the browser will accept
• Accept-Language: en-us
  – Browser’s preferred language
• Accept-Encoding: gzip, deflate
  – Accepts compressed data (faster download times)
CS/INFO 330 36
Anatomy of the Client Request
• User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
  – Indicates the browser type.
• Host: www.cs.cornell.edu
  – Required for HTTP 1.1
  – Optional for HTTP 1.0
  – A Server may host multiple hostnames. Hence, the browser indicates the host name here.
• Connection: Keep-Alive
  – Enables “persistent connections”, better performance (more later)
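To see how a server might take such a request apart, here is a small parser sketch. It assumes a header-only request (no body) with CRLF line endings, and the function name `parseRequest` is made up; real servers handle many more cases.

```javascript
// Parses a header-only HTTP request into its request line and headers.
function parseRequest(text) {
  const lines = text.split("\r\n").filter((line) => line !== "");
  const [requestLine, ...headerLines] = lines;
  const [method, path, version] = requestLine.split(" ");
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    // Header names are case-insensitive, so store them in lower case.
    headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { method, path, version, headers };
}

const rawRequest = [
  "GET / HTTP/1.1",
  "Accept-Language: en-us",
  "Accept-Encoding: gzip, deflate",
  "Host: www.cs.cornell.edu",
  "Connection: Keep-Alive",
  "",
  "",
].join("\r\n");

console.log(parseRequest(rawRequest));
```

The empty line at the end of the request is what the "hit enter twice" step in the telnet demo produces: it tells the server that the headers are complete.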
CS/INFO 330 37
Server Response
HTTP/1.1 200 OK
Date: Mon, 24 Sep 2001 20:54:26 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Mon, 24 Sep 2001 14:06:11 GMT
Content-length: 327
Connection: close
Content-type: text/html

<title>Sample Homepage</title>
<img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>
This is the webpage of ...
CS/INFO 330 38
Anatomy of Server Response
• HTTP/1.1 200 OK
  – Server Status Code
  – Code 200: Document was found
  – We will examine other status codes shortly
• Date: Mon, 24 Sep 2001 20:54:26 GMT
  – Date on the server
  – GMT (Greenwich Mean Time)
• Last-Modified: Mon, 24 Sep 2001 14:06:11 GMT
  – Time when document was last modified
  – Very useful for browser caching
  – If page in browser cache, may not need to request whole document again (more later)
CS/INFO 330 39
Anatomy of Server Response
• Content-length: 327
  – Number of bytes in the document response
• Connection: close
  – Indicates that server will close connection
  – If client wants to send another request, it will need to open another connection to the server
• Content-type: text/html
  – Indicates MIME Type of the return document
    • Multipurpose Internet Mail Extensions
  – Enables web servers to return binary or text files
  – Other MIME Categories:
    • Audio, video, images, xml
CS/INFO 330 40
Anatomy of Server Response
The actual HTML document:
<title>Sample Homepage</title>
<img src="/images/oreilly_mast.gif">
<h1>Welcome</h1>
This is the web page of ...
CS/INFO 330 41
HTTP 1.0 vs 1.1: Getting Objects
Once a browser receives an HTML page, it makes separate connections to retrieve different objects within the page.
[Figure: exchange between the client web browser and the web server]
Browser: Give me /index.html
Server: Here you go...
Browser: Now, give me logo.gif
Server: Here you go...
CS/INFO 330 42
HTTP 1.0 vs 1.1
• HTTP 1.0
  – For each request, opens a new connection with the server
• HTTP 1.1
  – For each request, default action is to maintain an open connection with the server
  – Faster, persistent connections
  – Supported by most browsers and servers
CS/INFO 330 43
Example: HTTP 1.0 v. 1.1
• HTTP 1.0: Get HTML Page plus Images
  – Open Connection: GET /index.html
  – Open Connection: GET /logo.gif
  – Open Connection: GET /button.gif
• HTTP 1.1: Get HTML Page plus Images
  – Open Persistent Connection: GET /index.html
  – GET /logo.gif
  – GET /button.gif
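The difference can be expressed as a tiny planning function that lists the steps a client takes under each version. It is a simplification for illustration (real browsers may open several connections in parallel), and `fetchPlan` is a made-up name.

```javascript
// Lists the steps needed to fetch the given paths under HTTP 1.0 or 1.1.
function fetchPlan(version, paths) {
  const steps = [];
  paths.forEach((path, index) => {
    // HTTP 1.0 opens a connection per request; 1.1 reuses the first one.
    if (version === "1.0" || index === 0) steps.push("open connection");
    steps.push(`GET ${path}`);
  });
  return steps;
}

const page = ["/index.html", "/logo.gif", "/button.gif"];
console.log(fetchPlan("1.0", page));
console.log(fetchPlan("1.1", page));
```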
CS/INFO 330 44
Client Requests
• Every client request includes three parts
  – Method: Indicates type of request, HTTP version and name of requested document
  – Header Information: Used to specify browser version, language, etc.
  – Entity Body: Used to specify form data for POST requests
CS/INFO 330 45
Client Methods
• GET and POST: We will see them later when we discuss HTML forms
• HEAD
  – Similar to GET, except that the method requests only the header information
  – Server will return date-modified, but will not return the data portion of the requested document
  – Useful for browser caching
  – Example
    • If browser contains a cached version of a page, it issues a HEAD request
    • If document has not been modified recently, use cached version
CS/INFO 330 46
Server Responses
• Every server response includes three parts
  – Response Line: HTTP version number, three digit status code, and status message
  – Header: Information about the server and the object being served
  – Entity Body: The actual data
CS/INFO 330 47
Server Status Codes
• 100-199 Informational
• 200-299 Client Request Successful
• 300-399 Client Request Redirected
• 400-499 Client Request Incomplete (client errors)
• 500-599 Server Errors
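Because the class is determined by the first digit, classifying a status code takes one division. The function name `statusClass` is made up; the labels are the ones used on the slide.

```javascript
// Maps a three-digit HTTP status code to its class.
function statusClass(code) {
  const labels = [
    "Informational",
    "Client Request Successful",
    "Client Request Redirected",
    "Client Request Incomplete",
    "Server Errors",
  ];
  if (!Number.isInteger(code) || code < 100 || code > 599) return "Unknown";
  return labels[Math.floor(code / 100) - 1];
}

for (const code of [200, 301, 404, 500]) {
  console.log(code, statusClass(code));
}
```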
CS/INFO 330 48
Some Important Status Codes
• 200: OK
  – Request was successful.
• 301: Moved Permanently
  – Server redirects client to a new URL
• 404: Not Found
  – Document does not exist
• 500: Internal Server Error
  – Error within the Web Server
CS/INFO 330 49
HTTP Is Stateless
• What does this mean?
  – No “sessions”
  – Every message is completely self-contained
  – No previous interaction is “remembered” by the protocol
  – Tradeoff between ease of implementation and ease of application development
    • Other functionality has to be built on top
• Implications for applications
  – Any state information (shopping carts, user login information) needs to be encoded in every HTTP request and response (!)
  – Popular methods on how to maintain state:
    • Cookies
    • Dynamically generate unique URLs at the server level
CS/INFO 330 50
Overview
• Enterprise architectures
• Internet concepts
• The presentation tier
  – HTML
  – HTML forms
  – JavaScript, style sheets
  – Cookies
CS/INFO 330 51
Web Data Formats
• HTML
  – The presentation language for the Internet
• XML
  – A self-describing, hierarchical data model
• We will cover XML and associated query and transformation languages (XPath, XSLT) later.
CS/INFO 330 52
HTML: An Example
<HTML>
<HEAD></HEAD>
<BODY>
<h1>Barns and Nobble Internet Bookstore</h1>
Our inventory:
<h3>Science</h3>
<b>The Character of Physical Law</b>
<UL>
  <LI>Author: Richard Feynman</LI>
  <LI>Published 1980</LI>
  <LI>Hardcover</LI>
</UL>
<h3>Fiction</h3>
<b>Waiting for the Mahatma</b>
<UL>
  <LI>Author: R.K. Narayan</LI>
  <LI>Published 1981</LI>
</UL>
<b>The English Teacher</b>
<UL>
  <LI>Author: R.K. Narayan</LI>
  <LI>Published 1980</LI>
  <LI>Paperback</LI>
</UL>
</BODY>
</HTML>
CS/INFO 330 53
HTML: A Short Introduction
• HTML is a markup language
• Commands are tags:
  – Start tag and end tag
  – Examples
    • <HTML> … </HTML>
    • <UL> … </UL>
• Many editors automatically generate HTML directly from a document (e.g., Microsoft Word has “Save as html”)
CS/INFO 330 54
HTML: Sample Commands

• <HTML>: HTML document
• <UL>: unordered list
• <LI>: list item
• <h1>: largest heading
• <h2>: second-level heading, <h3>, <h4> analogous
• <B>Title</B>: bold
CS/INFO 330 55
Overview
• Enterprise architectures
• Internet concepts
• The presentation tier
  – HTML
  – HTML forms
  – JavaScript, style sheets
  – Cookies
CS/INFO 330 56
HTML Forms
• Web form allows user to enter data through a browser (presentation tier)
• Completed form is submitted to the server (middle tier)
[Figure: example web form. Source: Wikipedia]
CS/INFO 330 57
HTML Form Example
<FORM method="post" action="bar.php">
<TABLE border="1">
<TR bgcolor="#CCCCFF">
  <TH>Name</TH>
  <TH>Value</TH>
</TR>
<TR>
  <TD>Name</TD>
  <TD><input type="text" size="25"></TD>
</TR>
<TR>
  <TD>Sex</TD>
  <TD><input type="radio" name="sex" value="male"> Male<BR>
      <input type="radio" name="sex" value="female" checked> Female</TD>
</TR>
<TR>
  <TD>Eye color</TD>
  <TD><select name="eye color">
        <option>blue</option>
        <option>brown</option>
        <option selected>green</option>
        <option>other</option>
      </select></TD>
</TR>
<TR>
  <TD>Check all that apply</TD>
  <TD><input type="checkbox" name="height" value="1"> Over 6 feet tall</input><BR>
      <input type="checkbox" name="weight" value="1"> Over 200 pounds</input></TD>
</TR>
<TR>
  <TD colspan="2">Describe your athletic ability:<BR>
      <textarea name="athletic" cols="50" rows="4"></textarea></TD>
</TR>
<TR>
  <TD colspan="2" align="center"><input type="submit" value="Enter my information"></TD>
</TR>
</TABLE>
</FORM>
CS/INFO 330 58
General Form Format
• <FORM ACTION="page.jsp" METHOD="GET" NAME="LoginForm"> … </FORM>
  – Forms cannot be nested
• ACTION: URI of page to which form contents are submitted
  – Absence of ACTION => use current page
  – page.jsp provides logic for processing form
• METHOD: GET or POST
  – Method for submitting completed form to web server
• NAME: name of form (optional)
CS/INFO 330 59
HTML Form Elements
• <INPUT TYPE="text" NAME="username" VALUE="Joe">
• TYPE: type of input field
  – text: single line of text
  – checkbox
  – radio: radio button
  – submit: button to submit form to the server
• NAME: symbolic name for field
• VALUE: default contents of text field or label of buttons
CS/INFO 330 60
More Form Elements
• textarea
  – Like text, but multiple rows possible
  <textarea name="athletic" cols="50" rows="4"></textarea>
• select
  – Drop-down list showing possible selections
  <select name="eye color"><option>blue</option><option>brown</option><option selected>green</option><option>other</option></select>
CS/INFO 330 61
Submitting Forms
• <FORM ACTION="page.jsp" METHOD="GET" NAME="LoginForm"> … </FORM>
• GET method
  – Form contents assembled into query URI
  – Visible to user in browser, e.g., http://www.google.com/search?hl=en&q=cs+330 for Google search for CS 330
  – Can bookmark that URI
• POST method
  – Form contents sent in separate data block inside HTTP message body
CS/INFO 330 62
GET Method
• action?name1=value1&name2=value2
  – http://www.google.com/search?hl=en&q=cs+330
• action: URI specified in ACTION attribute of FORM tag (or current document URI if no ACTION specified)
  – To process form at middle tier, ACTION attribute should point to page, script, or program that will process the form values
• name=value: user input from INPUT fields
• URI has to be single string with no spaces
  – Convert special characters to hex character code
  – Convert space to +
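The encoding rules can be tried out with the built-in `encodeURIComponent`, which produces the hex codes; the extra replacement turns the encoded space into `+` as forms do. The helper names `buildGetUri` and `encodeField` are made up for this sketch.

```javascript
// Encodes one form field name or value for use in a query string.
function encodeField(text) {
  // encodeURIComponent yields %XX hex codes (a space becomes %20);
  // form submission uses + for a space instead.
  return encodeURIComponent(text).replace(/%20/g, "+");
}

// Assembles action?name1=value1&name2=value2 from a set of form fields.
function buildGetUri(action, fields) {
  const pairs = Object.entries(fields).map(
    ([name, value]) => encodeField(name) + "=" + encodeField(value)
  );
  return pairs.length > 0 ? action + "?" + pairs.join("&") : action;
}

console.log(buildGetUri("http://www.google.com/search", { hl: "en", q: "cs 330" }));
console.log(buildGetUri("page.jsp", { title: "R&D plans" }));
```

The first call reproduces the Google example from the slide; the second shows why encoding matters, since an unencoded `&` inside a value would be read as the start of another field.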
CS/INFO 330 63
Overview
• Enterprise architectures
• Internet concepts
• The presentation tier
  – HTML
  – HTML forms
  – JavaScript, style sheets
  – Cookies
CS/INFO 330 64
JavaScript
• For adding programs to web pages that run at the client (i.e., the machine running the web browser), e.g.:
  – Detect browser type to load browser-specific page
  – Simple consistency checks on form fields
    • Does email address contain @?
  – Pop-ups
• Embedded in HTML document with SCRIPT tag
  – <SCRIPT LANGUAGE="JavaScript" SRC="validateForm.js"> … </SCRIPT>
  – LANGUAGE: scripting language
  – SRC: file with script code that is automatically embedded into the HTML document
CS/INFO 330 65
JavaScript
<SCRIPT LANGUAGE="JavaScript">
<!--
alert("This is a Pop-up!");
//-->
</SCRIPT>
• JavaScript code can be placed inside comments
  – Keeps browsers that do not understand the SCRIPT tag from displaying the code
• Lightweight programming language
  – Variables, usual operators, assignments, conditional statements (if (condition) {statements;} else {statements;}), loops (for, do-while, while)
  – Functions (function f(args) {statements;})
CS/INFO 330 66
Example: Check for Empty Fields
HTML Form:
<H1>Please enter login and password:</H1>
<form name="LoginForm" method="POST" action="TableOfContents.jsp" onSubmit="return testLoginEmpty()">
  <input type="text" name="userid">
  <input type="password" name="password">
  <input type="submit" value="Login" name="submit">
  <input type="reset" value="Clear" name="reset">
</form>

Associated JavaScript:
<script language="javascript">
// Returns false (which cancels the submit) if either field is empty.
function testLoginEmpty() {
  var loginForm = document.LoginForm;
  if ((loginForm.userid.value == "") || (loginForm.password.value == "")) {
    alert('Please enter values for userid and password.');
    return false;
  }
  else return true;
}
</script>

Notes:
• document: implicitly defined; refers to current page
• onSubmit: form event handler; called when submit button is pressed or user presses return in text field
CS/INFO 330 67
Style Sheets
• Idea: Separate display from contents, adapt display to different presentation formats
• Two aspects
  – Document transformation: what part of the document to display in what order
  – Document rendering: how to display each part of the document
• Why use style sheets?
  – Reuse same document for different displays
  – Tailor display to user preferences
  – Reuse document in different contexts
• Stylesheet languages
  – Cascading Style Sheets (CSS) for HTML documents
  – Extensible Stylesheet Language (XSL) for XML documents
CS/INFO 330 68
Cascading Style Sheets
• Defines how to display HTML documents
• Many HTML documents can refer to same CSS
  – Can change format of entire web site by changing single style sheet
  – Usage in HTML: <LINK REL="stylesheet" TYPE="text/css" HREF="books.css"/>
• Style sheet line format: selector {property: value}
  – Selector: tag whose format is defined
  – Property: tag’s attribute whose value is set
  – Value: value of attribute
CS/INFO 330 69
CSS Example
body {background-color: yellow}
h1 {font-size: 36pt}
h3 {color: blue}
p {margin-left: 50px; color: red}

• First line has same effect as <body bgcolor="yellow">
CS/INFO 330 70
XSL
• Language for expressing style sheets
• Three components
  – XSLT: XSL Transformation Language
    • Can transform one document into another
  – XPath: XML Path language
  – XSL Formatting Objects
    • Formats output of an XSL transformation
• Will be covered in later lectures
CS/INFO 330 71
Overview
• Enterprise architectures
• Internet concepts
• The presentation tier
  – HTML
  – HTML forms
  – JavaScript, style sheets
  – Cookies
CS/INFO 330 72
Sites That Know You...
• Examples:
  – www.weather.com
  – www.amazon.com
• Each time I return to these sites, they remember who I am…
  – Weather.com remembers previous locations, preferences
  – Amazon.com remembers products I looked at and makes recommendations
• How do they do that?
CS/INFO 330 73
What is a Cookie?
• Small piece of data generated by a web server, stored on the client’s hard drive
• Serves as an add-on to the HTTP specification
  – Remember: HTTP by itself is stateless
• Controversial, as it enables web sites to track web users and their habits
CS/INFO 330 74
Example Cookie Use
• Web Site Acme.com wants to track number of unique visitors who access its site
• HTTP Server logs show number of “hits”, but not number of unique visitors*
• Problem: HTTP is stateless
  – Retains no memory regarding individual users
• Cookies provide mechanism to solve this problem
* Actually, you could check the log files for IP addresses, but Internet proxies and NAT are a problem.
[Image © Warner Bros.]
CS/INFO 330 75
Tracking Unique Visitors
• Step 1: Wile E. Coyote requests home page for acme.com
• Step 2: acme.com web server generates new unique ID for him
• Step 3: Server returns home page plus a cookie set to the unique ID
• Step 4: Each time Coyote returns to acme.com, the browser automatically sends the cookie along with the GET request
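The four steps can be simulated without a network. The sketch below models only the header exchange; the cookie name `uid`, the counter-based IDs, and the `makeVisitorServer` helper are assumptions made for illustration, and a real site would not trust an ID sent by the client without further checks.

```javascript
// Simulated web server that counts hits and unique visitors.
function makeVisitorServer() {
  let nextId = 1;
  let hits = 0;
  const visitors = new Set();
  return {
    // cookieHeader: value of the request's Cookie header, or undefined.
    handleGet(cookieHeader) {
      hits += 1;
      const match = /(?:^|;\s*)uid=([^;]+)/.exec(cookieHeader || "");
      const headers = {};
      let uid = match ? match[1] : null;
      if (uid === null) {
        uid = String(nextId++);              // step 2: new unique ID
        headers["Set-Cookie"] = `uid=${uid}`; // step 3: return it as a cookie
      }
      visitors.add(uid);
      return { headers, body: "home page" };
    },
    stats: () => ({ hits, uniqueVisitors: visitors.size }),
  };
}

const acme = makeVisitorServer();
const firstVisit = acme.handleGet(undefined);         // step 1: no cookie yet
const storedCookie = firstVisit.headers["Set-Cookie"]; // browser keeps the cookie
acme.handleGet(storedCookie);                          // step 4: cookie sent back
acme.handleGet(undefined);                             // a different browser
console.log(acme.stats());
```

Three hits are logged, but only two distinct IDs are seen, which is the distinction between "hits" and unique visitors made on the previous slide.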
CS/INFO 330 76
Cookie Conversation
[Figure: exchange between browser and server]
Browser: Give me the home page!
Server: Here’s the home page plus a cookie.
Browser: Now, give me the news page (cookie is sent automatically)
Server: I’ve seen you before… Here’s the news page.
CS/INFO 330 77
Cookie Notes
• Created in 1994 for Netscape 1.1
• Cookies cannot be larger than 4 KB
• Limit on number of cookies per domain (e.g., netscape.com, microsoft.com)
• Cookies stay on your machine until:
  – they automatically expire
  – they are explicitly deleted
• Cookies work the same on all browsers
CS/INFO 330 78
Magic Cookies
• The term cookie comes from an old programming hack, called Magic Cookies
• If a programmer needed to make two programs communicate, he would create a “magic cookie”, a small file containing data to transfer between program parts
CS/INFO 330 79
Cookie Standards
• Version 0 (Netscape)
  – The original cookie specification
  – Implemented by all browsers and servers
  – We will focus on this Version
• Version 1
  – Internet Official Protocol Standard RFC 2109
  – Compatible with V0, but with some extensions
CS/INFO 330 80
Why use Cookies?
• Tracking unique visitors
• Creating personalized web sites
• Shopping Carts
• Tracking users across a site
  – E.g. do users who visit the sports news page also visit the sports store?
• Type javascript:alert("Cookies:"+document.cookie) in browser URL field to see active cookies for page
CS/INFO 330 81
Cookie Anatomy
• Version 0 specifies six cookie parts:
  – Name
  – Value
  – Domain
  – Path
  – Expires
  – Secure
CS/INFO 330 82
Cookie Parts: Name/Value
• Name
  – Name of your cookie (Required)
  – Cannot contain whitespace, semicolons or commas
• Value
  – Value of your cookie (Required)
  – Cannot contain whitespace, semicolons or commas
CS/INFO 330 83
Cookie Parts: Domain
• Only pages from the domain which created a cookie are allowed to read the cookie
  – Example: amazon.com cannot read yahoo.com’s cookies (imagine the security flaws if this were otherwise)
• By default, domain is set to the full domain of the web server that served the web page
  – Example: myserver.mydomain.com would automatically set the domain to .myserver.mydomain.com
CS/INFO 330 84
Cookie Parts: Domain
• Domains are always prepended with a dot
  – This is a security precaution: all domains must have at least two periods
• Can set a higher level domain
  – Example: myserver.mydomain.com can set domain to .mydomain.com
    • Allows hisserver.mydomain.com and herserver.mydomain.com to access the same cookies
• No matter what, you cannot set a domain other than your own
CS/INFO 330 85
Cookie Parts: Path
• Restricts cookie usage within the site
• Default: path is set to the path of the page that created the cookie
  – Example: user requests page from mymall.com/storea
  – By default, cookie will only be returned to pages for or under /storea
• If path is /, cookie will be returned to all pages (a common practice)
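The domain and path rules from the last three slides can be combined into a check a browser might run before sending a cookie. This is a sketch under stated assumptions: cookie domains always start with a dot (as the slides describe), and the path test requires a whole path segment to match, which is stricter than a plain prefix test and is a choice made for this example. All function names are made up.

```javascript
// Domain rule: ".mydomain.com" matches mydomain.com and any host under it.
function domainMatches(cookieDomain, host) {
  return ("." + host.toLowerCase()).endsWith(cookieDomain.toLowerCase());
}

// Path rule: the cookie goes to the cookie path itself and anything under it.
function pathMatches(cookiePath, requestPath) {
  if (cookiePath === "/") return true;
  return requestPath === cookiePath || requestPath.startsWith(cookiePath + "/");
}

// A cookie is sent only if both its domain and its path match the request.
function shouldSend(cookie, host, requestPath) {
  return domainMatches(cookie.domain, host) && pathMatches(cookie.path, requestPath);
}

const shared = { name: "uid", domain: ".mydomain.com", path: "/" };
const storeOnly = { name: "cart", domain: ".mymall.com", path: "/storea" };

console.log(shouldSend(shared, "hisserver.mydomain.com", "/news.html"));
console.log(shouldSend(shared, "www.yahoo.com", "/"));
console.log(shouldSend(storeOnly, "mymall.com", "/storea/shoes.html"));
console.log(shouldSend(storeOnly, "mymall.com", "/storeb/hats.html"));
```

Prepending a dot to the host before comparing is what keeps a look-alike host such as evilmydomain.com from matching .mydomain.com.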
CS/INFO 330 86
Cookie Parts: Expires
• When the cookie will expire
• Specified in Greenwich Mean Time (GMT):
  – Wdy, DD-Mon-YYYY HH:MM:SS GMT
• If value is left blank, browser will delete the cookie when the user exits the browser
  – Known as a session cookie, as opposed to a persistent cookie
CS/INFO 330 87
Cookie Parts: Secure
• Secure flag is designed to keep cookies encrypted while in transit
• Secure cookie will only be sent over a secure connection (such as SSL)
• In other words, if a cookie is set to secure, and you connect using a non-secure connection, the cookie will not be sent
CS/INFO 330 88
Weaknesses of Cookies
• People share machines
  – Per-user cookie files solve this
• People use multiple machines
  – Single user has different cookies on different machines (is this a bug or a feature?)
• Cookies can be erased from the client machine’s hard drive (bug or feature…)
• Cookies can be copied
  – Security implications for eCommerce sites
CS/INFO 330 89
Cookie Abuse - I
• Conventional catalog stores could sell information about customers
  – Name, address, purchases
• eCommerce sites can gather and sell much more detailed information
  – All the way down to clickstreams
• But that’s only for a single site
CS/INFO 330 90
Cookie Abuse - II
• Ad servers and the “1-pixel gif”
  – Mediawiki.org’s page pXYZ contains
    • <img src="x... lotofbanners.com/stat?page=...pXYZ">
  – lotofbanners.com sets a persistent UID cookie in the usual way
  – Gets around cookie domain specification
    • So lotofbanners.com can maintain user page visit statistics across multiple sites
[Image source: Wikipedia]
CS/INFO 330 91
Cookie Blocking Software
• Cookie Central (and others) have pointers to lots of cookie blocking software
  – Cookie Pal
  – Cookie Crusher
  – Cookie Cruncher
  – Many more…
• But many (most?) sites don’t work properly if you disable cookies these days
CS/INFO 330 92
Some Cookie Alternatives
• Embedding information in URL
  – Typically in query string
• Hidden form fields
  – Similar to URL embedding
• HTTP authentication
  – Browser stores access credentials