Cocoon An XML Web Publishing Framework From the Apache Project Roland Schweitzer

6 August 2002 OAR Web Shop 2

Today’s Topics:
• Definitions
• Motivation
• Required Tools (Java, Apache Tomcat and Cocoon)
• Basic Cocoon Operation
  – Matchers, Generators, Transforms and Serializers. Oh My!
  – sitemap.xmap glues it all together.

Cocoon
• An XML-based WWW publishing framework implemented as a Java Servlet.
  – Web site content stored in XML files (or an RDBMS, LDAP server or other source) is transformed (mostly via XSLT) into new XML files (to exclude certain information, for example) and then serialized into human-usable output (such as an HTML or PDF file).

Reusable Content

Motivation for using Cocoon
• We distribute climate data.
• Users (including scientists) find data via public search engines like Google.
• Public search engines index HTML content.
• NOAA and other scientific organizations use special-purpose search engines that use FGDC (or DIF derived from FGDC).

Motivation continued
• These facts add up to maintaining separate “documents” for each purpose.
• XML and Cocoon offer a (yet another potential) way out of the morass of many special-purpose document collections.

Suppose info was stored as XML

<page>
  <title>Reynolds Sea Surface Temperature</title>
  <prefix>data.sst</prefix>
  <abstract>
    <para>The optimum interpolation (OI) SST analysis…</para>
  </abstract>
  <contact>
    <name>CDC Data Management Personnel</name>
    <address1>325 Broadway</address1>
    <phone>(303) 497-6244</phone>
    <email>[email protected]</email>
  </contact>
  …
</page>

The Power of XML Content
• Can be parsed with standard XML tools
  – Can be easily used for another purpose besides the Web
  – Can be written with powerful XML GUI tools (e.g. XMLSpy)
  – (Might be) easier to maintain

Reusable Content

Schematic of the Solution Using Cocoon

[Figure: schematic. Labels: RDBMS with netCDF metadata; Some other process; XML Document; Cocoon; Local standard HTML output for humans; Human-readable FGDC (output as HTML) with nice formatting, anchors and links; FGDC output for domain-specific search engines.]

Required Tools
• On Solaris 7 and 8 I have used the binary distributions of:
  – Java 1.4.0 (java.sun.com)
  – Tomcat 4.0.4 (www.apache.org)
  – Cocoon 2.0.3 (xml.apache.org)
• At this time, these are the latest releases.
• Follow the installation instructions for each package.

Basic Operation
• Cocoon is based on pipelines:

XML File → A Bit of Software → New XML File → A Bit of Software → New XML File → A Bit of Software → Info to client (e.g. HTML to browser)

Basic Operation
• Cocoon is based on pipelines. An XML document is pushed through a pipeline consisting of one Generator (read a file, create a document from an LDAP server, etc.), zero or more Transforms (for example, to leave out sensitive information for external users) and a final Serializer that converts the XML to binary or character data for consumption by the client (Web browser).
• The entire site could use only one pipeline.

Basic Operation
• If you need more than one pipeline…
• Matchers (wildcard and regular expression) and Selectors (Boolean expressions) can be used to control the pipeline used to process the XML content.
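
As a rough illustration of how a Matcher and a Selector might steer a request, here is a sitemap sketch that is not part of the original slides. It assumes that a selector named "browser" and a matcher named "regexp" have been declared under map:components (the sample sitemap shipped with Cocoon 2.0.x declares components under these names), that "lynx" is one of the browser names configured for that selector, and that TEXTstyle.xsl is a hypothetical stylesheet. The generate, transform and serialize steps are explained in the slides that follow.

```xml
<map:pipeline>
  <!-- Wildcard matcher (the default): "*" captures one path segment as {1} -->
  <map:match pattern="data/*.html">
    <map:generate src="data/{1}.xml"/>
    <!-- Selector: pick a stylesheet depending on the requesting browser -->
    <map:select type="browser">
      <map:when test="lynx">
        <!-- TEXTstyle.xsl is a hypothetical text-friendly stylesheet -->
        <map:transform src="datastyle/TEXTstyle.xsl"/>
      </map:when>
      <map:otherwise>
        <map:transform src="datastyle/HTMLstyle.xsl"/>
      </map:otherwise>
    </map:select>
    <map:serialize/>
  </map:match>

  <!-- Regular-expression matcher: the parenthesized group becomes {1} -->
  <map:match type="regexp" pattern="^data/(.+)\.pdf$">
    <map:generate src="data/{1}.xml"/>
    <map:transform src="stylesheets/FOstyle.xsl"/>
    <map:serialize type="fo2pdf"/>
  </map:match>
</map:pipeline>
```

The Matcher decides whether a branch of the sitemap applies to the request at all, while the Selector chooses among alternatives inside a branch that has already matched.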

Components
• Matchers, Generators, Transforms and Serializers are all Cocoon Components.
• Pipelines are built out of Components.
• Components are declared and pipelines are constructed in the sitemap.xmap file.
• The “Bit of Software” needed for each Component is provided by Cocoon or built by you.

Components (Matchers)
• Suppose you wanted these URI patterns to be handled by Cocoon:
  – For example, the wildcard patterns:
  – http://www.cdc.noaa.gov/cocoon/data/*.html
    and
  – http://www.cdc.noaa.gov/cocoon/data/*.pdf
    could result in two pipelines with two different output types.

Components (Matchers)
• Need a “bit of software” that looks at:
  – http://www.cdc.noaa.gov/cocoon/data/data.sst.html
  – Matches the URL www.cdc.noaa.gov/cocoon/data
  – And the extension “.html”
  – Extracts the wildcard part of the URL: data.sst
  – Starts the pipeline to produce HTML output from the data.sst.xml file (the wildcard plus the .xml extension).

The WildCard Matcher
• We’re in luck!
• A Matcher Component already exists in Cocoon to do what we want.
• To use a Component we must declare it in the sitemap.xmap file that controls our Cocoon installation.

Declare the WildCard Matcher
In the sitemap.xmap configuration file:
<map:matchers default="wildcard">
  <map:matcher name="wildcard"
               src="org.apache.cocoon.matching.WildcardURIMatcher"/>
  …
</map:matchers>

Use the Matcher on a URI
• We’ve declared the Matcher Component.
• Use the Matcher Component in our pipeline to grab the * part of the pattern and use it to specify the source XML file that will be sent through the pipeline.

Use the Matcher in a Pipeline
• This pipeline uses the default Matcher, which is the WildCard Matcher we declared earlier.
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  …
</map:match>

Now What?
• We have successfully declared and used a Matcher to decide which pipeline we will use to process the first of our two example URIs.
• Now we need to declare and use a Generator, which is always the first step of the pipeline.

Components (Generators)
• Declare a Generator in sitemap.xmap:
<map:generators default="file">
  <map:generator name="file"
                 src="org.apache.cocoon.generation.FileGenerator"/>
  …
</map:generators>

Use the Generator in a Pipeline
• The File Generator was declared as the default.
• Its only job is to read a file from the file system.
<map:pipelines>
  <map:pipeline>
    <map:match pattern="data/*.html">
      <map:generate src="data/{1}.xml"/>
      …
    </map:match>
  </map:pipeline>
</map:pipelines>

Review: Matcher and Generator
• Components (Matchers)
• Need a “bit of software” that looks at:
  – http://www.cdc.noaa.gov/cocoon/data/data.sst.html
  – Matches the URL www.cdc.noaa.gov/cocoon/data
  – And the extension “.html”
  – Extracts the wildcard part of the URL: data.sst
  – Starts the pipeline to produce HTML output from the data.sst.xml file (the wildcard plus the .xml extension).

Review: Pipeline Components
• Conditional use of pipeline via the Matcher
• One Generator (FileGenerator)
• Zero or more Transforms (?)
• Ends with a Serializer (?)

Components (Transforms)
• Declare a Transform:
<map:transformers default="xslt">
  <map:transformer name="xslt"
                   src="org.apache.cocoon.transformation.TraxTransformer">
    <use-request-parameters>false</use-request-parameters>
    <use-browser-capabilities-db>false</use-browser-capabilities-db>
  </map:transformer>
  …
</map:transformers>

The XSLT Transformer
• Different from previous declarations we’ve seen.
• This declaration includes two additional configuration parameters:
  – <use-request-parameters>
  – <use-browser-capabilities-db>

Add the Transformer to Pipeline
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="datastyle/HTMLstyle.xsl"/>
  …
</map:match>

The Stylesheet written in XSLT (excerpt):
<HTML>
  <HEAD>
    <TITLE><xsl:value-of select="/page/title"/></TITLE>
  </HEAD>
  <BODY>
…
<xsl:template match="/page/abstract">
  <h2>Abstract:</h2>
  <xsl:apply-templates select="para"/>
</xsl:template>
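
The slide shows only pieces of the stylesheet. A complete, minimal stylesheet for the sample <page> document might look like the sketch below. The xsl:stylesheet wrapper is required by XSLT 1.0; the para and contact templates, the h1 heading and the lowercase HTML element names are assumptions added for illustration and are not taken from the author's HTMLstyle.xsl.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root template: builds the HTML skeleton around the page content -->
  <xsl:template match="/page">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="abstract"/>
        <xsl:apply-templates select="contact"/>
      </body>
    </html>
  </xsl:template>

  <!-- Abstract: a heading followed by its paragraphs (as on the slide) -->
  <xsl:template match="abstract">
    <h2>Abstract:</h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <!-- Each para element becomes an HTML paragraph (assumed mapping) -->
  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

  <!-- Contact block (assumed layout) -->
  <xsl:template match="contact">
    <h2>Contact:</h2>
    <p>
      <xsl:value-of select="name"/><br/>
      <xsl:value-of select="address1"/><br/>
      <xsl:value-of select="phone"/><br/>
      <xsl:value-of select="email"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Because the root template applies templates only to abstract and contact, elements such as <prefix> are left out of the HTML output, which is one way to exclude information from a particular view.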

Components (Serializers)
• The last step of each Pipeline is a Serializer.
• It consumes XML (in the form of SAX events) and generates a character or binary stream for a client (Web browser, Acrobat Reader, etc.).

Declare the Serializer
In sitemap.xmap:
<map:serializers default="html">
  <map:serializer mime-type="text/html" name="html"
                  src="...HTMLSerializer">
    <buffer-size>1024</buffer-size>
  </map:serializer>
  …
</map:serializers>

The Completed Pipeline

<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="datastyle/HTMLstyle.xsl"/>
  <map:serialize/>
</map:match>
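
Putting the declarations and the pipeline together, a trimmed-down sitemap.xmap might look like the sketch below. It is an illustration assembled from the fragments on the preceding slides, not the author's actual file: the sitemap shipped with Cocoon declares many more components, and the fully qualified HTMLSerializer class name (elided on the slide) is filled in here on the assumption that the standard class from the Cocoon 2.0.x distribution is intended.

```xml
<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- Component declarations: each name is bound to a Java class -->
  <map:components>
    <map:generators default="file">
      <map:generator name="file"
                     src="org.apache.cocoon.generation.FileGenerator"/>
    </map:generators>

    <map:transformers default="xslt">
      <map:transformer name="xslt"
                       src="org.apache.cocoon.transformation.TraxTransformer">
        <use-request-parameters>false</use-request-parameters>
        <use-browser-capabilities-db>false</use-browser-capabilities-db>
      </map:transformer>
    </map:transformers>

    <map:serializers default="html">
      <!-- Class name assumed; the slide shows only "...HTMLSerializer" -->
      <map:serializer name="html" mime-type="text/html"
                      src="org.apache.cocoon.serialization.HTMLSerializer">
        <buffer-size>1024</buffer-size>
      </map:serializer>
    </map:serializers>

    <map:matchers default="wildcard">
      <map:matcher name="wildcard"
                   src="org.apache.cocoon.matching.WildcardURIMatcher"/>
    </map:matchers>
  </map:components>

  <!-- Pipelines: built from the components declared above -->
  <map:pipelines>
    <map:pipeline>
      <map:match pattern="data/*.html">
        <map:generate src="data/{1}.xml"/>
        <map:transform src="datastyle/HTMLstyle.xsl"/>
        <map:serialize/>
      </map:match>
    </map:pipeline>
  </map:pipelines>

</map:sitemap>
```

Since every component type has a default, the pipeline steps need no type attribute: map:generate uses the file generator, map:transform the xslt transformer and map:serialize the html serializer.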

Pipeline to make PDF output

<map:match pattern="data/*.pdf">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="stylesheets/FOstyle.xsl"/>
  <map:serialize type="fo2pdf"/>
</map:match>
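
The type="fo2pdf" attribute refers to a serializer by its declared name, so a matching declaration is needed in the map:serializers section. The fragment below is a sketch based on the sample sitemap distributed with Cocoon 2.0.x, where an FOP-based serializer is registered under that name; it is not taken from the slides, so check the sitemap of your own installation. It also assumes that FOstyle.xsl turns the <page> document into XSL Formatting Objects, which is the input this serializer expects.

```xml
<map:serializers default="html">
  <!-- ... the html serializer declared earlier ... -->

  <!-- Assumed declaration: renders XSL-FO input as PDF using Apache FOP -->
  <map:serializer name="fo2pdf" mime-type="application/pdf"
                  src="org.apache.cocoon.serialization.FOPSerializer"/>
</map:serializers>
```

The HTML and PDF pipelines share the same Generator and source file; only the stylesheet and the Serializer differ.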

http://www.cdc.noaa.gov/cocoon/data/data.sst.html

http://www.cdc.noaa.gov/cocoon/data/data.sst.pdf

The Dreaded Demo
• Demo Data Set Descriptions at CDC.

Cocoon is all this and more!
• Action Components to do complex initialization (e.g. get database connection pool) during pipeline setup.
• Resource Components are internal reusable pipeline fragments.
• XSP and Logic Sheets offer capabilities similar to JSP with further separation of the logic.

Resources
• www.apache.org
• Inside XSLT by Steven Holzner (New Riders)
• Java and XSLT by Eric M. Burke (O’Reilly)

Reality Check!
• We have not (yet) put this system in production.
• Still designing the XML representation.
• Still learning about using Cocoon with a relational database.
• Considering using XSP pages.

Conclusions
• Cocoon offers the potential to use and reuse one bit of XML content for many purposes.
• Most operations for Web hosting the XML content are built into Cocoon.
• Unlimited customization by writing your own Components.
• Content is easily maintained and separated from presentation.