Cocoon
An XML Web Publishing Framework From the Apache Project
Roland Schweitzer
6 August 2002, OAR Web Shop
Today’s Topics:
• Definitions
• Motivation
• Required Tools (Java, Apache Tomcat and Cocoon)
• Basic Cocoon Operation
  – Matchers, Generators, Transforms and Serializers. Oh My!
  – sitemap.xmap glues it all together.
Cocoon
• An XML-based WWW publishing framework implemented as a Java Servlet.
  – Web site content stored in XML files (or an RDBMS, LDAP server or other source) is transformed (mostly via XSLT) into new XML files (to exclude certain information, for example) and then serialized into human-usable output (such as an HTML or PDF file).
Reusable Content
Motivation for using Cocoon
• We distribute climate data
• Users (including scientists) find data via public search engines like Google
• Public search engines index HTML content
• NOAA and other scientific organizations use special-purpose search engines that use FGDC (or DIF derived from FGDC)
Motivation continued
• These facts add up to maintaining separate “documents” for each purpose
• XML and Cocoon offer (yet another) potential way out of the morass of many special-purpose document collections
Suppose info was stored as XML
<page>
  <title>Reynolds Sea Surface Temperature</title>
  <prefix>data.sst</prefix>
  <abstract>
    <para>The optimum interpolation (OI) SST analysis…</para>
  </abstract>
  <contact>
    <name>CDC Data Management Personnel</name>
    <address1>325 Broadway</address1>
    <phone>(303) 497-6244</phone>
    <email>[email protected]</email>
  </contact>
  …
</page>
The Power of XML Content
• Can be parsed with standard XML tools
  – Can easily be used for purposes other than the Web
  – Can be written with powerful XML GUI tools (e.g. XMLSpy)
  – (Might be) easier to maintain
Reusable Content
Schematic of the Solution Using Cocoon
[Figure. Box labels: RDBMS with netCDF metadata; XML Document; Cocoon; Some other process; Local standard HTML output for humans; Human-readable FGDC (output as HTML) with nice formatting, anchors and links; FGDC output for domain-specific search engines]
Required Tools
• On Solaris 7 and 8 I have used the binary distributions of:
  – Java 1.4.0 (java.sun.com)
  – Tomcat 4.0.4 (www.apache.org)
  – Cocoon 2.0.3 (xml.apache.org)
• At this time, these are the latest releases.
• Follow the installation instructions for each package.
Basic Operation
• Cocoon is based on pipelines:
[Figure: XML File → A Bit of Software → New XML File → A Bit of Software → New XML File → A Bit of Software → Info to client (e.g. HTML to browser)]
Basic Operation
• Cocoon is based on pipelines. An XML document is pushed through a pipeline consisting of one Generator (which reads a file, creates a document from an LDAP server, etc.), zero or more Transforms (for example, to leave out sensitive information for external users) and, at the end, one Serializer that converts the XML to binary or character data for consumption by the client (Web browser).
• The entire site could use only one pipeline.
Basic Operation
• If you need more than one pipeline…
• Matchers (wildcard and regular expression) and Selectors (Boolean expressions) can be used to control which pipeline processes the XML content.
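As a rough illustration of the difference, the sitemap fragment below sketches a Matcher choosing a pipeline from the request URI and a Selector choosing between two stylesheets inside that pipeline. The sitemap syntax itself is introduced on the following slides. The sketch assumes that a selector named "browser" has been declared in the sitemap and configured to recognise a browser called "explorer"; the stylesheet name HTMLstyle-ie.xsl is hypothetical.

```xml
<!-- Sketch only: assumes a "browser" selector is declared elsewhere in sitemap.xmap -->
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  <!-- The Selector tests a condition and follows exactly one branch -->
  <map:select type="browser">
    <map:when test="explorer">
      <!-- hypothetical stylesheet name -->
      <map:transform src="datastyle/HTMLstyle-ie.xsl"/>
    </map:when>
    <map:otherwise>
      <map:transform src="datastyle/HTMLstyle.xsl"/>
    </map:otherwise>
  </map:select>
  <map:serialize/>
</map:match>
```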
Components
• Matchers, Generators, Transforms and Serializers are all Cocoon Components.
• Pipelines are built out of Components.
• Components are declared and pipelines are constructed in the sitemap.xmap file.
• The “Bit of Software” needed for each Component is provided by Cocoon or built by you.
Components (Matchers)
• Suppose you wanted these URI patterns to be handled by Cocoon:
  – For example, the wildcard patterns
  – http://www.cdc.noaa.gov/cocoon/data/*.html and
  – http://www.cdc.noaa.gov/cocoon/data/*.pdf
  could result in two pipelines with two different output types.
Components (Matchers)
• Need a “bit of software” that looks at:
  – http://www.cdc.noaa.gov/cocoon/data/data.sst.html
  – Matches the URL www.cdc.noaa.gov/cocoon/data
  – And the extension “.html”
  – Extracts the wildcard part of the URL, data.sst
  – Starts the pipeline to produce HTML output from the data.sst.xml file (the wildcard plus the .xml extension).
The WildCard Matcher
• We’re in luck!
• A Matcher Component already exists in Cocoon to do what we want.
• To use a Component we must declare it in the sitemap.xmap file that controls our Cocoon installation.
Declare the WildCard Matcher
In the sitemap.xmap configuration file:
<map:matchers default="wildcard">
  <map:matcher name="wildcard"
      src="org.apache.cocoon.matching.WildcardURIMatcher"/>
  …
</map:matchers>
Use the Matcher on a URI
• We’ve declared the Matcher Component
• Use the Matcher Component in our pipeline to grab the * part of the pattern and use it to specify the source XML file that will be sent through the pipeline.
Use the Matcher in a Pipeline
• This pipeline uses the default Matcher, which is the WildCard Matcher we declared on the previous slide
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
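In the fragment above, {1} stands for whatever the first * in the pattern matched. A minimal sketch of the same idea with two wildcards follows; the directory layout (one directory per data collection, here called "reanalysis") is hypothetical. In Cocoon's wildcard matcher a single * does not match across a "/", so each * captures one path segment.

```xml
<!-- Hypothetical request: reanalysis/data.sst.html -->
<map:match pattern="*/*.html">
  <!-- {1} = "reanalysis", {2} = "data.sst" -->
  <map:generate src="{1}/{2}.xml"/>
  <!-- transform and serialize steps would follow here -->
</map:match>
```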
Now What?
• We have successfully declared and used a Matcher to decide which pipeline will process the first of our two example URIs.
• Now we need to declare and use a Generator, which is always the first step of the pipeline.
Components (Generators)
• Declare a generator in sitemap.xmap:
<map:generators default="file">
  <map:generator name="file"
      src="org.apache.cocoon.generation.FileGenerator"/>
  …
</map:generators>
Use the Generator in a Pipeline
• The File Generator was declared as the default.
• Its only job is to read a file from the file system.
<map:pipelines>
  <map:pipeline>
    <map:match pattern="data/*.html">
      <map:generate src="data/{1}.xml"/>
      …
Review: Matcher and Generator
• Components (Matchers)
• Need a “bit of software” that looks at:
  – http://www.cdc.noaa.gov/cocoon/data/data.sst.html
  – Matches the URL www.cdc.noaa.gov/cocoon/data
  – And the extension “.html”
  – Extracts the wildcard part of the URL, data.sst
  – Starts the pipeline to produce HTML output from the data.sst.xml file (the wildcard plus the .xml extension).
Review: Pipeline Components
• Conditional use of pipeline via the Matcher
• One Generator (FileGenerator)
• Zero or more Transforms (?)
• Ends with a Serializer (?)
Components (Transforms)
• Declare a Transform:
<map:transformers default="xslt">
  <map:transformer name="xslt"
      src="org.apache.cocoon.transformation.TraxTransformer">
    <use-request-parameters>false</use-request-parameters>
    <use-browser-capabilities-db>false</use-browser-capabilities-db>
  </map:transformer>
  …
</map:transformers>
The XSLT Transformer
• Different from previous declarations we’ve seen.
• This declaration includes two additional configuration parameters:
  – <use-request-parameters>
  – <use-browser-capabilities-db>
Add the Transformer to the Pipeline
<map:match pattern="*.html">
  <map:generate src="{1}.xml"/>
  <map:transform src="datastyle/HTMLstyle.xsl"/>
The Stylesheet written in XSLT:
<HTML>
  <HEAD>
    <TITLE><xsl:value-of select="/page/title"/></TITLE>
  </HEAD>
  <BODY>
…
<xsl:template match="/page/abstract">
  <h2>Abstract:</h2>
  <xsl:apply-templates select="para"/>
</xsl:template>
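The slide shows only fragments of the stylesheet and leaves out the enclosing stylesheet element. A minimal complete sketch of what HTMLstyle.xsl might look like for the <page> document shown earlier is given below. The root template, the <h1> heading and the template for para are assumptions added so that the example is self-contained; only the abstract template and the <TITLE> line come from the slide.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed root template: builds the HTML skeleton around the page content -->
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE><xsl:value-of select="/page/title"/></TITLE>
      </HEAD>
      <BODY>
        <!-- Assumed: repeat the title as a visible heading -->
        <h1><xsl:value-of select="/page/title"/></h1>
        <xsl:apply-templates select="/page/abstract"/>
      </BODY>
    </HTML>
  </xsl:template>

  <!-- From the slide: the abstract becomes a heading plus its paragraphs -->
  <xsl:template match="/page/abstract">
    <h2>Abstract:</h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <!-- Assumed: each para element becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Elements of <page> that no template selects (prefix, contact) are simply left out of the output, which is the "exclude certain info" use mentioned at the start of the talk.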
Components (Serializers)
• The last step of each Pipeline is a Serializer
• It consumes XML (in the form of SAX events) and generates a character or binary stream for a client (Web browser, Acrobat Reader, etc.).
Declare the Serializer
In sitemap.xmap:
<map:serializers default="html">
  <map:serializer mime-type="text/html" name="html"
      src="...HTMLSerializer">
    <buffer-size>1024</buffer-size>
  </map:serializer>
  …
</map:serializers>
The Completed Pipeline
<map:match pattern="data/*.html">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="datastyle/HTMLstyle.xsl"/>
  <map:serialize/>
</map:match>
Pipeline to make PDF output
<map:match pattern="data/*.pdf">
  <map:generate src="data/{1}.xml"/>
  <map:transform src="stylesheets/FOstyle.xsl"/>
  <map:serialize type="fo2pdf"/>
</map:match>
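Putting the pieces together, a sitemap.xmap for the two example URIs might be organised roughly as follows. This is a sketch assembled from the fragments on the preceding slides, not a tested configuration. The class name of the HTML serializer is left elided as on the slide, the declaration of the fo2pdf serializer is not shown in the slides and is only indicated by a comment, and the sitemap namespace URI should be checked against your Cocoon installation.

```xml
<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">

  <!-- Component declarations, as on the earlier slides -->
  <map:components>
    <map:generators default="file">
      <map:generator name="file"
          src="org.apache.cocoon.generation.FileGenerator"/>
    </map:generators>

    <map:transformers default="xslt">
      <map:transformer name="xslt"
          src="org.apache.cocoon.transformation.TraxTransformer">
        <use-request-parameters>false</use-request-parameters>
        <use-browser-capabilities-db>false</use-browser-capabilities-db>
      </map:transformer>
    </map:transformers>

    <map:serializers default="html">
      <map:serializer mime-type="text/html" name="html"
          src="...HTMLSerializer">
        <buffer-size>1024</buffer-size>
      </map:serializer>
      <!-- Assumption: a serializer named "fo2pdf" must also be declared here -->
    </map:serializers>

    <map:matchers default="wildcard">
      <map:matcher name="wildcard"
          src="org.apache.cocoon.matching.WildcardURIMatcher"/>
    </map:matchers>
  </map:components>

  <!-- One pipeline per output type, chosen by the wildcard Matcher -->
  <map:pipelines>
    <map:pipeline>
      <map:match pattern="data/*.html">
        <map:generate src="data/{1}.xml"/>
        <map:transform src="datastyle/HTMLstyle.xsl"/>
        <map:serialize/>
      </map:match>

      <map:match pattern="data/*.pdf">
        <map:generate src="data/{1}.xml"/>
        <map:transform src="stylesheets/FOstyle.xsl"/>
        <map:serialize type="fo2pdf"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>

</map:sitemap>
```

Both matches read the same data/{1}.xml source; only the stylesheet and the serializer differ, which is the reuse the talk is arguing for.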
http://www.cdc.noaa.gov/cocoon/data/data.sst.html
http://www.cdc.noaa.gov/cocoon/data/data.sst.pdf
The Dreaded Demo
• Demo Data Set Descriptions at CDC.
Cocoon is all this and more!
• Action Components to do complex initialization (e.g. get a database connection pool) during pipeline setup.
• Resource Components are internal reusable pipeline fragments.
• XSP and Logic Sheets offer capabilities similar to JSP with further separation of the logic.
Resources
• www.apache.org
• Inside XSLT by Steven Holzner (New Riders)
• Java and XSLT by Eric M. Burke (O’Reilly)
Reality Check!
• We have not (yet) put this system in production.
• Still designing the XML representation.
• Still learning about using Cocoon with a relational database.
• Considering using XSP pages.
Conclusions
• Cocoon offers the potential to use and reuse one bit of XML content for many purposes.
• Most operations for Web hosting the XML content are built in to Cocoon.
• Unlimited customization by writing your own Components.
• Content is easily maintained and separated from presentation.