CERN – European Organization for Nuclear Research
IT Department – e–Business Section
Practical Use of XML
Rostislav Titov
IT-AIS-EB (e-Business) Section
CERN – Geneva, Switzerland
CERNe–Business
XML
eXtensible Markup Language
• SGML (ISO standard, 1986): mainly for technical documentation
• XML (W3C recommendation, 1998): simplification and enhancement of SGML, wide area of use

Why Markup?
Markup adds information about the structure of the data

<book lang="Hungarian">
  <chapter>
    <section> </section>
    <section> </section>
  </chapter>
  <chapter>
    <section> </section>
    <section> </section>
  </chapter>
</book>

Figure labels: Introduction (Text, Markup); More document markup (Reserved attributes, Processing instructions)
[Sample text in a foreign language, lost in extraction]

XML: Rules
<?xml version="1.0" encoding="UTF-8"?>
<presentation>
  <author>
    <firstname>Rostislav</firstname>
    <lastname>Titov</lastname>
  </author>
  <chapter number="1" title="What is XML">
    XML (Extensible Markup Language) is …
  </chapter>
  <conclusion/>
</presentation>

Parts of the example:
• Header
• One root element
• Tag hierarchy
• Attributes
• Text elements
• Empty elements

Some rules
• Element names are case-sensitive
• Every opening tag must have a closing tag
• Tags cannot overlap (<a><b></a></b> is not allowed)
• Attribute values must be in double or single quotes
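
The rules above can be checked mechanically by any XML parser. The following is a minimal sketch, assuming the standard JDK parser (javax.xml.parsers); the class name WellFormedCheck and the sample strings are made up for illustration and are not from the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
public class WellFormedCheck {

    /** Returns true if the JDK parser accepts the text as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // one of the well-formedness rules was violated
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b></b></a>",   // properly nested
            "<a><b></a></b>",   // overlapping tags
            "<a></A>",          // names are case-sensitive
            "<a x=1/>",         // attribute value without quotes
            "<a/><b/>"          // more than one root element
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first sample follows all the rules; each of the others breaks exactly one of them.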

XML: Data Transfer
• Platform and language independent
• Easy to write, easy to process
• Understandable for humans and computers
• Open standard
  – Many libraries exist
  – Lots of literature available
  – Specialized XML editors
• Possibility to check the document structure

XML: Data Transfer (2)
Example: EDH Transport Request
[Diagram: External Program, XML, EDH]
• Automatic form generation from external programs
• XML as the data transfer format
• Schema validation as a guarantee of data consistency

Web Services
[Diagram: Web service, WSDL (XML), SOAP (XML)]
• Data transfer between programs over the Internet
• Open standard
• Platform and language independent (Java, .NET, …)
WSDL – Web Services Description Language
SOAP – Simple Object Access Protocol

XML: Data Storage
• Data structure is kept together with the data
• Object “addendum” to a relational DBMS
• Structure validation
• Supported by many modern RDBMS (Microsoft SQL Server 2005, Oracle 9i and later)
  – XML data type
  – XML indexes
  – XML queries (XQuery etc.)
  – Data output in XML format

XML: Data Storage (2)
Example: EDH Search System
Problem: efficient search using an arbitrary number of criteria is difficult
Our solution:
• All documents are stored in XML
• Context-specific XML search (Oracle InterMedia)
Example: «Find documents created by Slava»:
SELECT DOC_ID FROM DOC_XML WHERE CONTAINS(XML, 'Slava within creator') > 0;

XML: Data Transformations
• XML can be transformed into HTML, text, PDF, ...
  – No need for special program solutions
  – Commercial visual editors exist
  – Platform independent

XML-based Standards
• Possibility to formally define the structure
• Platform and language independent
• Understandable for humans and computers
• Possibility to use XML technologies (XSLT transformations, XQuery queries)…
Examples:
  – WSDL (Web Services Description Language)
  – SOAP (Simple Object Access Protocol)
  – XHTML (HTML that complies with XML rules)
  – SVG (Scalable Vector Graphics)
  – ebXML (XML for e-Business)
  – …

Formal Structure Definition
• There are ways to define XML structure formally
  • DTD (Document Type Definition): obsolete, not for new development
  • XML Schema

XML Schema: Possibilities
• Check the presence and order of elements
• Sequences and choices
• Number of repetitions for elements and groups
• Attributes and their presence
• Types of elements and attributes
• Restrictions for elements and attributes
• Default values
• Unique constraints
• ...
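
A few of these checks (element order, a required attribute, a typed attribute) can be sketched with the JDK's javax.xml.validation package. This is only an illustration: the schema, the id attribute and the class name SchemaCheck are assumptions made for the example, not part of the original slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example: validates small <author> documents against an inline XML Schema.
public class SchemaCheck {

    // Illustrative schema: firstname then lastname, plus a required positive integer id
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='author'>"
        + "<xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='firstname' type='xs:string'/>"
        + "<xs:element name='lastname' type='xs:string'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='id' type='xs:positiveInteger' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the document is valid according to XSD. */
    public static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself could not be compiled", e);
        }
        try {
            // With no error handler set, validate() throws on the first validity error
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<author id='7'><firstname>Rostislav</firstname><lastname>Titov</lastname></author>";
        String wrongOrder = "<author id='7'><lastname>Titov</lastname><firstname>Rostislav</firstname></author>";
        System.out.println(isValid(ok));
        System.out.println(isValid(wrongOrder));
    }
}
```

The same document can be well-formed and still fail validation, which is the distinction the schema slides rely on.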

XML Schema: when is it needed?
• Formal structure definition for future reference
• Programmers may rely on data consistency
• Authors may check XML validity in advance

XML Schema: when is it NOT needed?
• When we know in advance that the XML is valid
• When we do not care about document validity
• When maximum processing speed is required
• Small “throw-away” projects

XPath: XML Navigation
• Access to XML elements, similar to a file system path:
  C:\presentation\author\firstname
  /presentation/author/firstname
• The result of an XPath expression can be:
  – XML node
  – Node set
  – Boolean
  – String
  – Number
  – Empty set

XPath: Examples
• Find the DG’s name
  /cern/dg/person/text()
• Find all departments
  /cern/department/@name
• Find all people
  //person
• Find the name of the head (DH) of the IT department
  /cern/department[@name="IT"]/dh/person/text()
• Find how many groups there are in the department where R. Martens works
  count(//gl/person[starts-with(., 'R. Martens')]/../../../group)

<cern>
  <dg><person>R. Aymar</person></dg>
  <department name="PH">
    <dh><person>W-D. Schlatter</person></dh>
  </department>
  <department name="IT">
    <dh><person>W. von Rueden</person></dh>
    <group name="IT-AIS">
      <gl><person>R. Martens</person></gl>
    </group>
    <group name="IT-CO">
      <gl><person>D. Myers</person></gl>
    </group>
    <group name="IT-IS">
      <gl><person>A. Pace</person></gl>
    </group>
  </department>
</cern>
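
The expressions above can be tried out with the XPath engine that ships with the JDK (javax.xml.xpath). The sketch below is one possible way to do it; the class name CernXPathDemo and its helper methods are assumptions for the example, and the document is the one from the slide with single-quoted attributes.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: evaluates XPath expressions against the <cern> sample document.
public class CernXPathDemo {

    static final String CERN =
          "<cern>"
        + "<dg><person>R. Aymar</person></dg>"
        + "<department name='PH'><dh><person>W-D. Schlatter</person></dh></department>"
        + "<department name='IT'>"
        + "<dh><person>W. von Rueden</person></dh>"
        + "<group name='IT-AIS'><gl><person>R. Martens</person></gl></group>"
        + "<group name='IT-CO'><gl><person>D. Myers</person></gl></group>"
        + "<group name='IT-IS'><gl><person>A. Pace</person></gl></group>"
        + "</department>"
        + "</cern>";

    /** Evaluates the expression and returns the string value of the result. */
    public static String text(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(CERN)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Evaluates the expression and returns the result converted to a number. */
    public static double number(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                    expression, new InputSource(new StringReader(CERN)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(text("/cern/dg/person/text()"));
        System.out.println(text("/cern/department[@name='IT']/dh/person/text()"));
        System.out.println(number("count(//person)"));
        System.out.println(number(
                "count(//gl/person[starts-with(., 'R. Martens')]/../../../group)"));
    }
}
```

In the last expression the three parent steps climb from person to gl, group and department, so the count covers all groups of that department.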

XPath: Examples (8)
Example: Event Handling System
[Diagram: events arrive as XML and subscriptions are stored as XPath; the handling system checks events against the XPath expressions and sends notifications]
«I want to see all documents for more than 600 CHF»
/document[amount > 600]
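
The matching step of such a system could look roughly like the sketch below. It is not the EDH implementation: the class SubscriptionMatcher, the subscriber names and the event documents are invented for illustration, and only the standard javax.xml.xpath API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: subscriptions are XPath expressions, events are XML documents.
public class SubscriptionMatcher {

    // subscriber name -> XPath expression that an event must satisfy
    private final Map<String, String> subscriptions = new LinkedHashMap<>();

    public void subscribe(String subscriber, String xpathExpression) {
        subscriptions.put(subscriber, xpathExpression);
    }

    /** Returns the subscribers whose XPath expression selects something in the event. */
    public List<String> match(String eventXml) {
        List<String> toNotify = new ArrayList<>();
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Map.Entry<String, String> entry : subscriptions.entrySet()) {
            try {
                // BOOLEAN conversion: true when the selected node set is not empty
                Boolean hit = (Boolean) xpath.evaluate(
                        entry.getValue(),
                        new InputSource(new StringReader(eventXml)),
                        XPathConstants.BOOLEAN);
                if (hit) {
                    toNotify.add(entry.getKey());
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("bad subscription or event: " + entry.getValue(), e);
            }
        }
        return toNotify;
    }

    public static void main(String[] args) {
        SubscriptionMatcher matcher = new SubscriptionMatcher();
        matcher.subscribe("expensive-documents", "/document[amount > 600]");
        System.out.println(matcher.match("<document><amount>750</amount></document>"));
        System.out.println(matcher.match("<document><amount>100</amount></document>"));
    }
}
```

Keeping subscriptions as plain XPath strings means new filtering criteria need no code changes, only a new expression.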

XPath: Program Use

DOM Model:
// Assumes xml is an already parsed org.w3c.dom.Document
// (uses org.w3c.dom.Element, org.w3c.dom.Node and org.w3c.dom.Text)
Element root = xml.getDocumentElement();
Node child;
// Walk the root's children until <report name="Slava"> is found
for (child = root.getFirstChild(); child != null; child = child.getNextSibling())
    if (child.getNodeName().equals("report") && ((Element) child).getAttribute("name").equals("Slava"))
        break;
// child is null here if no matching report exists, so guard before descending
if (child != null)
{
    for (child = child.getFirstChild(); child != null; child = child.getNextSibling())
    {
        if (child.getNodeName().equals("title"))
        {
            // Print every text node inside <title>
            for (Node child2 = child.getFirstChild(); child2 != null; child2 = child2.getNextSibling())
                if (child2 instanceof Text)
                    System.out.println(((Text) child2).getData().trim());
        }
    }
}

XPath:
// The same lookup as a single XPath expression, using the parser's XMLDocument class
System.out.println(((XMLDocument) xml).selectSingleNode("/config/report[@name='Slava']/title/text()").getNodeValue());

XQuery – XML Query Language
• XQuery is SQL for XML
  – Database independent
  – Easy to use
• Supported by popular RDBMS (Microsoft SQL Server 2005, Oracle 9i and 10g)
• Based on XPath, supports document sets

XSLT: XML Transformations
• Transforms XML to HTML, text or other XML
• XSLT 1.0 (current), XSLT 2.0 (draft)
• XSLT is a “human interface” to XML
• Supported by web browsers

XSLT: Simplified Structure
• XSLT is an XML file
• Active usage of XPath expressions

xsl:stylesheet
  xsl:template: apply a template to the given element
    <html><body> … </body></html>
    xsl:value-of: evaluate XPath and print the value
    xsl:apply-templates: apply templates to other elements
  xsl:template
    xsl:value-of
    …

XSLT: Possibilities
• Conditions (<xsl:if>)
• Loops (<xsl:for-each>)
• Variables (<xsl:variable>)
• Sorting (<xsl:sort>)
• Numbering [1., 1.1., 1.1.?, 2.,] (<xsl:number>)
• Number formatting (format-number())
• Multiple step processing (mode)
• String manipulations (via XPath)

XSLT 2.0 (Draft)
• XPath 2.0
• Custom functions
• Regular expressions
• Date and time formatting
• Groupings

XSLT: Example
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="presentation">
    <html>
      <body bgcolor="#FFCCFF">
        <h1><font color="darkblue"><xsl:value-of select="title"/></font></h1>
        <h4><font color="green"><i>Author: <xsl:value-of select="author"/></i></font></h4>
        <b>Table of Contents</b><br/><br/>
        <xsl:apply-templates select="chapter" mode="contents"/>
        <br/><br/>
        <xsl:apply-templates select="chapter" mode="normal"/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="chapter" mode="normal">
    <b>Chapter <xsl:value-of select="@number"/>. <xsl:value-of select="@title"/></b><br/><br/>
    <i><xsl:value-of select="text()"/></i><br/><br/>
  </xsl:template>
  <xsl:template match="chapter" mode="contents">
    <xsl:value-of select="@number"/>. <xsl:value-of select="@title"/><br/>
  </xsl:template>
</xsl:stylesheet>
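
A stylesheet like this one can be applied from a program with the JDK's javax.xml.transform API. The sketch below is a reduced variant, not the stylesheet from the slide: it keeps only the "contents" mode and writes plain text instead of HTML. The class name XsltContentsDemo, the "; " separator and the second chapter in the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: builds a one-line table of contents from a <presentation> document.
public class XsltContentsDemo {

    // Reduced stylesheet: "contents" mode only, text output
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='presentation'>"
        + "<xsl:apply-templates select='chapter' mode='contents'/>"
        + "</xsl:template>"
        + "<xsl:template match='chapter' mode='contents'>"
        + "<xsl:value-of select='@number'/>. <xsl:value-of select='@title'/><xsl:text>; </xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Sample input; the second chapter is invented for the example
    static final String SAMPLE =
          "<presentation>"
        + "<author><firstname>Rostislav</firstname><lastname>Titov</lastname></author>"
        + "<chapter number='1' title='What is XML'>XML (Extensible Markup Language) is ...</chapter>"
        + "<chapter number='2' title='XPath'>Navigation</chapter>"
        + "<conclusion/>"
        + "</presentation>";

    /** Applies the stylesheet to the document and returns the result as a string. */
    public static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE, STYLESHEET));
    }
}
```

Swapping in a different stylesheet string changes the output format without touching the Java code.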

XSLT: Web “Skins”
<aissearchscreen>
  <head><title>Person Search</title></head>
  <body>
    <input type="hidden" name="isAdvanced" value="false"/>
    <input show="always" type="text" label="Keyword" value="titov"/>
    <input type="checkbox" label="Fuzzy search" value="No"/>
    <result>
      <header>
        <tablecell>Full Name</tablecell>…
      </header>
      <row>
        <tablecell>Maksym TITOV</tablecell><tablecell>71169</tablecell><tablecell>40-3-C08</tablecell>…
      </row>
      <row>
        <tablecell>Oleg TITOV</tablecell><tablecell>EXT</tablecell>…
      </row>
      …
      <rowcount>4</rowcount>
    </result>
  </body>
</aissearchscreen>

XSLT: Web “Skins” (2)
[Figure: XSLT]

XSLT: User Interfaces
CERN Stores Catalog
• Data loaded through XML
• Data stored in XML
• XSLT for data output
• 150,000 items
• 10,000+ users
• ~15-20K of XML for each page
• Custom formatting (through XSLT redefinition)

XSLT: XML to Text
Example:
• Automatic code generation

<document>
  <input type="person" name="A"/>
  <input type="number" name="B"/>
  …
</document>

[Diagram: XML description, transformed into the program parts Interface, Business Logic, SQL, ...]

Did you know… that 1 EDH document is:
• At least 20 source files (code, HTML templates, resources, SQL, …)
• About 250K of source code

XSLT: XML to XML
• Generate XML from another XML source
• “Configuration files update”
• XSL-FO

XSL-FO: Formatting Objects
• FO: XML description of the document layout
• XSL-FO: XSLT transformation of an XML document into an FO document
• FO Processor: a program that converts the FO definition into a printable format (PDF, PS, ...)

XML Document:
<?xml version="1.0"?>
<presentation>
  <title>XXX</title>
</presentation>

FO Document:
<fo:root>
  <fo:page-sequence>
    <fo:flow>...</fo:flow>
  </fo:page-sequence>
</fo:root>

Pipeline: XML Document → XSL-FO Transformation → FO Document → FO Processor → PDF Document

XSL-FO: Formatting Objects
FO has all the capabilities of modern text editors:
• Fonts
• Pagination
• Headers and footers
• Page numbering
• Odd/even page distinction
• Margins and intervals
• Keep paragraphs together
• Hangout lines
• Tables
• Graphics
• …
FO Processor: Apache FOP

XSL-FO: Example
e-MAPS
[Diagram: XML is rendered by XSLT for the Web Interface, and by XSL-FO plus the FOP Processor for the Printable Version]
• No extra code required
• RTF to XSL-FO converters are good
• Can be written by a student
• Output format independent

XML Editors
• Specially designed for XML editing
• XML well-formedness and validity checks
• Visual editing of DTDs and Schemas
• XML generation according to a DTD/Schema
• Creation and debugging of XSLT and XSL-FO
• Visual XSLT editing
Example: Altova XML Spy (www.xmlspy.com)
- Available from NICE
- License can be obtained from the SDT service
[Screenshot: XMLSpy 2005]

XML: Program Handling
• DOM (Document Object Model)
  – Tree building
• SAX
  – Event handling
  – startElement()
  – endElement()
Java, C++:
  – Apache Xalan
  – Oracle XML Parser
  – ...
Perl, .NET:
  – Built-in support
SAX is much faster, DOM is more versatile
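
The event-driven style of SAX can be sketched with the parser built into the JDK (javax.xml.parsers.SAXParserFactory). The handler below never builds a tree; it only reacts to startElement() and endElement(). The class name SaxElementCounter and the sample document are assumptions for the example.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements by name and tracks the deepest nesting level.
public class SaxElementCounter extends DefaultHandler {

    private final Map<String, Integer> counts = new TreeMap<>();
    private int depth = 0;
    private int maxDepth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        counts.merge(qName, 1, Integer::sum); // one more element with this name
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // leaving the element
    }

    public Map<String, Integer> getCounts() {
        return counts;
    }

    public int getMaxDepth() {
        return maxDepth;
    }

    /** Parses the text once and returns the handler holding the collected statistics. */
    public static SaxElementCounter scan(String xml) {
        SaxElementCounter handler = new SaxElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        SaxElementCounter stats = scan(
                "<cern><dg><person/></dg><department><dh><person/></dh></department></cern>");
        System.out.println(stats.getCounts());
        System.out.println(stats.getMaxDepth());
    }
}
```

Because nothing is kept in memory apart from the counters, this style suits large documents, at the cost of the random access that a DOM tree offers.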

New Technologies
• InfoPath 2003
  – Corporate system for electronic form handling
  – XML-based
  – Business rules defined by XML schema
  – Data validation using XML schemas
• Adobe Intelligent Document Platform
  – Similar ideas

Conclusion
«XML is one of the biggest inventions in the IT area in the last few years. There are a lot of XML applications around the world today, and this number will grow every year»
W3C Web Site: http://www.w3c.org
Questions: [email protected]