cern – european organization for nuclear research it department – e – business section...

36
CERN – European Organization for Nuclear Research IT Department – e Business Section Practical Use of Practical Use of XML XML Rostislav Titov IT-AIS-EB (e-Business) Section CERN – Geneva, Switzerland

Post on 19-Dec-2015

216 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN – European Organization for Nuclear Research

IT Department – e–Business Section

Practical Use ofPractical Use of XMLXML

Rostislav Titov

IT-AIS-EB (e-Business) SectionCERN – Geneva, Switzerland

Page 2: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML

eXtensible Markup LanguageeXtensible Markup Language

SGML (ISO standard, 1986)Mainly for technical documentation

XML (W3C recommendation, 1998)Simplification and enhancement of SGML, wide area of use

Page 3: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

<book lang=“Hungarian”> <chapter>

<section> </section> <section> </section> </chapter> <chapter>

<section> </section> <section> </section> </chapter> </book>

Introduction Text Markup

More document markup Reserved attributes Processing instructions

Why Markup?Why Markup?

Markup allows to add information about data

structure

Markup allows to add information about data

structureВведение Текст Разметка

Дополнительные данные о разметке Зарезервированные атрибуты Инструкции по обработке

Page 4: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

<?xml version="1.0" encoding="UTF-8"?><presentation>

<author><firstname>Rostislav</firstname><lastname>Titov</lastname>

</author><chapter number="1" title="What is XML">

XML (Extensible Markup Language) is …</chapter><conclusion/>

</presentation>

XMLXML: Rules: Rules

Header One root element Tag hierarchy Attributes

Some rules Element names are case-sensitive Every opening tag should have a closing tag Tags cannot intersect (<a><b></a></b>) Attributes values – in quotes or apostrophes

Text elements Empty elements

Page 5: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML: Data Transfer: Data Transfer

Platform and language independent

Easy to write, easy to process

Understandable for humans and computers

Open standard

– Many libraries exist

– Lots of literature available

– Specialized XML-editors

Possibility to check the document structure

Page 6: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML: Data Transfer (2): Data Transfer (2)

ExternalProgram

EDH

XML

Automatic form generation from external programs

XML as data transfer format

Schema checkup as a warranty of data consistency

Example: EDH Transport Request

Page 7: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

Web ServicesWeb Services

Webservice

WSDLWSDL

XML

SOAPSOAP

XML

Data transfer between programs on Internet Open Standard Platform and language independent (Java, .Net, …)

WSDL – Web Service Definition Language

SOAP – Simple Object Access Protocol

Page 8: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML: Data Storage: Data Storage

Data structure is kept together with the data

Object “addendum” to relational RDBMS

Structure checkup

Supported by many modern RDBMS

– Microsoft SQL Server 2005, Oracle 9i +,

– XML Data Type

– XML indexes

– XML Queries (XQuery etc.)

– Data output in XML format

Page 9: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML: Data Storage (2): Data Storage (2)

Example: EDH Search System

Our solution:

All documents are stored in XML

Context-specific XML search (Oracle InterMedia)

Example: «Find documents created by Slava»:

Select DOC_ID from DOC_XML where Contains(XML, “Slava within creator”) > 0;

Problem: Effective search using arbitrarynumber of criteria is problematic

Page 10: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML: Data Transformations: Data Transformations

XML can be transformed into HTML, text, PDF, ...

– No need for special program solutions

– Commercial visual editors exist

– Platform independent

Page 11: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML-based Standards-based Standards

Possibility to formally define the structure

Platform and language independent

Understandable for humans and computers

Possibility to use XML technologies (XSLT transformations, XQuery queries)…

– WSDL (Web Services Definition Language)

– SOAP (Simple Object Access Protocol)

– XHTML (HTML that complies to XML rules)

– SVG (Scalable Vector Graphics)

– ebXML (XML for e-Business)

– …

Page 12: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

Formal Structure DefinitionFormal Structure Definition

There are ways to define XML structure formally

• DTD (Document Type Definition)

• XML Schema

Obsolete!Not for new development

Obsolete!Not for new development

Page 13: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XML SchemaXML Schema: : PossibilitiesPossibilities

Check element presence and their order Sequences and choices Number of repetitions for elements and groups Attributes and their presence Type of elements and attributes Restrictions for elements and attributes Default values Unique constraints ...

Page 14: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML-schema-schema: : when it is neededwhen it is needed??

Formal structure definition for future reference

Programmers may rely on data consistence

Authors may check XML validness in advance

Page 15: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML-schema-schema: : when NOT neededwhen NOT needed??

When we know in advance that XML is valid

When we do not care about document validness

When maximum processing speed is required

Small “throw away” projects

Page 16: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XPathXPath: : XML NavigationXML Navigation

Access to XML elements

Result of an XPATH-expression can be:

C:\presentation\author\firstname /presentation/author/firstname

XML Node Node Set Boolean

String Number Empty Set

Page 17: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XXPathPath: Examples: Examples

Find the DG’s name

/cern/dg/person/text()

Find all departments

/cern/department/@name

Find all people

//person

Find the name of DH of IT

/cern/department[@name=“IT”]/dh/person/text()

Find how many groups has a department where R. Martens works

count(//gl/person[starts-with(., 'R. Martens')]/../../../group)

<cern> <dg><person>R. Aymar</person></dg><department name=“PH”>

<dh><person>W-D. Schlatter</person></dh></department><department name=“IT”>

<dh><person>W. von Rueden</person></dh> <group name=“IT-AIS”>

<gl><person>R. Martens</person></gl> </group>

<group name=“IT-CO”> <gl><person>D. Myers</person></gl> </group>

<group name=“IT-IS”> <gl><person>A. Pace</person></gl> </group>

</department></cern>

Page 18: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XPathXPath: Examples: Examples ( (88))

Example: Event Handling System

Check eventsagainst XPath

XMLXML

XML

Events Subscriptions

XPath XPath

Handling System

Notifications

«I want to see all documents for more than 600 CHF»

/ document [amount > 600]

Page 19: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XPathXPath: Program Use: Program Use

Element root = xml.getDocumentElement();

Node child;

for (child = root.getFirstChild(); child != null; child = child.getNextSibling())

if (child.getNodeName().equals("report") && ( (Element)child ).getAttribute("name").equals("Slava"))

break;

for (child = ((Element)child).getFirstChild(); child != null; child = child.getNextSibling())

{

if (child.getNodeName().equals("title") )

{

for (Node child2 = child.getFirstChild(); child2 != null; child2 = child2.getNextSibling())

if ( child2 instanceof Text )

System.out.println(( (Text)child2 ).getData().trim());

}

}

System.out.println(((XMLDocument)xml).selectSingleNode( "/config/report[@name='Slava']/title/text()").getNodeValue());

XPath

DOM Model

Page 20: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XQuerXQuery –y –XML XML Query LanguageQuery Language

XQuery is SQL for XML

– Database independent

– Easy to use

Supported by popular RDBMS (Microsoft SQL Server 2005, Oracle 9i and10g)

Based on XPath, supports document sets

Page 21: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLT: XML TransformationsXSLT: XML Transformations

Transforms XML to HTML, text or other XML XSLT 1.0 (Current), XSLT 2.0 (Draft) XSLT is a “Human Interface” to XML Supported by Web Browsers

XSLT

Page 22: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLT: Simplified StructureXSLT: Simplified Structure

xsl:stylesheet

xsl:template

xsl:template

xsl:value-of

xsl:value-of

xsl:apply-templates

<html> <body>

… </body><html>

XSLT is an XML file Active usage of XPath expressions

Apply a templateto the given element

Evaluate XPath and print value

Apply templatesto other elements

Page 23: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLT: PossibilitiesXSLT: Possibilities

Conditions (<xsl:if>) Loops (<xsl:for-each>) Variables (<xsl:variable>) Sorting (<xsl:sort>) Numbering [1., 1.1., 1.1.а, 2.,] (<xsl:number>) Number formatting (format-number()) Multiple step processing (mode) String manipulations (via XPath)

XSLT 2.0 (Draft) XPath 2.0 Custom functions Regular expressions Date and time formatting Groupings

Page 24: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLT: ExampleXSLT: Example<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="presentation"><html>

<body bgcolor="#FFCCFF"><h1><font color="darkblue"><xsl:value-of select="title"/></font></h1><h4><font color="green"><i>Author: <xsl:value-of select="author"/></i></font></h4><b>Table of Contents</b><br/><br/><xsl:apply-templates select="chapter" mode="contents"/><br/><br/><xsl:apply-templates select="chapter" mode="normal"/>

</body></html>

</xsl:template>

<xsl:template match="chapter" mode="normal"><b>Chapter <xsl:value-of select="@number"/>. <xsl:value-of select="@title"/></b><br/><br/><i><xsl:value-of select="text()"/></i><br/><br/>

</xsl:template>

<xsl:template match="chapter" mode="contents"><xsl:value-of select="@number"/>. <xsl:value-of select="@title"/><br/>

</xsl:template></xsl:stylesheet>

Page 25: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLTXSLT:: Web Web “Skins”“Skins”

<aissearchscreen><head><title>Person Search</title></head><body>

<input type="hidden" name="isAdvanced" value="false"/><input show="always" type="text" label="Keyword" value="titov"/><input type="checkbox" label="Fuzzy search" value="No"/><result>

<header><tablecell>Full Name</tablecell>…

</header><row>

<tablecell>Maksym TITOV</tablecell><tablecell>71169</tablecell><tablecell>40-3-C08</tablecell>…

</row><row>

<tablecell>Oleg TITOV</tablecell><tablecell>EXT</tablecell>…

</row>…<rowcount>4</rowcount>

</result></body>

</aissearchscreen>

Page 26: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLTXSLT:: Web Web “Skins” - 2“Skins” - 2

XSLT

Page 27: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLTXSLT:: User InterfacesUser Interfaces

CERN Stores Catalog

Data loaded through XML

Data stored in XML

XSLT for data output

150000 items

+10000 users

~15-20K XML for each page

Custom formatting (through XSLT redefinition)

Page 28: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLT: XML to TextXSLT: XML to Text

Example: Automatic code generation

<document> <input type=“person” name=“A”/> <input type=“number” name=“B”/> …</document>

InterfaceInterface

XML-description

Program

Business LogicBusiness Logic

SQLSQL

...

Did you know…that 1 EDH document is:

At least 20 source files (code, HTML templates, resources, SQL, …)

About 250K of source code

Page 29: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSLT: XML to XMLXSLT: XML to XML

Generate XML from another XML source

“Configuration files update”

XSL:FO

Page 30: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSL-FO: Formatting ObjectsXSL-FO: Formatting Objects

FO: XML-description of document layout

XSL-FO: XSLT transformation of XML document to FO document

FO Processor: program that converts the FO definition into a printable format (PDF, PS, ...)

<?xml version="1.0"?><presentation> <title> XXX </title></presentation>

<?xml version="1.0"?><presentation> <title> XXX </title></presentation>

<fo:root><fo:page-sequence> <fo:flow> ... </fo:flow></fo:page-sequence></fo:root>

<fo:root><fo:page-sequence> <fo:flow> ... </fo:flow></fo:page-sequence></fo:root>

XMLDocument

FODocument

PDFDocument

XSL:FOTransformation

FOProcessor

Page 31: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSL-FO: Formatting ObjectsXSL-FO: Formatting Objects

Fonts Pagination Headers and footers Page numbering Odd/even page distinction Margins and intervals Keep paragraphs together Hangout lines Tables Graphics …

FO has all capabilities of moderntext editors:

FO Processor:Apache FOP

Page 32: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XSL-FO: ExampleXSL-FO: Example

XMLXML

e-MAPS

XSLT

Web Interface

Printable Version

XSL:FO

FOP Processor

No extra code required RTF to XSL:FO converters are good Can be written by a student Output format independent

Page 33: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML Editors Editors

Specially designed for XML editing

XML well-formedness and validity check

DTD and Schema visual editing XML generation accordingly to DTD/Schema Creation and debugging of XSLT and XSL:FO Visual XSLT editing

Example: Altova XML Spy (www.xmlspy.com)

- Available from NICE

- License can be obtained from the SDT service

XMLSpy 2005

Page 34: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

XMLXML: Program Handling: Program Handling

DOM (Document Object Model)

– Tree building

SAX

– Event handling– startElement()– endElement()

Java, C++:

– Apache Xalan

– Oracle XML Parser

...

PERL, .Net:

– Built-in support

SAX - much faster, DOM – more versatile

SAX - much faster, DOM – more versatile

Page 35: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

New TechnologiesNew Technologies

InfoPath 2003– Corporate system for electronic form

handling– XML-based– Business rules defined by XML schema– Data validation using XML schemas

Adobe Intellegent Document Platform– Similar ideas

Page 36: CERN – European Organization for Nuclear Research IT Department – e – Business Section Practical Use of XML Rostislav Titov IT-AIS-EB (e-Business) Section

CERN

e–Business

ConclusionConclusion

«XML is one of the biggest inventions in IT area in the last few years. There is a lot of XML applications around the world today, and this amount will grow every year»

«XML is one of the biggest inventions in IT area in the last few years. There is a lot of XML applications around the world today, and this amount will grow every year»

W3C Consortium Web Site:http://www.w3c.org

Questions:[email protected]