xml, dtd, xml schema, and xslt

Post on 13-Mar-2016


1

XML, DTD, XML Schema, and XSLT

Jianguo Lu
University of Windsor

2

Where we are

• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

3

Name Conflict

• Solution: add prefix to the tag names

<table>
  <tr><td>Apples</td><td>Bananas</td></tr>
</table>

<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>

<h:table>
  <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
</h:table>

<f:table>
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

4

Name spaces

HTML name space: html, body, table, tr, th, td
Furniture name space: table, name, price, length, width, height

5

XML namespace

• An XML document may use more than one schema;
• Since each schema was developed independently, name clashes may appear;
• The solution is to use a different prefix for each schema
  – prefix:name

<prod:product xmlns:prod="http://example.org/prod">
  <prod:number> 557 </prod:number>
  <prod:size system="US-DRESS"> 10 </prod:size>
</prod:product>

6

Namespace names

• Namespace names are URIs
  – Many namespace names are in the form of an HTTP URI.
• The purpose of a namespace name is not to point to a location where a resource resides.
  – It is intended to provide a unique name that can be associated with a particular organization.
  – The URI MAY point to a schema.

<prod:product xmlns:prod="http://example.org/prod">
  <prod:number> 557 </prod:number>
  <prod:size system="US-DRESS"> 10 </prod:size>
</prod:product>

7

Namespace declaration

• A namespace is declared using an attribute whose name starts with "xmlns".
• You can declare multiple namespaces in one instance.

<ord:order xmlns:ord="http://example.org/ord"
           xmlns:prod="http://example.org/prod">
  <ord:number> 123ABC123 </ord:number>
  <prod:product>
    <prod:number> 557 </prod:number>
    <prod:size system="US-DRESS"> 10 </prod:size>
  </prod:product>
</ord:order>

8

Default namespace declaration

• A default namespace declaration maps unprefixed element type names to a namespace.

<order xmlns="http://example.org/ord"
       xmlns:prod="http://example.org/prod">
  <number> 123ABC123 </number>
  <prod:product>
    <prod:number> 557 </prod:number>
    <prod:size system="US-DRESS"> 10 </prod:size>
  </prod:product>
</order>

9

Scope of namespace declaration

• A namespace declaration can appear in any start tag.
• Its scope is the element where it is declared, including that element's descendants.

<order xmlns="http://example.org/ord">
  <number> 123ABC123 </number>
  <prod:product xmlns:prod="http://example.org/prod">
    <prod:number> 557 </prod:number>
    <prod:size system="US-DRESS"> 10 </prod:size>
  </prod:product>
</order>
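A DOM parser can report which namespace each element ended up in. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name NamespaceDemo and the helper namespaceOf are made up for illustration. It parses the order document and looks elements up by local name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Illustrative helper class; not part of the original slides.
class NamespaceDemo {
    // The order document from the slide, written with straight quotes.
    static final String ORDER_XML =
        "<order xmlns=\"http://example.org/ord\">"
      + "<number>123ABC123</number>"
      + "<prod:product xmlns:prod=\"http://example.org/prod\">"
      + "<prod:number>557</prod:number>"
      + "<prod:size system=\"US-DRESS\">10</prod:size>"
      + "</prod:product>"
      + "</order>";

    // Parses the document with namespace processing switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Namespace name of the index-th element (document order) with the given local name.
    static String namespaceOf(String localName, int index) {
        Node node = parse(ORDER_XML).getElementsByTagNameNS("*", localName).item(index);
        return node.getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println(namespaceOf("order", 0));  // bound by the default namespace
        System.out.println(namespaceOf("number", 0)); // unprefixed number, inside order
        System.out.println(namespaceOf("number", 1)); // prod:number, inside prod:product
    }
}
```

The two number elements share a local name but live in different namespaces, which is exactly the name clash that prefixes are meant to resolve.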

10

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>

The elements and datatypes that are used to construct schemas (schema, element, complexType, sequence, string) come from the http://…/XMLSchema namespace.

From Costello

Indicates that the elements defined by this schema (BookStore, Book, Title, Author, Date, ISBN, Publisher) are to go in the http://www.books.org namespace.

11

(Same BookStore schema as on the previous slide.)

ref="Book": This is referencing a Book element declaration. The Book in what namespace? Since there is no namespace qualifier, it is referencing the Book element in the default namespace, which is the targetNamespace! Thus, this is a reference to the Book element declaration in this schema.

xmlns="http://www.books.org": The default namespace is http://www.books.org, which is the targetNamespace!

From Costello
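One way to see the effect of targetNamespace is to validate two instance documents against the schema, one that places BookStore in http://www.books.org and one that leaves it in no namespace. This is a rough sketch using the JDK's javax.xml.validation API. The schema is trimmed to Title and Author to keep it short, and the class and method names are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Illustrative helper class; not part of the original slides.
class TargetNamespaceDemo {
    // Trimmed BookStore schema: only Title and Author are kept (an assumption for brevity).
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\""
      + " targetNamespace=\"http://www.books.org\""
      + " xmlns=\"http://www.books.org\""
      + " elementFormDefault=\"qualified\">"
      + "<xsd:element name=\"BookStore\"><xsd:complexType><xsd:sequence>"
      + "<xsd:element ref=\"Book\" maxOccurs=\"unbounded\"/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name=\"Book\"><xsd:complexType><xsd:sequence>"
      + "<xsd:element ref=\"Title\"/><xsd:element ref=\"Author\"/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:element name=\"Title\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"Author\" type=\"xsd:string\"/>"
      + "</xsd:schema>";

    // Builds a small instance; namespaceDecl is inserted into the root start tag.
    static String bookStore(String namespaceDecl) {
        return "<BookStore" + namespaceDecl + "><Book><Title>T</Title>"
             + "<Author>A</Author></Book></BookStore>";
    }

    // True if the instance validates against XSD.
    static boolean isValid(String instanceXml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false; // without a custom ErrorHandler, the first validity error is thrown
        } catch (IOException e) {
            throw new IllegalStateException("could not read instance", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(bookStore(" xmlns=\"http://www.books.org\"")));
        System.out.println(isValid(bookStore("")));
    }
}
```

The second instance fails because the schema declares BookStore only in the http://www.books.org namespace; an element with the same local name but no namespace has no matching declaration.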

12

Import in XML Schema

• Now with the understanding of namespace, we can introduce some more advanced features in XML Schema.
• The import element allows you to access elements and types in a different namespace.

<xsd:schema …>
  <xsd:import namespace="A" schemaLocation="A.xsd"/>
  <xsd:import namespace="B" schemaLocation="B.xsd"/>
  …
</xsd:schema>

[Diagram: C.xsd imports A.xsd (Namespace A) and B.xsd (Namespace B)]

13

Example

Camera.xsd

Nikon.xsd Olympus.xsd Pentax.xsd

From Costello

14

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.nikon.com"
            xmlns="http://www.nikon.com"
            elementFormDefault="qualified">
  <xsd:complexType name="body_type">
    <xsd:sequence>
      <xsd:element name="description" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Nikon.xsd

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.olympus.com"
            xmlns="http://www.olympus.com"
            elementFormDefault="qualified">
  <xsd:complexType name="lens_type">
    <xsd:sequence>
      <xsd:element name="zoom" type="xsd:string"/>
      <xsd:element name="f-stop" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Olympus.xsd

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.pentax.com"
            xmlns="http://www.pentax.com"
            elementFormDefault="qualified">
  <xsd:complexType name="manual_adapter_type">
    <xsd:sequence>
      <xsd:element name="speed" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Pentax.xsd

From Costello

15

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.camera.org"
            xmlns:nikon="http://www.nikon.com"
            xmlns:olympus="http://www.olympus.com"
            xmlns:pentax="http://www.pentax.com"
            elementFormDefault="qualified">
  <xsd:import namespace="http://www.nikon.com" schemaLocation="Nikon.xsd"/>
  <xsd:import namespace="http://www.olympus.com" schemaLocation="Olympus.xsd"/>
  <xsd:import namespace="http://www.pentax.com" schemaLocation="Pentax.xsd"/>
  <xsd:element name="camera">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="body" type="nikon:body_type"/>
        <xsd:element name="lens" type="olympus:lens_type"/>
        <xsd:element name="manual_adapter" type="pentax:manual_adapter_type"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Camera.xsd

type="nikon:body_type": Here I am using the body_type that is defined in the Nikon namespace.

From Costello

16

<?xml version="1.0"?>
<c:camera xmlns:c="http://www.camera.org"
          xmlns:nikon="http://www.nikon.com"
          xmlns:olympus="http://www.olympus.com"
          xmlns:pentax="http://www.pentax.com"
          … …
  <c:body>
    <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
  </c:body>
  <c:lens>
    <olympus:zoom>300mm</olympus:zoom>
    <olympus:f-stop>1.2</olympus:f-stop>
  </c:lens>
  <c:manual_adapter>
    <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
  </c:manual_adapter>
</c:camera>

Camera.xml

From Costello


17

Include

• The include element allows you to access components in other schemas
  – All the schemas you include must have the same namespace as your schema (i.e., the schema that is doing the include)
  – The net effect of include is as though you had typed all the definitions directly into the containing schema

<xsd:schema …>
  <xsd:include schemaLocation="LibraryBook.xsd"/>
  <xsd:include schemaLocation="LibraryEmployee.xsd"/>
  …
</xsd:schema>

[Diagram: Library.xsd includes LibraryBook.xsd and LibraryEmployee.xsd]

From Costello

18

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org"
            xmlns="http://www.library.org"
            elementFormDefault="qualified">
  <xsd:include schemaLocation="LibraryBook.xsd"/>
  <xsd:include schemaLocation="LibraryEmployee.xsd"/>
  <xsd:element name="Library">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Books">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="Book" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Employees">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="Employee" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Library.xsd

ref="Book" and ref="Employee": These are referencing element declarations in other schemas.

From Costello

19

XML Path

• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

20

XPath

• Language for addressing parts of an XML document.
  – It operates on the tree data model of XML
• XPath is a syntax for defining parts of an XML document
• XPath uses paths to define XML elements
  – It has a non-XML syntax
• XPath defines a library of standard functions
  – Such as arithmetic expressions.
• XPath is a major element in XSLT and XML query languages
• XPath is a W3C Standard

21

What is XPath

• Like traditional file paths
• XPath uses path expressions to identify nodes in an XML document. These path expressions look very much like the expressions you see when you work with a computer file system:
  – public_html/569/xml.ppt
  – Books/book/author/name/FirstName
• Absolute path
  – /library/author/book
• Relative path
  – author/book

22

XML path example

<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>

/library

/library/author

//author

/library/@location

//book[@title="Artificial Intelligence"]

23

XML Path Example

• Address all author elements
  – /library/author
  – Addresses all author elements that are children of the library element node, which resides immediately below the root
  – /t1/.../tn, where each t(i+1) is a child node of t(i), is a path through the tree representation
• Address all author elements
  – //author
  – Here // says that we should consider all elements in the document and check whether they are of type author
  – This path expression addresses all author elements anywhere in the document

24

XPath example

• Select the location attribute nodes within library element nodes
  – /library/@location
  – The symbol @ is used to denote attribute nodes
• Select all title attribute nodes within book elements anywhere in the document, which have the value "Artificial Intelligence"
  – //book/@title[.="Artificial Intelligence"]
  – Written as //book/@title="Artificial Intelligence", the expression is a comparison that evaluates to a boolean instead of selecting nodes
• Select all books with title "Artificial Intelligence"
  – /library/author/book[@title="Artificial Intelligence"]
  – Test within square brackets: a filter expression
    • It restricts the set of addressed nodes.
  – Difference with previous query.
    • This query addresses book elements, the title of which satisfies a certain condition.
    • Previous query collects title attribute nodes of book elements
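The path expressions on the last few slides can be tried out with the XPath 1.0 engine that ships with the JDK (javax.xml.xpath). The sketch below is only one possible setup: the class name XPathBasics and the helper select are assumptions for illustration, and the attribute query is written with a predicate on the attribute node so that it returns nodes, not a boolean.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative helper class; not part of the original slides.
class XPathBasics {
    // The library document from the "XML path example" slide.
    static final String LIBRARY_XML =
        "<library location=\"Bremen\">"
      + "<author name=\"Henry Wise\">"
      + "<book title=\"Artificial Intelligence\"/>"
      + "<book title=\"Modern Web Services\"/>"
      + "<book title=\"Theory of Computation\"/>"
      + "</author>"
      + "<author name=\"William Smart\">"
      + "<book title=\"Artificial Intelligence\"/>"
      + "</author>"
      + "<author name=\"Cynthia Singleton\">"
      + "<book title=\"The Semantic Web\"/>"
      + "<book title=\"Browser Technology Revised\"/>"
      + "</author>"
      + "</library>";

    // Evaluates an XPath expression against the library document and returns the node set.
    static NodeList select(String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression,
                    new InputSource(new StringReader(LIBRARY_XML)),
                    XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] paths = {
            "/library/author",
            "//author",
            "/library/@location",
            "//book[@title='Artificial Intelligence']",   // book element nodes
            "//book/@title[.='Artificial Intelligence']"  // title attribute nodes
        };
        for (String path : paths) {
            System.out.println(path + " selects " + select(path).getLength() + " node(s)");
        }
    }
}
```

The last two expressions select the same number of nodes, but the first returns book elements and the second returns their title attributes, which is the difference the slide points out.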

25

[Tree diagram: root → library → author nodes; each author has a name attribute (e.g. "Henry") and book children; each book has a title attribute (e.g. "Artificial Intelligence")]

26

XPath syntax

• A path expression consists of a series of steps, separated by slashes
• A step consists of
  – An axis specifier,
  – A node test, and
  – An optional predicate
• An axis specifier determines the tree relationship between the nodes to be addressed and the context node
  – E.g. parent, ancestor, child (the default), following-sibling, preceding-sibling, attribute
  – // is shorthand for a step along the descendant-or-self axis: /descendant-or-self::node()/
  – child::book selects all book elements that are children of the current node
• A node test specifies which nodes to address
  – The most common node tests are element names
    • /library/author
  – E.g., * addresses all element nodes
    • /library/*
  – comment() selects all comment nodes
    • /library/comment()

27

XPath syntax

• Predicates (or filter expressions) are optional and are used to refine the set of addressed nodes
  – E.g., the expression [1] selects the first node
  – [position()=last()] selects the last node
  – [position() mod 2 = 0] selects the nodes at even positions
• XPath has a more complicated full syntax.
  – We have only presented the abbreviated syntax

28

More examples

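• Address the first author element node in the XML document
  – //author[1]
  – Strictly, this selects every author that is the first author child of its parent; (//author)[1] selects the first author in the whole document. In the library example both give the same node.
• Address the last book element within the first author element node in the document
  – //author[1]/book[last()]
• Address all book element nodes without a title attribute
  – //book[not(@title)]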
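The predicate examples can be checked the same way. This is a hedged sketch using the JDK's javax.xml.xpath; the class XPathPredicates and its helper attributeOf are illustrative names. One untitled book element is added to the last author (it is not in the original library document) so that the not(@title) query has something to find.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative helper class; not part of the original slides.
class XPathPredicates {
    // Library document from the slides, plus one <book/> without a title (added for this demo).
    static final String LIBRARY_XML =
        "<library location=\"Bremen\">"
      + "<author name=\"Henry Wise\">"
      + "<book title=\"Artificial Intelligence\"/>"
      + "<book title=\"Modern Web Services\"/>"
      + "<book title=\"Theory of Computation\"/>"
      + "</author>"
      + "<author name=\"William Smart\">"
      + "<book title=\"Artificial Intelligence\"/>"
      + "</author>"
      + "<author name=\"Cynthia Singleton\">"
      + "<book title=\"The Semantic Web\"/>"
      + "<book title=\"Browser Technology Revised\"/>"
      + "<book/>"
      + "</author>"
      + "</library>";

    // Selects elements with the expression and returns the named attribute of each one.
    static List<String> attributeOf(String expression, String attribute) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression,
                    new InputSource(new StringReader(LIBRARY_XML)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getAttribute returns "" when the attribute is absent
                values.add(((Element) nodes.item(i)).getAttribute(attribute));
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributeOf("//author[1]", "name"));
        System.out.println(attributeOf("//author[1]/book[last()]", "title"));
        System.out.println(attributeOf("//book[position() mod 2 = 0]", "title"));
        System.out.println(attributeOf("//book[not(@title)]", "title").size());
    }
}
```

Note that position() counts within each parent, so the even-position query picks the second book of every author that has one.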

29

Where we are

• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

30

How to process XML

• XML does not DO anything
• Process XML using general purpose languages
  – Java, Perl, C++ …
  – DOM is the basis
• Process XML using special purpose languages
  – "translate the stock XML file to an HTML table."
    • Transform the XML: XSLT
  – "tell me the stocks that are higher than 100."
    • Query XML: XQuery

31

DOM (Document Object Model)

• What: DOM is an application programming interface (API) for processing XML documents
  – http://www.w3c.org/DOM/
• Why:
  – Unique interface.
  – Platform and language independence.
• How: It defines the logical structure of documents and the way to access and manipulate it
  – With the Document Object Model, one can
    • Create an object tree
    • Navigate its structure
    • Access, add, modify, or delete elements etc

32

XML tree hierarchy

• XML can be described by a tree hierarchy

[Diagram: a Document contains Units, which contain Sub-units; the nodes are related as Parent, Child and Sibling]

33

DOM tree model

• Generic tree model
  – Node
    • Type, name, value
    • Attributes
    • Parent node
    • Previous, next sibling nodes
    • First, last child nodes
  – Many other entities extend Node
    • Document
    • Element
    • Attribute
    • ... ...

[Diagram: a Node linked to its Parent, Prev. Sibling, Next Sibling, First Child and Last Child]

34

DOM class hierarchy

Node
  DocumentFragment
  Document
  CharacterData
    Text
      CDATASection
    Comment
  Attr
  Element
  DocumentType
  Notation
  Entity
  EntityReference
  ProcessingInstruction
NodeList
NamedNodeMap

35

JavaDoc of DOM API http://xml.apache.org/xerces-j/apiDocs/index.html

36

Remarks on javadoc

• javadoc is a command included in the JDK;
• It is a useful tool to generate HTML descriptions of your programs, so that you can use a browser to look at the description of the classes;
• JavaDoc describes classes, their relationships, methods, attributes, and comments.
• When you write Java programs, the JavaDoc is the first place that you should look at:
  – For core Java, there is JavaDoc to describe every class in the language;
  – To know how to use DOM, look at the JavaDoc of the org.w3c.dom package.
• If you are a serious Java programmer:
  – You should have the core JDK JavaDoc ready on your hard disk;
  – You should generate the JavaDoc for other people to look at.
• To run javadoc, type
  D>javadoc *.java
  This generates JavaDoc for all the classes in the current directory.

37

Methods in Node interface

• Three categories of methods
  – Node characteristics
    • name, type, value
  – Contextual location and access to relatives
    • parents, siblings, children, ancestors, descendants
  – Node modification
    • Edit, delete, re-arrange child nodes

38

XML parser and DOM

• When you parse an XML document with a DOM parser, you get back a tree structure that contains all of the elements of your document;

• DOM also provides a variety of functions you can use to examine the contents and structure of the document.

[Diagram: your XML application uses the DOM API to access the DOM tree built by the XML parser]

39

DOM tree and DOM classes

[Tree diagram]
<stocks>
  <stock exchange="nyse">
    <name>: IBM
    <price>: 105
  <stock exchange="nasdaq">
    <name>: Amazon inc
    <symbol>: amzn
    <price>: 15.45

DOM classes and some of their methods:
  Document: getElementsByTagName()
  Element: getAttribute(String), getTagName()
  Node: getFirstChild(), getParentNode(), getNextSibling()
  Attr: getName(), getValue()

Each Element is a Node; the character data inside an element is held in a child Text node.

40

Use Java to process XML

• Tasks:
  – How to construct the DOM tree from an XML text file?
  – How to get the list of stock elements?
  – How to get the attribute value of the second stock element?
• Construct the Document object:
  – Need to use an XML parser (XML4J);
  – Remember to import the necessary packages;
  – The benefits of DOM: the following lines are the only difference if you use another DOM XML parser.

41

Get the first stock element

<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>

42

Navigate to the next sibling of the first stock element


43

Be aware of the Text objects between elements

[Tree diagram: stocks → stock (exchange="nyse": name "IBM inc", price "102") and stock (exchange="nasdaq": name "Amazon inc", symbol "amzn", price "16"), with Text nodes marked throughout the tree, including the whitespace between element tags]

(stock.xml is the same document as on the previous slides.)

Question: How many children does the stocks node have?
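The tasks from the last few slides (build the DOM tree, get the stock elements, read an attribute, move to the next sibling, count children) can be sketched with the parser bundled in the JDK. The slides use XML4J; the class DomChildren and its method names below are assumptions made for this example. The document is laid out with line breaks, so the whitespace between tags becomes Text nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Illustrative helper class; not part of the original slides.
class DomChildren {
    // stock.xml with line breaks and indentation between the tags.
    static final String STOCK_XML =
        "<?xml version=\"1.0\" ?>\n"
      + "<stocks>\n"
      + "  <stock exchange=\"nasdaq\">\n"
      + "    <name>amazon corp</name>\n"
      + "    <symbol>amzn</symbol>\n"
      + "    <price>16</price>\n"
      + "  </stock>\n"
      + "  <stock exchange=\"nyse\">\n"
      + "    <name>IBM inc</name>\n"
      + "    <price>102</price>\n"
      + "  </stock>\n"
      + "</stocks>";

    // Constructs the DOM tree with the JDK's default (non-validating) parser.
    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(STOCK_XML.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse stock.xml", e);
        }
    }

    // All children of <stocks>, whitespace-only Text nodes included.
    static int childCount() {
        return parse().getDocumentElement().getChildNodes().getLength();
    }

    // Only the Element children of <stocks>.
    static int elementChildCount() {
        int count = 0;
        for (Node n = parse().getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Attribute value of the second stock element.
    static String secondExchange() {
        Element second = (Element) parse().getElementsByTagName("stock").item(1);
        return second.getAttribute("exchange");
    }

    // Node type of the next sibling of the first stock element.
    static short nextSiblingTypeOfFirstStock() {
        Node first = parse().getElementsByTagName("stock").item(0);
        return first.getNextSibling().getNodeType();
    }

    public static void main(String[] args) {
        System.out.println("children of stocks: " + childCount());
        System.out.println("element children:   " + elementChildCount());
        System.out.println("second exchange:    " + secondExchange());
        System.out.println("next sibling is Text: "
                + (nextSiblingTypeOfFirstStock() == Node.TEXT_NODE));
    }
}
```

With this layout the stocks node has five children, not two: a Text node before, between and after the two stock elements. The next sibling of the first stock element is therefore a Text node, and code that expects an Element there has to skip it.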

44

Remarks on XML parsers

• There are several different ways to categorise parsers:
  – Validating versus non-validating parsers;
    • It takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD;
    • If you only want to find tags and extract information, use a non-validating parser;
    • Validation can be turned on or off in parsers.
  – Parsers that support the Document Object Model (DOM);
  – Parsers that support the Simple API for XML (SAX);
  – Parsers written in a particular language (Java, C++, Perl, etc.).
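Switching validation on and off can be illustrated with the JDK's DocumentBuilderFactory. This is a sketch under stated assumptions: the DTD below is hypothetical (the slides do not give one for the stock example), and the class and method names are made up. Validity errors are collected through an ErrorHandler and do not abort the parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Illustrative helper class; not part of the original slides.
class ValidatingParserDemo {
    // Hypothetical internal DTD for the stock example.
    static final String DTD =
        "<!DOCTYPE stocks [\n"
      + "<!ELEMENT stocks (stock*)>\n"
      + "<!ELEMENT stock (name, symbol?, price)>\n"
      + "<!ATTLIST stock exchange CDATA #REQUIRED>\n"
      + "<!ELEMENT name (#PCDATA)>\n"
      + "<!ELEMENT symbol (#PCDATA)>\n"
      + "<!ELEMENT price (#PCDATA)>\n"
      + "]>\n";

    static final String VALID =
        "<stocks><stock exchange=\"nyse\"><name>IBM inc</name><price>102</price></stock></stocks>";

    // Well-formed, but the exchange attribute and the price element are missing.
    static final String INVALID =
        "<stocks><stock><name>IBM inc</name></stock></stocks>";

    // Parses DTD + body and returns the number of validity errors that were reported.
    static int validationErrors(String body, boolean validating) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating); // DTD validation on or off
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored here */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: give up
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
        return errors.size();
    }

    public static void main(String[] args) {
        System.out.println("valid, validating:       " + validationErrors(VALID, true));
        System.out.println("invalid, validating:     " + validationErrors(INVALID, true));
        System.out.println("invalid, non-validating: " + validationErrors(INVALID, false));
    }
}
```

The non-validating run still builds a DOM tree for the invalid document, which is enough when you only want to find tags and extract information.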

45

Where we are

• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

46

History

[Diagram: XSL divides into XSL (high-precision graphics, e.g., PDF) and XSLT (low-precision graphics, e.g., HTML, text, XML); XPath is shown alongside XLink/XPointer, XQuery and XML Schemas]

47

XSLT (Extensible Stylesheet Language Transformations)

• XSLT Version 1.0 is a W3C Recommendation, 1999
• XSLT is used to transform XML to other formats

[Diagram: one XML source is transformed by three stylesheets (XSLT 1, XSLT 2, XSLT 3) into TEXT, HTML and XML]

48

XSLT basics

• XSLT is an XML document itself
• It is a tree transformation language
• It is a rule-based declarative language
  – An XSLT program consists of a sequence of rules.
  – It is a functional programming language.

[Diagram: XML and XSLT are the inputs of the XSLT processor]

49

XSLT Example: transform to another XML

stock.xml:

<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>

output:

<?xml version="1.0"?>
<companies>
  <company>
    <value>24 CAD </value>
    <name>amazon corp</name>
  </company>
  <company>
    <value>153 CAD </value>
    <name>IBM inc</name>
  </company>
</companies>

• Rename the element names
• Remove the attribute and the symbol element
• Change the order between name and price.
• Change the US dollar to CAD.

50

A most simple XSLT

51

Template definition and call

52

If statement

53

XSLT rule: <xsl:template>

xslt template for <stock>:

<xsl:template match="stock">
  <company>
    <value><xsl:value-of select="price*1.5"/> CAD</value>
    <name><xsl:value-of select="name"/></name>
  </company>
</xsl:template>

stock.xml:

<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>

Part of the output:

<company>
  <value> get the value of <price> * 1.5, i.e. 24 CAD </value>
  <name> get the value of <name>, i.e. amazon </name>
</company>

54

XSLT process model

toXML.xsl:

<xsl:template match="/">
  <companies>
    <xsl:apply-templates select="stocks/stock"/>
  </companies>
</xsl:template>

<xsl:template match="stock">
  <company>
    <value><xsl:value-of select="price*1.5"/> CAD</value>
    <name><xsl:value-of select="name"/></name>
  </company>
</xsl:template>

Processing steps:

apply template 1 to <stocks>
  <companies>
    apply template 2 to <stock> 1
    apply template 2 to <stock> 2
  </companies>

xslt output:

<company>
  <value> get the value of <price>*1.5, i.e. 24 CAD </value>
  <name> get the value of <name>, i.e. amazon </name>
</company>

<company>
  <value> get the value of <price>*1.5, i.e. 153 CAD </value>
  <name> get the value of <name>, i.e. IBM </name>
</company>
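The two templates can be run with the XSLT 1.0 processor that ships with the JDK (javax.xml.transform). The sketch below wraps them in an xsl:stylesheet element, which the slide leaves out, and uses invented class and method names. It applies the stylesheet to stock.xml in memory.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative helper class; not part of the original slides.
class StockToCompanies {
    static final String STOCK_XML =
        "<stocks>"
      + "<stock exchange=\"nasdaq\"><name>amazon corp</name>"
      + "<symbol>amzn</symbol><price>16</price></stock>"
      + "<stock exchange=\"nyse\"><name>IBM inc</name><price>102</price></stock>"
      + "</stocks>";

    // The two templates from toXML.xsl, wrapped in a stylesheet element.
    static final String TO_XML_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\">"
      + "<companies><xsl:apply-templates select=\"stocks/stock\"/></companies>"
      + "</xsl:template>"
      + "<xsl:template match=\"stock\">"
      + "<company>"
      + "<value><xsl:value-of select=\"price*1.5\"/> CAD</value>"
      + "<name><xsl:value-of select=\"name\"/></name>"
      + "</company>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to stock.xml and returns the serialized result.
    static String transform() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(TO_XML_XSL)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(STOCK_XML)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform());
    }
}
```

Template 1 matches the document root and emits the companies wrapper; apply-templates then hands each stock element to template 2, which is why the symbol element and the exchange attribute never reach the output.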

55

Transforming XML to HTML

toHTML.xsl

56

Running XSLT from the client side

• Browser gets the XML+XSLT, and interprets them inside the browser.
• How to specify the XSL associated with the XML file?
  – <?xml-stylesheet type="text/xsl" href="stock.xsl"?>
• Advantages:
  – Easy to develop and deploy.
• Disadvantages:
  – Not every browser supports XML+XSL;
  – Browsers do not support all XSLT features;
  – Not secure: the browser receives the whole XML file even if you only want to show part of the data;
  – Not efficient.

[Diagram: Web server]

57

Run XSLT from the server side

• XSL processor transforms the XML and XSLT to HTML, and the web server sends the HTML to the browser.
• Popular tool: xalan

java -classpath xalan/bin/xalan.jar org.apache.xalan.xslt.Process -in stock.xml -xsl stock.xsl -out stock.html

[Diagram: on the Web server, the XSL Processor produces HTML, which is sent to the browser]

58

Why XML is useful

• Data exchange

• Data integration

59

Why XML is useful(cont.)

• Present to different devices

60

XML references

• For XML and related specifications: www.w3c.org
• For Java support for XML, like XML parser, XSLT processor: www.apache.org
• For xml technologies: www.xml.com
• XML integrated development environment: www.xmlspy.com
