Internet Technologies 2
XML Grammars: Three Major Uses
1. Validation
2. Code Generation
3. Communication
XML Validation
Sources for this lecture:
“Data on the Web” Abiteboul, Buneman and Suciu
“XML in a Nutshell” Harold and Means
“The XML Companion” Bradley
The validation examples were originally tested with an older parser, so the specific outputs may differ from those shown.
XML Validation
A batch validating process involves comparing the DTD against a complete document instance and producing a report containing any errors or warnings.
Consider batch validation to be analogous to program compilation, with similar errors detected.
Interactive validation involves constant comparison of the DTD against a document as it is being created.
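Batch validation can be sketched in a few lines of Java. This is a minimal sketch, assuming the JAXP parser bundled with the JDK rather than the Xerces class used later in these slides. The class name BatchValidationSketch and the sample documents are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; uses the JDK's built-in parser (an assumption, not the slides' setup).
class BatchValidationSketch {
    static final String DTD = "<!DOCTYPE FixedFloatSwap ["
        + "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>"
        + "<!ELEMENT Notional (#PCDATA)><!ELEMENT Fixed_Rate (#PCDATA)>"
        + "<!ELEMENT NumYears (#PCDATA)><!ELEMENT NumPayments (#PCDATA)>]>";
    static final String VALID = DTD + "<FixedFloatSwap><Notional>100</Notional>"
        + "<Fixed_Rate>5</Fixed_Rate><NumYears>3</NumYears>"
        + "<NumPayments>6</NumPayments></FixedFloatSwap>";
    // NumYears is missing here, so the content model is violated.
    static final String INVALID = DTD + "<FixedFloatSwap><Notional>100</Notional>"
        + "<Fixed_Rate>5</Fixed_Rate><NumPayments>6</NumPayments></FixedFloatSwap>";

    // Parse the complete document once and collect every validity error into a report.
    static List<String> report(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // compare the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) {
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid document:   " + report(VALID));
        System.out.println("invalid document: " + report(INVALID));
    }
}
```

Like a compiler, the parser reads the whole input and reports problems at the end. An empty report means the document is valid.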
XML Validation
The benefits of validating documents against a DTD include:
• Programmers can write extraction and manipulation filters without fear of their software ever processing unexpected input.
• Using an XML-aware word processor, authors and editors can be guided and constrained to produce conforming documents. Consider how Netbeans allows you to edit web.xml files.
XML Validation Examples
XML elements may contain further, embedded elements, and the entire document must be enclosed by a single document element.
These are recursive hierarchical structures.
A Document Type Definition (DTD) contains rules for each element allowed within a specific class of documents.
Things the DTD does not do:
• Specify the document root.
• Specify the number of instances of each kind of element. (Or, it’s rather hard to do.)
• Describe the character data inside an element (the precise syntax).
• DTD’s don’t naturally handle namespaces.
• The XML schema language is much more recent and improves on DTD’s. We have “programmer level” type specifications.
• To see a real DTD, view source on http://www.silmaril.ie/software/rss2.dtd
We’ll run this program against several XML files with DTD’s. We’ll study the code soon.
This slide shows the imported classes.
// Validate.java using Xerces
import java.io.*;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
public class Validate {
    public static boolean valid = true;

    public static void main (String argv []) {
        // Here we check if the command line is correct.
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
        try {
            // get a parser
            XMLReader reader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            // request validation
            reader.setFeature("http://xml.org/sax/features/validation", true);
            // associate an InputSource object with the file name
            InputSource inputSource = new InputSource(argv[0]);
            // go ahead and parse
            reader.parse(inputSource);
        }
        // Catch any errors or fatal errors here.
        // The parser will handle simple warnings.
        catch(org.xml.sax.SAXException e) {
            System.out.println("Error in parsing " + e);
            valid = false;
        }
        catch(java.io.IOException e) {
            System.out.println("Error in I/O " + e);
            System.exit(0);
        }
        System.out.println("Valid Document is " + valid);
    }
}
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is true
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "http://localhost:8001/dtd/FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD on the Web? VERY NICE
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is true
XML Document with an internal subset
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap [
    <!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
    <!ELEMENT Notional (#PCDATA) >
    <!ELEMENT Fixed_Rate (#PCDATA) >
    <!ELEMENT NumYears (#PCDATA) >
    <!ELEMENT NumPayments (#PCDATA) >
]>
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is false
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Swaps SYSTEM "FixedFloatSwap.dtd">
<Swaps>
    <FixedFloatSwap>
        <Notional>100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
    </FixedFloatSwap>
    <FixedFloatSwap>
        <Notional>100</Notional>
        <Fixed_Rate>5</Fixed_Rate>
        <NumYears>3</NumYears>
        <NumPayments>6</NumPayments>
    </FixedFloatSwap>
</Swaps>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Swaps (FixedFloatSwap+) >
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Quantity Indicators
?   0 or 1 time
+   1 or more times
*   0 or more times
C:\McCarthy\www\examples\sax>java Validate FixedFloatSwap.xml
Valid document is true
Is this a valid document?
<?xml version="1.0"?>
<!DOCTYPE person [
    <!ELEMENT person (name+, profession*)>
    <!ELEMENT profession (#PCDATA)>
    <!ELEMENT name (#PCDATA)>
]>
<person>
    <name>Alan Turing</name>
    <profession>computer scientist</profession>
    <profession>cryptographer</profession>
</person>
Sure!
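The quantity indicators can be tried out directly. This is a minimal sketch, assuming the JAXP parser bundled with the JDK. The class name QuantityIndicatorSketch and the helper errors are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; checks name+ (one or more) and profession* (zero or more).
class QuantityIndicatorSketch {
    static final String DTD = "<!DOCTYPE person ["
        + "<!ELEMENT person (name+, profession*)>"
        + "<!ELEMENT profession (#PCDATA)><!ELEMENT name (#PCDATA)>]>";

    // Returns the validity errors found when the body is checked against the DTD above.
    static List<String> errors(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> found = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) { found.add(e.getMessage()); }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        // One name, no profession: allowed, because * permits zero occurrences.
        System.out.println(errors("<person><name>Alan Turing</name></person>"));
        // No name at all: rejected, because + demands at least one.
        System.out.println(errors("<person><profession>cryptographer</profession></person>"));
    }
}
```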
The locations where document text data is allowed are indicated by the keyword ‘PCDATA’ (Parsed Character Data).
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>
        <StartYear>2000</StartYear>
        <EndYear>2002</EndYear>
    </NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Output
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Element "NumYears" does not allow "StartYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "StartYear" is not declared.
org.xml.sax.SAXParseException: Element "NumYears" does not allow "EndYear" -- (#PCDATA)
org.xml.sax.SAXParseException: Element type "EndYear" is not declared.
Valid document is false
Mixed Content
There are strict rules which must be applied when an element is allowed to contain both text and child elements.
The PCDATA keyword must be the first token in the group, and the group must be a choice group (using “|” not “,”).
The group must be optional and repeatable.
This is known as a mixed content model.
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT Mixed (emph) >
<!ELEMENT emph (#PCDATA | sub | super)* >
<!ELEMENT sub (#PCDATA)>
<!ELEMENT super (#PCDATA)>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Mixed SYSTEM "Mixed.dtd">
<Mixed>
    <emph>H<sub>2</sub>O is water.</emph>
</Mixed>
Valid document is true
Is this a valid document?
<?xml version="1.0"?>
<!DOCTYPE page [
    <!ELEMENT page (paragraph+)>
    <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
    <!ELEMENT profession (#PCDATA)>
    <!ELEMENT bold (#PCDATA)>
]>
<page>
    <paragraph>
        Alan Turing broke codes during <bold>World War II</bold>.
        He very precisely defined the notion of "algorithm".
        And so he had several professions:
        <profession>computer scientist</profession>
        <profession>cryptographer</profession>
        And <profession>mathematician</profession>
    </paragraph>
</page>
Sure!
How about this one?
<?xml version="1.0"?>
<!DOCTYPE page [
    <!ELEMENT page (paragraph+)>
    <!ELEMENT paragraph ( #PCDATA | profession | bold)*>
    <!ELEMENT profession (#PCDATA)>
    <!ELEMENT bold (#PCDATA)>
]>
<page>
    The following is a paragraph marked up in XML.
    <paragraph>
        Alan Turing broke codes during <bold>World War II</bold>.
        He very precisely defined the notion of "algorithm".
        And so he had several professions:
        <profession>computer scientist</profession>
        <profession>cryptographer</profession>
        And <profession>mathematician</profession>
    </paragraph>
</page>
java Validate mixed.xml
org.xml.sax.SAXParseException: The content of element type "page" must match "(paragraph)+".
Valid document is false
CDATA Section
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
    <Note>
        <![CDATA[This is text that <b>will not be parsed for markup]]>
    </Note>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap ( Notional, Fixed_Rate, NumYears, NumPayments, Note ) >
<!ELEMENT Notional (#PCDATA)>
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ELEMENT Note (#PCDATA) >
Recursion
<?xml version="1.0"?>
<!DOCTYPE tree [
    <!ELEMENT tree (node)>
    <!ELEMENT node (leaf | (node,node))>
    <!ELEMENT leaf (#PCDATA)>
]>
<tree>
    <node>
        <leaf>A DTD is a context-free grammar</leaf>
    </node>
</tree>
java Validate recursive1.xml
Valid document is true
How about this one?
<?xml version="1.0"?>
<!DOCTYPE tree [
    <!ELEMENT tree (node)>
    <!ELEMENT node (leaf | (node,node))>
    <!ELEMENT leaf (#PCDATA)>
]>
<tree>
    <node>
        <leaf>Alan Turing would like this</leaf>
    </node>
    <node>
        <leaf>Alan Turing would like this</leaf>
    </node>
</tree>
java Validate recursive1.xml
org.xml.sax.SAXParseException: The content of element type "tree" must match "(node)".
Valid document is false
Relational Databases and XML
Consider the relational database r1(a,b,c), r2(c,d)
r1:  a   b   c        r2:  c   d
     a1  b1  c1            c2  d2
     a2  b2  c2            c3  d3
                           c4  d4
How can we represent this database with an XML DTD?
Relations
<?xml version="1.0"?>
<!DOCTYPE db [
    <!ELEMENT db (r1*, r2*)>
    <!ELEMENT r1 (a,b,c)>
    <!ELEMENT r2 (c,d)>
    <!ELEMENT a (#PCDATA)>
    <!ELEMENT b (#PCDATA)>
    <!ELEMENT c (#PCDATA)>
    <!ELEMENT d (#PCDATA)>
]>
<db>
    <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
    <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
    <r2><c> c2 </c> <d> d2 </d> </r2>
    <r2><c> c3 </c> <d> d3 </d> </r2>
    <r2><c> c4 </c> <d> d4 </d> </r2>
</db>
java Validate Db.xml
Valid document is true
There is a small problem….
Relations
The order of the relations should not count and neither should the order of columns within rows.
<?xml version="1.0"?>
<!DOCTYPE db [
    <!ELEMENT db (r1|r2)* >
    <!ELEMENT r1 ((a,b,c) | (a,c,b) | (b,a,c) | (b,c,a) | (c,a,b) | (c,b,a))>
    <!ELEMENT r2 ((c,d) | (d,c))>
    <!ELEMENT a (#PCDATA)>
    <!ELEMENT b (#PCDATA)>
    <!ELEMENT c (#PCDATA)>
    <!ELEMENT d (#PCDATA)>
]>
<db>
    <r1><a> a1 </a> <b> b1 </b> <c> c1 </c> </r1>
    <r1><a> a2 </a> <b> b2 </b> <c> c2 </c> </r1>
    <r2><c> c2 </c> <d> d2 </d> </r2>
    <r2><c> c3 </c> <d> d3 </d> </r2>
    <r2><c> c4 </c> <d> d4 </d> </r2>
</db>
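The effect of relaxing the order can be checked on a small case. This is a minimal sketch, assuming the JAXP parser bundled with the JDK. The class name RelationOrderSketch is made up, and to stay short the sketch relaxes only the order of relations and the column order of r2. It keeps r1 in a fixed column order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; compares an order-sensitive DTD with a more tolerant one.
class RelationOrderSketch {
    static final String COLUMNS = "<!ELEMENT a (#PCDATA)><!ELEMENT b (#PCDATA)>"
        + "<!ELEMENT c (#PCDATA)><!ELEMENT d (#PCDATA)>";
    static final String ORDERED = "<!DOCTYPE db [<!ELEMENT db (r1*, r2*)>"
        + "<!ELEMENT r1 (a,b,c)><!ELEMENT r2 (c,d)>" + COLUMNS + "]>";
    static final String FLEXIBLE = "<!DOCTYPE db [<!ELEMENT db (r1|r2)*>"
        + "<!ELEMENT r1 (a,b,c)><!ELEMENT r2 ((c,d) | (d,c))>" + COLUMNS + "]>";
    // An r2 row comes first, and its columns appear as d then c.
    static final String ROWS = "<db><r2><d>d2</d><c>c2</c></r2>"
        + "<r1><a>a1</a><b>b1</b><c>c1</c></r1></db>";

    // Returns the validity errors found when the rows are checked against the given DOCTYPE.
    static List<String> errors(String doctype, String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> found = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) { found.add(e.getMessage()); }
        });
        builder.parse(new InputSource(new StringReader(doctype + body)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ordered DTD:  " + errors(ORDERED, ROWS));
        System.out.println("flexible DTD: " + errors(FLEXIBLE, ROWS));
    }
}
```

Listing every column order by hand grows quickly: n columns need n! alternatives, which is one reason DTD’s are awkward for relational data.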
Attributes
An attribute is associated with a particular element by the DTD and is assigned an attribute type.
The attribute type can restrict the range of values it can hold.
Example attribute types include:
• CDATA indicates a simple string of characters
• NMTOKEN indicates a word or token
• A named token group such as (left | center | right)
• ID an element id that holds a unique value (among other element ID’s in the document)
• IDREF attributes refer to an ID
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
C:\McCarthy\www\46-928\examples\sax>java Validate FixedFloatSwap.xml
org.xml.sax.SAXParseException: Attribute value for "currency" is #REQUIRED.
Valid document is false
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional currency = "Pounds">100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>
#IMPLIED means optional
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional currency = "Pounds">100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>
<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap note = "For your eyes only">
    <Notional currency = "Pounds">100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
Valid document is true
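The attribute rules from the last few slides can be exercised together. This is a minimal sketch, assuming the JAXP parser bundled with the JDK. The class name AttributeSketch, the helper swap, and the value Euros are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; currency is #REQUIRED and enumerated, note is #IMPLIED (optional).
class AttributeSketch {
    static final String DTD = "<!DOCTYPE FixedFloatSwap ["
        + "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments)>"
        + "<!ELEMENT Notional (#PCDATA)><!ELEMENT Fixed_Rate (#PCDATA)>"
        + "<!ELEMENT NumYears (#PCDATA)><!ELEMENT NumPayments (#PCDATA)>"
        + "<!ATTLIST Notional currency (Dollars | Pounds) #REQUIRED>"
        + "<!ATTLIST FixedFloatSwap note CDATA #IMPLIED>]>";

    // Builds a swap document; each argument is a raw attribute string or "".
    static String swap(String rootAttrs, String notionalAttrs) {
        return DTD + "<FixedFloatSwap" + rootAttrs + "><Notional" + notionalAttrs
            + ">100</Notional><Fixed_Rate>5</Fixed_Rate><NumYears>3</NumYears>"
            + "<NumPayments>6</NumPayments></FixedFloatSwap>";
    }

    // Returns the validity errors reported for the document.
    static List<String> errors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> found = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) { found.add(e.getMessage()); }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(errors(swap("", "")));                     // currency missing
        System.out.println(errors(swap("", " currency='Euros'")));    // not in the token group
        System.out.println(errors(swap("", " currency='Pounds'")));   // fine, note is optional
        System.out.println(errors(swap(" note='For your eyes only'", " currency='Dollars'")));
    }
}
```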
ID and IDREF Attributes
We can represent complex relationships within an XML document using ID and IDREF attributes.
[Figure: course prerequisite graph with the nodes Math100, Geom100, Calc100, Calc200, Calc300, Philo45, CS1 and CS2]
This is called a DAG (Directed Acyclic Graph)
<?xml version="1.0"?>
<!DOCTYPE Course_Descriptions SYSTEM "course_descriptions.dtd">
<Course_Descriptions>
<Course>
<Course-ID id = "Math100" />
<Title>Algebra I</Title>
<Description> Students in this course study
introductory algebra.
</Description>
<Prerequisites/>
</Course>
This course has an ID
But no prerequisites
<Course>
<Course-ID id = "Geom100" />
<Title>Geometry I</Title>
<Description> Students in this course study how to
prove several theorems in geometry.
</Description>
<Prerequisites/>
</Course>
The DTD will force this to be unique.
<Course>
<Course-ID id="Calc100" />
<Title>Calculus I</Title>
<Description> Students in this course study the derivative.
</Description>
<Prerequisites pre="Math100 Geom100" />
</Course>
These are references to ID’s. (IDREFS)
<Course>
<Course-ID id = "Calc200" />
<Title>Calculus II</Title>
<Description> Students in this course study the integral.
</Description>
<Prerequisites pre="Calc100" />
</Course>
The DTD requires that this name be a unique id defined within this document. Otherwise, the document is invalid.
<Course>
<Course-ID id = "Calc300" />
<Title>Calculus III</Title>
<Description> Students in this course study the derivative
and the integral (in 3-space).
</Description>
<Prerequisites pre="Calc200" />
</Course>
Prerequisites is an EMPTY element. It’s used only for its attributes.
<Course>
<Course-ID id = "CS1" />
<Title>Introduction to Computer Science I</Title>
<Description> In this course we study Turing machines.
</Description>
<Prerequisites pre="Calc100" />
</Course>
IDREF to ID: a one-to-one link
<Course>
<Course-ID id = "CS2" />
<Title>Introduction to Computer Science II</Title>
<Description> In this course we study basic data structures.
</Description>
<Prerequisites pre="Calc200 CS1"/>
</Course>
IDREFS to several ID’s: one-to-many links
<Course>
<Course-ID id = "Philo45" />
<Title>Ethical Implications of Information Technology</Title>
<Description> TBA
</Description>
<Prerequisites/>
</Course>
</Course_Descriptions>
The Course_Descriptions.dtd
<?xml version="1.0"?>
<!-- Course Description DTD -->
<!ELEMENT Course_Descriptions (Course)+>
<!ELEMENT Course (Course-ID,Title,Description,Prerequisites)>
<!ELEMENT Course-ID EMPTY>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Description (#PCDATA)>
<!ELEMENT Prerequisites EMPTY>
<!ATTLIST Course-ID id ID #REQUIRED>
<!ATTLIST Prerequisites pre IDREFS #IMPLIED>
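The ID and IDREFS constraints can be checked with a trimmed version of this DTD. This is a minimal sketch, assuming the JAXP parser bundled with the JDK. The class name IdRefSketch and the helper course are made up, and Title and Description are left out to keep the sample short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; a cut-down course catalogue (no Title or Description).
class IdRefSketch {
    static final String DTD = "<!DOCTYPE Course_Descriptions ["
        + "<!ELEMENT Course_Descriptions (Course)+>"
        + "<!ELEMENT Course (Course-ID, Prerequisites)>"
        + "<!ELEMENT Course-ID EMPTY><!ELEMENT Prerequisites EMPTY>"
        + "<!ATTLIST Course-ID id ID #REQUIRED>"
        + "<!ATTLIST Prerequisites pre IDREFS #IMPLIED>]>";

    // Builds one Course element; pre may be null when there are no prerequisites.
    static String course(String id, String pre) {
        String prerequisites = (pre == null)
            ? "<Prerequisites/>" : "<Prerequisites pre='" + pre + "'/>";
        return "<Course><Course-ID id='" + id + "'/>" + prerequisites + "</Course>";
    }

    // Wraps the courses in the root element and returns the validity errors.
    static List<String> errors(String... courses) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> found = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) { found.add(e.getMessage()); }
        });
        String xml = DTD + "<Course_Descriptions>" + String.join("", courses)
            + "</Course_Descriptions>";
        builder.parse(new InputSource(new StringReader(xml)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Every reference resolves to an ID in the same document.
        System.out.println(errors(course("Math100", null), course("Geom100", null),
            course("Calc100", "Math100 Geom100")));
        // Calc100 is referenced but never defined.
        System.out.println(errors(course("Calc200", "Calc100")));
        // The same ID is used twice.
        System.out.println(errors(course("Math100", null), course("Math100", null)));
    }
}
```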
General Entities &
General entities are used to place text into the XML document.
They may be declared in the DTD and referenced in the document.
They may also be declared in the DTD as residing in a file. They may then be referenced in the document.
Document using a General Entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
    <!ENTITY bankname "Mellon National Bank and Trust" >
]>
<FixedFloatSwap>
    <Bank>&bankname;</Bank>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Bank,Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ELEMENT Bank (#PCDATA) >
<!ELEMENT Notional (#PCDATA) >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
Valid document is true
XSLT Program
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match = "Bank">
        <WML>
            <CARD>
                <xsl:apply-templates/>
            </CARD>
        </WML>
    </xsl:template>
    <xsl:template match = "Notional | Fixed_Rate | NumYears | NumPayments">
    </xsl:template>
</xsl:stylesheet>
The general entity is replaced before XSLT sees it.
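Any application that sits behind the parser sees the replacement text, not the entity reference. This is a minimal sketch using the DOM parser bundled with the JDK rather than an XSLT engine. The class name GeneralEntitySketch and the helper bankText are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class; reads the Bank element after the parser has expanded &bankname;.
class GeneralEntitySketch {
    static final String XML = "<!DOCTYPE FixedFloatSwap ["
        + "<!ENTITY bankname \"Mellon National Bank and Trust\">]>"
        + "<FixedFloatSwap><Bank>&bankname;</Bank></FixedFloatSwap>";

    // Parses the document (without validation) and returns the text inside Bank.
    static String bankText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("Bank").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bankText(XML)); // prints: Mellon National Bank and Trust
    }
}
```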
C:\McCarthy\www\46-928\examples\sax>java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver FixedFloatSwap.xml FixedFloatSwap.xsl FixedFloatSwap.wml
C:\McCarthy\www\46-928\examples\sax>type FixedFloatSwap.wml
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon National Bank and Trust</CARD></WML>
XSLT OUTPUT
An external text entity
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd" [
    <!ENTITY bankname SYSTEM "JustAFile.dat" >
]>
<FixedFloatSwap>
    <Bank>&bankname;</Bank>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
JustAFile.dat
Mellon Bank And Trust Corporation
Pittsburgh PA
XSLT Output
<?xml version="1.0" encoding="utf-8"?>
<WML><CARD>Mellon Bank And Trust Corporation
Pittsburgh PA</CARD></WML>
Parameter Entities %
While general entities are used to place text into the XML document, parameter entities are used to modify the DTD.
We want to build modular DTD’s so that we can create new DTD’s using existing ones.
We’ll look at a slide from www.fpml.org and then see some examples.
FpML is a Complete Description of the Trade
[Figure: a pool of modular components grouped into separate namespaces (Date, Schedule, Product, Rate, Adjustable Period, Notional, Party, Money) feeds a Trade made up of Trade ID, Product, Rate, Adjustable Period, Notional and Party. Products listed: Vanilla Swap, Vanilla Fixed Float Swap, Cancellable Swaption, FX Spot, FX Outright, FX Swap, Forward Rate Agreement, ...]
Internal Parameter Entities
DTD
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate, NumYears, NumPayments ) >
<!ENTITY % parsedCharacterData "(#PCDATA)">
<!ELEMENT Notional %parsedCharacterData; >
<!ELEMENT Fixed_Rate (#PCDATA) >
<!ELEMENT NumYears (#PCDATA) >
<!ELEMENT NumPayments (#PCDATA) >
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FixedFloatSwap SYSTEM "FixedFloatSwap.dtd">
<FixedFloatSwap>
    <Notional>100</Notional>
    <Fixed_Rate>5</Fixed_Rate>
    <NumYears>3</NumYears>
    <NumPayments>6</NumPayments>
</FixedFloatSwap>
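A parameter entity can be tried without any external files. This is a minimal sketch, assuming the JAXP parser bundled with the JDK. XML 1.0 only allows a parameter entity reference inside a declaration, as in the DTD above, when the DTD is an external file. Inside an internal subset the reference must stand for whole declarations, so this sketch uses that form. The class name ParameterEntitySketch and the entity name leafDecls are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; %leafDecls; pastes two element declarations into the DTD.
class ParameterEntitySketch {
    static final String LEAF_ENTITY = "<!ENTITY % leafDecls \""
        + "<!ELEMENT Notional (#PCDATA)><!ELEMENT Fixed_Rate (#PCDATA)>\">";

    // Builds the document; when useEntity is false the two leaf elements stay undeclared.
    static String doc(boolean useEntity) {
        return "<!DOCTYPE FixedFloatSwap ["
            + "<!ELEMENT FixedFloatSwap (Notional, Fixed_Rate)>"
            + LEAF_ENTITY
            + (useEntity ? "%leafDecls;" : "")
            + "]><FixedFloatSwap><Notional>100</Notional>"
            + "<Fixed_Rate>5</Fixed_Rate></FixedFloatSwap>";
    }

    // Returns the validity errors reported for the document.
    static List<String> errors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> found = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) { found.add(e.getMessage()); }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with %leafDecls;    " + errors(doc(true)));
        System.out.println("without %leafDecls; " + errors(doc(false)));
    }
}
```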
External Parameter Entities and DTD Components
Order.xml
<?xml version="1.0" encoding = "UTF-8"?>
<!DOCTYPE ORDER SYSTEM "order.dtd">
<!-- example order form from “XML A Manager’s Guide” -->
<ORDER SOURCE ="web" CUSTOMERTYPE="consumer" CURRENCY="USD">
    <addresses>
        <address ADDTYPE="billship">
            <firstname>Kevin</firstname>
            <lastname>Dick</lastname>
            <street ORDER="1">123 Anywhere Lane</street>
            <street ORDER="2">Apt 1b</street>
            <city>Palo Alto</city>
            <state>CA</state>
            <postal>94303</postal>
            <country>USA</country>
        </address>
        <address ADDTYPE="bill">
            <firstname>Kevin</firstname>
            <lastname>Dick</lastname>
            <street ORDER="1">123 Not The Same Lane</street>
            <street ORDER="2">Work Place</street>
            <city>Palo Alto</city>
            <state>CA</state>
            <postal>94300</postal>
            <country>USA</country>
        </address>
    </addresses>
An order may have more than one address.
    <lineitems>
        <lineitem ID="line1">
            <product CAT="MBoard">440BX Motherboard</product>
            <quantity>1</quantity>
            <unitprice>200</unitprice>
        </lineitem>
        <lineitem ID="line2">
            <product CAT = "RAM">128 MB PC-100 DIMM</product>
            <quantity>2</quantity>
            <unitprice>175</unitprice>
        </lineitem>
        <lineitem ID="line3">
            <product CAT="CDROM">40x CD-ROM</product>
            <quantity>1</quantity>
            <unitprice>50</unitprice>
        </lineitem>
    </lineitems>
Several products may be purchased.
    <payment>
        <card CARDTYPE="VISA">
            <cardholder>Kevin S. Dick</cardholder>
            <cardnumber>11111-22222-33333</cardnumber>
            <expiration>01/01</expiration>
        </card>
    </payment>
</ORDER>
The payment is with a Visa card.
We want this document to be validated.
order.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!-- Example Order form DTD adapted from XML: A Manager's Guide -->
<!-- Define an ORDER element -->
<!ELEMENT ORDER (addresses, lineitems, payment)>
<!ATTLIST ORDER
    SOURCE (web | phone | retail) #REQUIRED
    CUSTOMERTYPE (consumer | business) "consumer"
    CURRENCY CDATA "USD">
Define an order based on other elements.
External parameter entity declaration %
External parameter entity reference %
<!ENTITY % anAddress SYSTEM "address.dtd" >
%anAddress;
<!-- Collection of Addresses -->
<!ELEMENT addresses (address+)>
<!ENTITY % aLineItem SYSTEM "lineitem.dtd" >
%aLineItem;
<!-- Collection of LineItems -->
<!ELEMENT lineitems (lineitem+)>
<!ENTITY % aPayment SYSTEM "payment.dtd" >
%aPayment;
address.dtd
<!-- Address Structure -->
<!ELEMENT address (firstname, middlename?, lastname, street+, city, state,postal,country)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT middlename (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ATTLIST address ADDTYPE (bill | ship | billship) "billship">
<!ATTLIST street ORDER CDATA #IMPLIED>
lineitem.dtd
<!ELEMENT lineitem (product,quantity,unitprice)>
<!ATTLIST lineitem ID ID #REQUIRED>
<!ELEMENT product (#PCDATA)>
<!ATTLIST product CAT (CDROM|MBoard|RAM) #REQUIRED>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT unitprice (#PCDATA)>
payment.dtd
<!ELEMENT payment (card | PO)>
<!ELEMENT card (cardholder, cardnumber, expiration)>
<!ELEMENT cardholder (#PCDATA)>
<!ELEMENT cardnumber (#PCDATA)>
<!ELEMENT expiration (#PCDATA)>
<!ELEMENT PO (number,authorization*)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT authorization (#PCDATA)>
<!ATTLIST card CARDTYPE (VISA|MasterCard|Amex) #REQUIRED>
XML Schemas Improve on DTD’s
• XML Schema is the official name
• XSDL (XML Schema Definition Language) is the language used to create schema definitions
• XML Syntax
• Can be used to more tightly constrain a document instance
• Supports namespaces
• Permits type derivation
• Harder than DTD’s
Other Grammars Include
• RELAX
• TREX (James Clark - Tree Regular Expressions for XML)
• RELAX NG (RELAX and TREX combined to Relax Next Generation)
• Schematron (“Rule based” rather than “grammar based” see www.ascc.net/xml/schematron) Based on XSLT and XPath
XSDL - A Simple Purchase Order
<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<purchaseOrder orderDate="07.23.2001"
    xmlns="http://www.cds-r-us.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.cds-r-us.com po.xsd">
    <recipient country="USA">
        <name>Dennis Scannel</name>
        <street>175 Perry Lea Side Road</street>
        <city>Waterbury</city>
        <state>VT</state>
        <postalCode>15216</postalCode>
    </recipient>
    <order>
        <cd artist="Brooks Williams" title="Little Lion" />
        <cd artist="David Wilcox" title="What you whispered" />
    </order>
</purchaseOrder>
Purchase Order XSDL
<?xml version="1.0" encoding="utf-8"?>
<!-- po.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.cds-r-us.com"
    targetNamespace="http://www.cds-r-us.com" >
    <xs:element name="purchaseOrder">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="recipient" />
                <xs:element ref="order" />
            </xs:sequence>
            <xs:attribute name="orderDate" type="xs:string" />
        </xs:complexType>
    </xs:element>
    <xs:element name = "recipient">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="name" />
                <xs:element ref="street" />
                <xs:element ref="city" />
                <xs:element ref="state" />
                <xs:element ref="postalCode" />
            </xs:sequence>
            <xs:attribute name="country" type="xs:string" />
        </xs:complexType>
    </xs:element>
    <xs:element name = "name" type="xs:string" />
    <xs:element name = "street" type="xs:string" />
    <xs:element name = "city" type="xs:string" />
    <xs:element name = "state" type="xs:string" />
    <xs:element name = "postalCode" type="xs:short" />
    <xs:element name = "order">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="cd" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="cd">
        <xs:complexType>
            <xs:attribute name="artist" type="xs:string" />
            <xs:attribute name="title" type="xs:string" />
        </xs:complexType>
    </xs:element>
</xs:schema>
Validate.java
// Validate.java using Xerces
import java.io.*;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
public class Validate extends DefaultHandler {
    public static boolean valid = true;

    public void error(SAXParseException exception) {
        System.out.println("Received notification of a recoverable error." + exception);
        valid = false;
    }

    public void fatalError(SAXParseException exception) {
        System.out.println("Received notification of a non-recoverable error." + exception);
        valid = false;
    }

    public void warning(SAXParseException exception) {
        System.out.println("Received notification of a warning." + exception);
    }
    public static void main (String argv []) {
        if (argv.length != 1) {
            System.err.println ("Usage: java Validate filename.xml");
            System.exit (1);
        }
        try {
            // get a parser
            XMLReader reader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            // request validation
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setFeature(
                "http://apache.org/xml/features/validation/schema", true);
            reader.setErrorHandler(new Validate());
            // associate an InputSource object with the file name
            InputSource inputSource = new InputSource(argv[0]);
            // go ahead and parse
            reader.parse(inputSource);
        }
        catch(org.xml.sax.SAXException e) {
            System.out.println("Error in parsing " + e);
            valid = false;
        }
        catch(java.io.IOException e) {
            System.out.println("Error in I/O " + e);
            System.exit(0);
        }
        System.out.println("Valid Document is " + valid);
    }
}
XML Document
<?xml version="1.0" encoding="utf-8"?>
<itemList xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation="itemList.xsd">
    <item>
        <name>pen</name>
        <quantity>5</quantity>
    </item>
    <item>
        <name>eraser</name>
        <quantity>7</quantity>
    </item>
    <item>
        <name>stapler</name>
        <quantity>2</quantity>
    </item>
</itemList>
XSDL Grammar itemList.xsd
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
    <xsd:element name="itemList">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
<xsd:element name="item">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="name"/>
<xsd:element ref="quantity"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="quantity" type="xsd:short"/>
</xsd:schema>
D:..95-733\examples\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on itemList-xsd.xml
Valid Document is true
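The same schema check can be run through the javax.xml.validation API that ships with the JDK. This is a minimal sketch and not the Validate.java program from these slides. The class name ItemListSchemaSketch and the helper itemList are made up, and the schema is passed in directly instead of being located through xsi:noNamespaceSchemaLocation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; validates item lists against the itemList schema.
class ItemListSchemaSketch {
    static final String XSD = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='itemList'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='item' minOccurs='0' maxOccurs='3'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='item'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='name'/><xsd:element ref='quantity'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='name' type='xsd:string'/>"
        + "<xsd:element name='quantity' type='xsd:short'/>"
        + "</xsd:schema>";

    // Builds an itemList with one item per quantity value.
    static String itemList(String... quantities) {
        StringBuilder xml = new StringBuilder("<itemList>");
        for (String q : quantities) {
            xml.append("<item><name>pen</name><quantity>").append(q).append("</quantity></item>");
        }
        return xml.append("</itemList>").toString();
    }

    // Returns the schema-validity errors reported for the document.
    static List<String> errors(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        List<String> found = new ArrayList<>();
        validator.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) { found.add(e.getMessage()); }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(errors(itemList("5", "7", "2")));      // three items: allowed
        System.out.println(errors(itemList("5", "7", "2", "1"))); // four items: maxOccurs is 3
        System.out.println(errors(itemList("many")));             // not an xsd:short
    }
}
```

Unlike a DTD, the schema bounds the number of items and types the character data inside quantity.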
Another Example
<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<myns:purchaseOrder orderDate="07.23.2001"
    xmlns:myns="http://www.cds-r-us.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.cds-r-us.com po.xsd">
<myns:recipient country="USA">
<myns:name>Dennis Scannel</myns:name>
<myns:street>175 Perry Lea Side Road</myns:street>
<myns:city>Waterbury</myns:city>
<myns:state>VT</myns:state>
<myns:postalCode>05675A</myns:postalCode>
</myns:recipient>
Note that there is a problem with this document.
<myns:order>
<myns:cd artist="Brooks Williams" title="Little Lion" />
<myns:cd artist="David Wilcox" title="What you whispered" />
</myns:order>
</myns:purchaseOrder>
XSDL Grammar po.xsd
<?xml version="1.0" encoding="utf-8"?>
<!-- po.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.cds-r-us.com"
           targetNamespace="http://www.cds-r-us.com">
  <xs:element name="purchaseOrder">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="recipient" />
        <xs:element ref="order" />
      </xs:sequence>
      <xs:attribute name="orderDate" type="xs:string" />
    </xs:complexType>
  </xs:element>
  <xs:element name="recipient">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name" />
        <xs:element ref="street" />
        <xs:element ref="city" />
        <xs:element ref="state" />
        <xs:element ref="postalCode" />
      </xs:sequence>
      <xs:attribute name="country" type="xs:string" />
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string" />
  <xs:element name="street" type="xs:string" />
  <xs:element name="city" type="xs:string" />
  <xs:element name="state" type="xs:string" />
  <xs:element name="postalCode" type="xs:short" />
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cd" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
<xs:element name="cd">
<xs:complexType>
<xs:attribute name="artist"
type="xs:string" />
<xs:attribute name="title" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:schema>
Running Validate
D:..\examples\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Received notification of a recoverable
error.org.xml.sax.SAXParseException: cvc-datatype-valid.1.2.1: '05675A' is not a valid 'integer' value.
Received notification of a recoverable
error.org.xml.sax.SAXParseException: cvc-type.3.1.3: The value '05675A' of element 'myns:postalCode' is not valid.
Valid Document is false
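A recoverable error is reported to an error handler rather than stopping the parse, which is why the run above prints its notifications and still finishes with a verdict. The following is a minimal sketch of that idea using javax.xml.validation and an org.xml.sax.ErrorHandler. It is not the lecture's Validate.java: the class name PostalCodeCheck, the method errorsFor, and the cut-down one-element schema are assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class PostalCodeCheck {

    static final String NS = "http://www.cds-r-us.com";

    // Cut-down stand-in for po.xsd: only postalCode, declared as xs:short.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='" + NS + "'>"
      + "<xs:element name='postalCode' type='xs:short'/>"
      + "</xs:schema>";

    // Validates a single postalCode element and returns the recoverable
    // errors that the validator reported (empty list means valid).
    static List<String> errorsFor(String postalCode) throws Exception {
        final List<String> errors = new ArrayList<String>();
        Validator validator = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)))
            .newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            // Recoverable errors are collected; validation continues.
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            // Fatal errors (e.g. malformed XML) still abort.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        String xml = "<myns:postalCode xmlns:myns='" + NS + "'>"
                   + postalCode + "</myns:postalCode>";
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("05675A -> " + errorsFor("05675A"));
        System.out.println("05675  -> " + errorsFor("05675"));
    }
}
```

The exact message text depends on the parser in use, so the sketch only distinguishes "some errors" from "no errors".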
Fix the error and run again
D:\..\XSDL\testing>ant run
Buildfile: build.xml
run:
Running Validate.java on po.xml
Valid Document is true
Introduce a Namespace Error
<?xml version="1.0" encoding="UTF-8"?>
<!-- po.xml -->
<myns:purchaseOrder orderDate="07.23.2001"
xmlns:myns="http://www.cds-r-us.edu"
xmlns:xsi=
"http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.cds-r-us.com
po.xsd"
>
<myns:recipient country="USA">
  <myns:name>Dennis Scannel</myns:name>
  <myns:street>
    175 Perry Lea Side Road
  </myns:street>
  <myns:city>Waterbury</myns:city>
  <myns:state>VT</myns:state>
  <myns:postalCode>05675</myns:postalCode>
</myns:recipient>
<myns:order>
<myns:cd artist="Brooks Williams" title="Little Lion" />
<myns:cd artist="David Wilcox" title="What you whispered" />
</myns:order>
</myns:purchaseOrder>
And run validate
run:
Running Validate.java on po.xml
Received notification of a recoverable
error.org.xml.sax.SAXParseException: cvc-elt.1:
Cannot find the declaration of element 'myns:purchaseOrder'.
Valid Document is false
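The cvc-elt.1 error arises because the root element now lives in the .edu namespace while po.xsd declares its elements in the .com target namespace, so no declaration matches. A minimal sketch of that comparison, using a namespace-aware DOM parse, is given below. The class name NamespaceCheck and its methods are hypothetical, and the documents are reduced to their root elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceCheck {

    // Parses the XML text with namespace processing on and returns its root.
    static Element root(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
    }

    // True if the instance root's namespace equals the schema's targetNamespace.
    static boolean rootMatchesTarget(String instanceXml, String schemaXml)
            throws Exception {
        // getAttribute returns "" when targetNamespace is absent.
        String target = root(schemaXml).getAttribute("targetNamespace");
        // getNamespaceURI returns null for an element in no namespace.
        String actual = root(instanceXml).getNamespaceURI();
        return target.equals(actual == null ? "" : actual);
    }

    public static void main(String[] args) throws Exception {
        String schema =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
          + " targetNamespace='http://www.cds-r-us.com'/>";
        String good = "<myns:purchaseOrder xmlns:myns='http://www.cds-r-us.com'/>";
        String bad  = "<myns:purchaseOrder xmlns:myns='http://www.cds-r-us.edu'/>";
        System.out.println(".com root matches: " + rootMatchesTarget(good, schema));
        System.out.println(".edu root matches: " + rootMatchesTarget(bad, schema));
    }
}
```

Note that the prefix myns is irrelevant to the match; only the namespace URI bound to it is compared.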
Code Generation
• Run the JAXB schema compiler (xjc) against the .xsd file
• The generated code presents an API for processing documents of that type
itemList.xsd again
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name="itemList">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="item" minOccurs="0" maxOccurs="3"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="item">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element ref="quantity"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="quantity" type="xsd:short"/>
</xsd:schema>
Run xjc
D:..XSDL\testing>xjc itemList.xsd
D:\McCarthy\www\95-733\examples\XSDL\testing>java -jar D:\jwsdp-1.1\jaxb-1.0\lib\jaxb-xjc.jar itemList.xsd
parsing a schema...
compiling a schema...
generated\impl\ItemImpl.java
generated\impl\ItemListImpl.java
generated\impl\ItemListTypeImpl.java
generated\impl\ItemTypeImpl.java
generated\impl\NameImpl.java
generated\impl\QuantityImpl.java
generated\Item.java
generated\ItemList.java
generated\ItemListType.java
generated\ItemType.java
generated\Name.java
generated\ObjectFactory.java
generated\Quantity.java
generated\bgm.ser
generated\jaxb.properties
Write Java code that uses the new API
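The point of generated code is typed access: a program asks an item for its name and quantity instead of walking a generic tree. The sketch below illustrates that style only. It does not use JAXB or the classes xjc generated; ItemListReader and its nested Item class are hand-written, hypothetical stand-ins built on the standard DOM parser so that the example runs without the JAXB libraries.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ItemListReader {

    // Hand-written stand-in for a generated class; NOT actual xjc output.
    static class Item {
        final String name;
        final short quantity;   // mirrors type="xsd:short" in itemList.xsd

        Item(String name, short quantity) {
            this.name = name;
            this.quantity = quantity;
        }
    }

    // Reads an itemList document into a list of typed Item objects.
    static List<Item> read(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement();
        NodeList nodes = root.getElementsByTagName("item");
        List<Item> items = new ArrayList<Item>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String name = e.getElementsByTagName("name").item(0).getTextContent();
            String qty = e.getElementsByTagName("quantity").item(0).getTextContent();
            items.add(new Item(name, Short.parseShort(qty.trim())));
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<itemList>"
            + "<item><name>pen</name><quantity>5</quantity></item>"
            + "<item><name>eraser</name><quantity>7</quantity></item>"
            + "<item><name>stapler</name><quantity>2</quantity></item>"
            + "</itemList>";
        for (Item item : read(xml)) {
            System.out.println(item.name + ": " + item.quantity);
        }
    }
}
```

With JAXB, the equivalent classes come from the schema compiler rather than being written by hand, and the reading step is handled by the library.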
The build script used for these examples
<?xml version="1.0"?>
<project basedir="." default="compile">
  <path id="classpath">
    <fileset dir="D:/jwsdp-1.1/saaj-1.1.1/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxb-1.0/lib" includes="*.jar"/>
    <fileset dir="d:/jwsdp-1.1/common/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxm-1.1.1/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/bin" includes="*.jar" />
    <fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxp-1.2.2/lib/endorsed" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jwsdp-shared/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jaxr-1.0_03/lib" includes="*.jar"/>
    <fileset dir="D:/jwsdp-1.1/jakarta-ant-1.5.1/lib" includes="*.jar"/>
    <fileset dir="D:/j2sdk1.4.1_01/lib" includes="*.jar"/>
    <pathelement location="."/>
  </path>
  <!-- compile Java source files -->
  <target name="compile">
    <!-- compile all of the java sources -->
    <echo message="Compiling the java source files..."/>
    <javac srcdir="." destdir="." debug="on">
      <classpath refid="classpath" />
    </javac>
  </target>
  <!-- run the Validate class on po.xml -->
  <target name="run">
    <echo message="Running Validate.java on po.xml"/>
    <java classname="Validate" fork="false">
      <arg value="po.xml"/>
      <classpath refid="classpath" />
    </java>
  </target>
</project>