4d2b - fundamentals of xml 4d2b... · 2014. 1. 24. · document • an xml parser is not required...

67
University of Dublin Trinity College 4D2b - Fundamentals of XML [email protected]

Upload: others

Post on 05-Oct-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

University of Dublin Trinity College

4D2b - Fundamentals of XML

[email protected]

Page 2: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What is Markup

•  <Lecture> •  Sequence of characters within a text or word

processing file to define – Print properties – Display properties – Document's logical structure

•  Markup indicators are often called "tags" – Examples

•  RTF •  EDIFACT •  XML

<>

</> \

{}

'

" :

+

Page 3: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Mark Up: RTF

\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel2\adjustright\rin0\lin0\itap0 \b\f1\fs26\lang2057\langfe1033\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity Relationship Diagram \par }\pard\plain \s1\ql \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \cbpat17 \b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity Type \par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang2057\langfe1033\cgrid\langnp2057\langfenp1033 {\b\fs20\ul\lang6153\langfe1033\langnp6153 Def.:}{\b\fs20\lang6153\langfe1033\langnp6153 }{ \fs20\lang6153\langfe1033\langnp6153 An object or concept that is identified by the enterprise as having an independent existence. \par }\pard\plain \s1\ql \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \cbpat17 \b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity \par }\pard\plain \ql

Page 4: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Mark Up: EDIFACT

'''ED2'''OPENET:1111111:OVT':003705655815:OVT'ABC1234567'0'TYP:ORDERS'NRQ:1''' UNA:+.? 'UNB+UNOC:2+003705655815:30+1111111:30+980729:2233+4++ORDERS911+++KKK KATE+1'UNH+ 1+ORDERS:001:911:UN:FI0030'BGM+640+1234567'DTM+4:19981201:102'DTM+2:19990101:102'DTM+2:9901:616'RFF+BC:123'RFF+VN:123456'NAD+BY+003705655815:100' NAD+SE+11111111::92'NAD+PL+53432::92++KAUPPA:KAUPUNKI+KATU 9+KAUPUNKI++00007'NAD+CN+-::ZZ++TERMINAALI+OVI 42+TOINEN KAUPUNKI++00069'UNS+D'LIN+1++23442423234 :EN'PIA+5+3244:MF'PIA+5+2341234324:ZBU'PIA+5+234243:ZCG'IMD+F+8+-::91:KUKKAPUR KKI:SAVI'QTY+21:8:KPL'FTX+AAA+++T.HARMAA:V[RI'FTX+AAA+++10:KOKO'PRI+NTP:7.23:+ RP:7.32:PE'TAX+7+VAT+++:::22.00'LIN+2++543434554345:EN'PIA+5+535:MF'PIA+5+45: PCE‘UNT+38+2'UNZ+2+4' '''EOF'''9'

Page 5: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Mark Up: XML

<fragment> <section> <title>Introduction</title> <para>Since the emergence of <acronym refid="xml">XML</acronym> in early 1998 and it's subsequent adoption across diverse application domains, one of the key benefits it enabled was the separation of content and presentation <bibref refloc="Bos97"/>. <acronym refid="xml">XML</acronym> borrowed this model (along with other important concepts) from the <acronym.grp><acronym refid="sgml">SGML</acronym><expansion id="sgml">Standard Generalised Markup Language</expansion></acronym.grp>. An <acronym refid="sgml">SGML</acronym> document consists of logically structured content and uses a separate file (style sheet) to specify how the content should be formatted for [...] <figure id="img1"> <title>ePublishing Components</title> <graphic href="02-04-03-fig01.jpg" width="321" height="214"/> </figure> </section> </fragment>

Page 6: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What is SGML? •  Standard Generalised Mark-

Up Language •  ISO standard since 1986 •  Meta-language for defining

document mark-up vocabularies

•  Uses logical mark-up (structure, content) instead physical (how document looks on printed page)

•  Platform-, system-, vendor- and version-independent documents

•  Very powerful, but contains a number of complex features

<!DOCTYPE anthology [ <!ELEMENT anthology - - (poem+) > <!ELEMENT poem - - (title?, stanza+)> <!ELEMENT title - O (#PCDATA) > <!ELEMENT stanza - O (line+) > <!ELEMENT line O O (#PCDATA) > ]]>

<anthology> <poem> <title> The SICK ROSE </title> <stanza> <line>O Rose thou art sick.</line> <line>The invisible worm,</line> [...] </stanza> <stanza> <line>Has found out thy bed</line> <line>Of crimson joy:</line> [...] </stanza> </poem> </anthology>

Page 7: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What is HTML? •  HTML, the de facto standard

for publishing Web content, is an SGML vocabulary

•  Supporting full SGML on the Web was too difficult so HTML made some simplifications –  not extensible –  limited structure –  not content oriented –  cannot be validated

•  HTML is a simple language to understand and use

•  Most of the content available on the Web has been created with HTML

<html> <head> <title>The SICK ROSE</title> </head> <body> <h1>The SICK ROSE</h1> <p> O Rose thou art sick.<br /> The invisible worm,<br /> [...] </p> <p> Has found out thy bed<br /> Of crimson joy:<br /> [...] </p> </body>

</html>

Page 8: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What is XML? •  eXtensible Markup Language •  XML is a simplified subset of SGML •  Can also be used to define document markup

vocabularies (e.g. XHTML) –  These can have a strictly defined structure (DTD)

•  Retains the powerful features of SGML (extensibility, structure, validation)

•  Ignores the complex features of SGML and is therefore easier to use and implement

•  XML documents look similar to HTML documents •  Separates structure and presentation (like SGML)

Page 9: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Design of XML

• The design goals for XML as set out in the 1.0 specification are as follows: 1. XML shall be straightforwardly usable over the

Internet. 2. XML shall support a wide variety of applications. 3. XML shall be compatible with SGML. 4. It shall be easy to write programs which process

XML documents. 5. The number of optional features in XML is to be

kept to the absolute minimum, ideally zero.

Page 10: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Design of XML 6.  XML documents should be human-legible and

reasonably clear. 7.  The XML design should be prepared quickly. 8.  The design of XML shall be formal and concise. 9.  XML documents shall be easy to create. 10. Terseness in XML markup is of minimal

importance.

Page 11: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

XML Example <?xml version='1.0' encoding='ISO-8859-1' standalone='yes' ?> <doc type="book" isbn="1-56592-796-9" xml:lang="en"> <title>A Guide to XML</title> <author>Norman Walsh</author> <chapter> <title>What Do XML Documents Look Like?</title> <paragraph>If you are [...]</paragraph> <ol> <item> <paragraph>The document begins [...]</paragraph> </item> <item> <paragraph>Empty elements have [...]</paragraph> <paragraph>In a very [...]</paragraph> </item> </ol> <section>[...]</section> [...] </chapter> <chapter>[...]</chapter> </doc>

Page 12: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Vocab

ularies

SGML

HTML

XHTML

Meta Lan

gu

ages

XSL

SVG

SMIL

HL7 v3 XTM

CEN TC251

ASTM 31.25 SynExML

XML

Presentation Vocabularies Electronic Patient Record Vocabularies

Meta Language vs. Vocabulary

Page 13: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Why is the emergence of XML an important development? •  XML is a tool for defining languages

– XML languages are easy to read – XML is self describing

•  Parse tree embedded in document • Grammar for language referenced via DTD/Schema

•  XML languages are easy for computers to process, exchange and display – XML tools are ubiquitous, free and conform to

established standards – Natural affinity with Object serialization – Data source neutral

Page 14: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

XML technologies

CSS, Cascading Style Sheets XSL, Extensible Stylesheet Language XPath, XQuery

XLink, XBase XPointer

Topics Maps, Ontology Web Language RDF, Resource Description Framework

XML Schema, RelaxNG, RDF Schema, Document Type Definition (DTD)

XML Namespaces XML 1.0

Structure

Semantics

Linking

Presentation

Syntax

Page 15: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

XML 1.0

•  The XML 1.0 specification describes the syntax for XML documents (elements and attributes) and DTDs

•  An XML document is a hierarchical data structure using self-definable tags – e.g. <doc><author>[..]</author></doc>

•  There are many other technologies related to XML

•  XML is A simple common layer for tree structures in a character stream.

Page 16: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Design goals of XML 1.0 specification 1.  XML shall be straightforwardly usable over the

Internet. 2.  XML shall support a wide variety of applications. 3.  XML shall be compatible with SGML. 4.  It shall be easy to write programs which process XML

documents. 5.  The number of optional features in XML is to be kept

to the absolute minimum, ideally zero. 6.  XML documents should be human-legible and

reasonably clear. 7.  The XML design should be prepared quickly. 8.  The design of XML shall be formal and concise. 9.  XML documents shall be easy to create. 10. Terseness in XML markup is of minimal importance.

Page 17: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Physical Parts of XML documents

Page 18: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Physical parts of XML documents

•  XML Declaration •  Elements •  Attributes •  Document Type Declaration •  Entities •  Processing Instructions •  Comments •  Character Data Sections

•  XML Namespaces

Page 19: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

XML Declaration •  Placed at the start of an

XML document •  Informs XML software of

–  the version of XML the document conforms to

–  the character encoding scheme used in the document

–  whether or not a set of external declarations affect the interpretation of this document

<?xml version="1.0" ?> <?xml

version="1.0“ encoding="UTF-8" ?>

<?xml

version="1.0“ encoding="UTF-8" standalone="yes" ?>

Page 20: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Elements •  Define logical structure and

sections of XML documents •  Four different content types:

–  Data content –  Element content –  Mixed content –  Empty.

•  Each element must be completely enclosed by another element, except for the root

•  Note –  Any XML name must start

with a letter, underscore but after that can include also digits, fullstops, hyphens. Don’t start with colon due to namespaces Don’t include spaces

<?xml version="1.0" ?> <doc>

<title>Java Gently</title> <author>Judy Bishop</author> <publisher name=‘HH’ />

<chapter> <thetext> this is <bold>

bold </bold> text </thetext> <paragraph/> </chapter> </doc>

Page 21: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Attributes •  Provides additional

information about an element

•  Attributes are contained within the start-tag

•  Consists of a name and associated value separated by an equals sign

•  The attribute value must always be enclosed by quotes

•  The order of attributes is insignificant

<?xml version="1.0" ?> <doc type="book" isbn="0-201-71050-1"> <title>Java Gently</title> <author>Judy Bishop</author> <chapter> <paragraph type="abstract"> In this book ... </paragraph> </chapter> </doc>

Page 22: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

ELEMENT vs. ATTRIBUTE

ELEMENT •  Constituent data, •  Used for content, •  White space can be

ignored or preserved •  Nesting allowed (child

elements), •  Convenient for large

values, or binary entities.

ATTRIBUTE •  Inherent data, •  Used for meta-data, •  No further nesting

possible (atomic data), •  Default values, •  Minimal datatypes,

•  Lexically little difference, •  application specific, •  no hard/fast rules available.

Page 23: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Entities •  Storage units for

repeated text –  Defined in a DTD

•  Character entities are used to insert characters that cannot be typed directly

•  XML contains a number of 'built-in' entities –  &quot; –  &apos; –  &lt; –  &gt; –  &amp;

<math> 5 &lt; 6 and 6 &gt; 5 </math>

<copyright> &copyright-notice; </copyright> <bullet> XML contains a number

of &apos;built-in&apos; entities <list> <item>&amp;quot;</item> <item>&amp;apos;</item>

<item>&amp;lt;</item> <item>&amp;gt;</item> <item>&amp;amp;</item>

</list> </bullet>

Page 24: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Character Data Sections •  Data which is to be

parsed is called PCDATA •  An XML parser will not

treat the contents of a CDATA section as markup –  Used to simplify mark-up

by escaping a selection of text

•  Entity references are not resolved

•  Useful for including source code in XML

<![CDATA[ You don't need to escape special characters in CDATA sections, such as <, >, &, , ' and ". ]]> <![CDATA[<<< STOP now >>>]]>

<![CDATA[<?xml version='1.0'?> <person> <name>Mike</name> <age>24</age> </person>]]>

Page 25: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Processing Instructions •  Pass additional

information to application (e.g. parser)

•  Application-specific instructions

•  Consists of a PI Target and PI Value

•  Processed by applications that recognise the PI Target

<?xml-stylesheet type='text/css' href='style.css'?>

<?xml-stylesheet type='text/

xsl' href='style.xsl'?> <?myapp filename='test.txt'?>

Page 26: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Comments •  Used to comment XML

documents •  Not considered to be

part of an XML document

•  An XML parser is not required to pass comments to higher-level applications

<!–- one-line comment --> <!-- This is a multi-line comment -->

Page 27: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Well formed XML •  XML Declaration required •  At least one element

–  Exactly one root element

•  Empty elements are written in one of two ways: –  Closing tag (e.g. "<br></br>") –  Special start tag (e.g. "<br />")

•  For non-empty elements, closing tags are required •  Start tag must match closing tag (name & case) •  Correct nesting of elements •  Attribute values must always be quoted •  Attribute minimisation not allowed

Page 28: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Document Type Declaration •  Internal/embedded DTD

•  External DTD

<?xml version='1.0' standalone='yes'>

<!DOCTYPE person [ <!ELEMENT person (name,

adult, nationality)> … ]>

<?xml version='1.0'> <!DOCTYPE person SYSTEM

'person.dtd'>

Page 29: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What are XML Namespaces? •  W3C recommendation (January 1999) •  Each XML vocabulary is considered to own a

namespace in which all elements (and attributes) are unique

•  A single document can use elements and attributes from multiple namespaces –  A prefix is declared for each namespace used within a

document. –  The namespace is identified using a URI (Uniform Resource

Identifier) •  An element or attribute can be associated with a

namespace by placing the namespace prefix before its name (i.e. 'prefix:name') –  Elements (and attributes) belonging to the default namespace

do not require a prefix

Page 30: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

© 2003 B. Jung

<?xml version='1.0'?> <Accident Report xmlns:sjh="http://hospital/sjh" xmlns:dub=http://airport/dub > <sjh:Patient> <sjh:Name> <sjh:First>Mike</sjh:First> <sjh:Last>Murphy</sjh:Last> </sjh:Name> <sjh:DOB>12/12/1950</sjh:DOB> </sjh:Patient> <dub:Drug> <dub:Name>Nurofen</dub:Name> <dub:Code>IE-975-2</dub:Code> </dub:Drug> [...] </Accident Report>

Example: XML Namespaces

St. James’s Hospital <!ELEMENT Patient (Name, DOB)> <!ELEMENT Name (First, Last)> <!ELEMENT First (#PCDATA)> <!ELEMENT Last (#PCDATA)> <!ELEMENT DOB (#PCDATA)>

Airport Pharmacy

<!ELEMENT Drug ((Name|Substance), Code)> <!ELEMENT Name (#PCDATA)> <!ELEMENT Substance (#PCDATA)> <!ELEMENT Code (#PCDATA)>

Page 31: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Why Namespaces?

•  Important for creating XML documents containing different types of data

•  An XML document can be assembled using elements (and attributes) from different XML vocabularies

•  Must be able to – avoid conflicts between names –  identify the vocabulary an element belongs to

Page 32: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

XML Processing: DOM Processing

XML Doc Character

Stream

Application

Navigation API

DOM

Process into Tree

•  It views an XML tree as a data structure •  It is quite large and complex...

–  Level 1 Core: W3C Recommendation, October 1998 •  primitive navigation and manipulation of XML trees •  other Level 1 parts: HTML

–  Level 2 Core: W3C Recommendation, November 2000 •  adds Namespace support and minor new features •  other Level 2 parts: Events, Views, Style, Traversal and Range

–  Level 3 Core: W3C Working Draft, April 2002 •  adds minor new features •  other Level 3 parts: Schemas, XPath, Load/Save

Page 33: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Example: A Recipe <recipe> <title>Zuppa Inglese</title>

<ingredient name="egg yolks" amount="4" /> <ingredient name="milk" amount="2.5" unit="cup" /> <ingredient name="Savoiardi biscuits" amount="21" /> <ingredient name="sugar" amount="0.75" unit="cup" /> <ingredient name="Alchermes liquor" amount="1" unit="cup" /> <ingredient name="lemon zest" amount="*" /> <ingredient name="flour"

amount="0.5" unit="cup" /> <ingredient name="fresh whipping cream" amount="*" /> -

<preparation> <step>Warm up the milk in a nonstick sauce pan</step> <step>In a large bowl beat the egg yolks with the sugar, add the flour and combine the ingredients until well mixed.</step> <step>Add the milk, a little bit at the time to the egg mixture, mixing well.</step> <step>Put the mixture into the sauce pan and cook it on the stove at a medium low heat. Mix the cream continuously with a wooden spoon. When it starts to thicken remove it from the heat and pour it on a large plate to cool off.</step> <step>Stir the cream now and then so that the top doesn't harden.</step> <step>Dip quickly both sides of the lady fingers in the liquor. Layer them one at the time in a glass bowl large enough to contain 7 biscuits.</step> <step>Spread 1/3 of the cream and repeat the layer with lady fingers. Finish with the cream.</step>

</preparation> <comment>Refrigerate for at least 4 hours better yet overnight. Before

serving decorate the zuppa inglese with whipped cream.</comment> <nutrition calories="612" fat="49" carbohydrates="45" protein="4"

alcohol="2" /> </recipe>

Page 34: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

© 2003 B. Jung

Example: Getting a Recipe import java.io.*;

import org.apache.xerces.parsers.DOMParser;

import org.w3c.dom.*;

public class FirstRecipeDOM {

public static void main(String[] args) {

try {

DOMParser p = new DOMParser();

p.parse(args[0]);

Document doc = p.getDocument();

Node n = doc.getDocumentElement().getFirstChild();

while (n!=null && !n.getNodeName().equals("recipe"))

n = n.getNextSibling();

PrintStream out = System.out;

out.println("<?xml version=\"1.0\"?>");

out.println("<collection>");

if (n!=null)

print(n, out);

out.println("</collection>");

} catch (Exception e) {e.printStackTrace();}}

COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH

Page 35: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

XML Processing: SAX Processing

XML Doc Character Stream

Application

Events API

SAX

•  An XML tree is not viewed as a data structure, but as a stream of events generated by the parser.

•  The kinds of events are: –  the start of the document is encountered –  the end of the document is encountered –  the start tag of an element is encountered –  the end tag of an element is encountered –  character data is encountered –  a processing instruction is encountered

•  Scanning the XML file from start to end, each event invokes a corresponding callback method that the programmer writes

Page 36: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Example: Getting total amount of Flour import java.io.*; import org.xml.sax.*; import org.xml.sax.helpers.*; import org.apache.xerces.parsers.SAXParser; public static void main(String[] args) { Flour f = new Flour(); SAXParser p = new SAXParser(); p.setContentHandler(f); try { p.parse(args[0]); } catch (Exception e) {e.printStackTrace();} System.out.println(f.amount); } public class Flour extends DefaultHandler { float amount = 0; public void startElement(String namespaceURI, String localName, String qName, Attributes atts) { if (namespaceURI.equals("http://recipes.org") && localName.equals("ingredient")) { String n = atts.getValue("","name"); if (n.equals("flour")) { String a = atts.getValue("","amount"); // assume 'amount' exists amount = amount + Float.valueOf(a).floatValue(); } } } }

COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH

Page 37: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Summary •  XML = eXtensible Markup Language •  An XML document is a hierarchical data structure

using self-definable tags •  Physical parts of XML document

–  XML Declaration –  Elements –  Attributes –  Document Type Declaration –  Entities –  Processing Instructions –  Comments –  Character Data Sections –  XML Namespaces

•  Two types of APIs popular for XML Processing: DOM & SAX

•  </Lecture>

Page 38: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

© 2003 B. Jung

References •  XML

– Home Page: http://www.w3.org/XML/

– Tutorial: http://www.w3schools.com/xml/default.asp

•  XML Processing – Tutorial:

http://www.brics.dk/~amoeller/XML/programming/index.html

– Home Pages: http://sax.sourceforge.net/ http://www.w3.org/DOM/

© Declan O’Sullivan

Page 39: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

University of Dublin Trinity College

4D2b - Defining XML Vocabularies DTDs and XML Schemas

[email protected]

Page 40: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What is an XML vocabulary? •  Synonyms

–  ‘Application of XML’ – XML Language

•  Set of elements and attributes for representing domain-specific information

•  “Instance” of a Mark Up Language •  Defined by DTD or XML Schema •  Some are approved by standard organisations

– E.g. ebXML, MathML, XSL etc.

Remember: XML is syntax!

Page 41: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What is a DTD? •  Document Type Definition, •  Defines structure/model of XML documents

–  Elements and Cardinality –  Attributes –  Aggregation

•  Defines default ATTRIBUTE values •  Defines ENTITIES •  Stored in a plain text file and referenced by an XML document

(external) •  Alternatively a DTD can be placed in the XML document itself

(internal) •  Used to validate an XML document •  “Is there a need for a DTD”?

Page 42: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Why use a DTD?

•  Applications may require all documents to be consistent instances of a particular vocabulary

•  Indicates what structures and names can be used in a document

•  Documents are constructed and named in a conformant manner – Ease constructing (provide structure) – Ease parsing

•  Validate documents in order to find inconsistencies

Page 43: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Valid XML

•  Well-formed plus conforms to DTD •  All elements and attributes are declared within

a DTD (internal or external) •  Elements and attributes match the

declarations in the DTD

Page 44: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Element Type Declaration •  Define grouping of

elements –  "(", “)"

•  Define sequence of elements –  ",": followed-by

(Sequence) –  "|": logical or

(Choice)

<!ELEMENT doc (title, author, editor, chapter, appendix)>

<!ELEMENT title (#PCDATA)> <!ELEMENT author

(name | synonym)> <!ELEMENT image EMPTY> <!ELEMENT paragraph

(#PCDATA | bold | italic)*>

Page 45: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Element Type Declaration •  Define occurrences of

elements – ?: zero-or-one – +: one-or-more – *: zero-or-more

<!ELEMENT doc (title, author+, editor?, chapter+, appendix*)>

<!ELEMENT chapter

(title, (section+ | paragraph+))>

<!ELEMENT list

(item?, item?, item)> <!ENTITY % list "ordered |

unordered | definition"> <!ELEMENT paragraph

(#PCDATA | %list;)*>

Page 46: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Attribute List Declaration •  Define type of attribute

–  ID –  IDREF –  ENTITY –  NMTOKEN –  NOTATION

•  Define default values of attributes –  #REQUIRED –  #IMPLIED –  #FIXED –  A list of values with

default selection

<!ATTLIST person ssn ID #IMPLIED>

<!ATTLIST adult

age CDATA #REQUIRED> <!ATTLIST mml

version ‘1.0’ #FIXED> <!ATTLIST person

sex (m | f) #REQUIRED> <!ATTLIST day

temperature (l | m | h) "l">

Page 47: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Entity Declaration •  Internal entities

– Built-in

•  External entities – References to a file

(text, images etc.)

•  Parameter entities – Used inside DTDs

<!ENTITY author "Norman Walsh, Sun Corp.">

<!ENTITY copyright SYSTEM "copyright.xml">

<!ENTITY % part "(title?, (paragraph | section)*)">

Page 48: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Simple DTD Example <!ENTITY % part "(title?, (paragraph | section)*)"> <!ELEMENT doc (title, author+, chapter+, appendix*)>

<!ATTLIST doc type (book | article) "book“ isbn CDATA #REQUIRED> <!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT chapter %part;>

<!ELEMENT appendix %part;> <!ELEMENT section %part;> <!ELEMENT paragraph (#PCDATA | url | ol)*> <!ATTLIST paragraph type CDATA #IMPLIED> <!ELEMENT ol (item+)> <!ELEMENT item (paragraph+)>

<!ELEMENT url (#PCDATA)>

Page 49: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Example XML and related DTD <database> <person age='34'>

<name> <title> Mr </title> <firstname> John </firstname> <firstname> Paul </firstname> <surname> Murphy </surname> </name> <hobby> Football </hobby> <hobby> Racing </hobby>

</person> <person >

<name> <firstname> Mary </firstname> <surname> Donnelly </surname> </name>

</person> </database>

<!DOCTYPE database [ <!ELEMENT database (person*)> <!ELEMENT person (name,hobby*)> <!ATTLIST person age CDATA

#IMPLIED> <!ELEMENT name (title?, firstname

+, surname)> <!ELEMENT hobby (#PCDATA)> <!ELEMENT title (#PCDATA)> <!ELEMENT firstname (#PCDATA)>

<!ELEMENT surname (#PCDATA)>

]>

Page 50: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

What are XML Schemas? •  W3C Recommendation, 2 May 2001

– Part 0: Primer – Part 1: Structures – Part 2: Datatypes

•  DTDs use a non-XML syntax and have a number of limitations – no namespace support –  lack of data-types

•  XML Schemas are an alternative to DTDs •  Used to formally specify a "class" of XML

documents ( n "instance document") •  Supports simple/complex data-types

Page 51: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Why use XML Schemas?

•  Uses an XML syntax •  Supports simple and complex data-types such

as user-defined types •  An XML document and its contents can be

validated against a Schema •  Can validate documents containing multiple

namespaces •  Schemas are more powerful than DTDs and

will eventually replace DTDs

Page 52: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Named Types – simple

<!ELEMENT firstname (#PCDATA)>

<xsd:element name="firstname" type="xsd:string"/>

<firstname>Michael</firstname>

DTD

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 53: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Named Types – complex

<!ELEMENT name (firstname, lastname)>

<xsd:complexType name="namePerson"> <xsd:sequence> <xsd:element name="firstname" type="xsd:string"/> <xsd:element name="lastname" type="xsd:string/> </xsd:sequence>

</xsd:complexType> <xsd:element name="name" type="namePerson"/>

<name> <firstname>Michael</firstname> <lastname>Porter</lastname>

</name>

DTD

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 54: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Primitive Datatypes •  string •  boolean •  decimal •  float •  double •  duration •  dateTime •  time •  date

•  gYearMonth •  gYear •  gMonthDay •  gDay •  gMonth •  hexBinary •  base64Binary •  anyURI •  QName •  NOTATION

http://www.w3.org/TR/xmlschema-2/

Page 55: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Simple Type - Restriction

<simpleType name='celsiusBodyTemp'> <restriction base='decimal'> <totalDigits value='4'/> <fractionDigits value='1'/> <minInclusive value='36.4'/> <maxInclusive value='40.5'/> </restriction> </simpleType> <xsd:element name="temp" type="celsiusBodyTemp"/>

<temp>37.2</temp>

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 56: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Simple Type - Enumeration

<xsd:simpleType name="weekday"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="Sunday"/> <xsd:enumeration value="Monday"/> <xsd:enumeration value="Tuesday"/> [...] </xsd:restriction>

</xsd:simpleType> <xsd:element name="delivery" type="weekday"/>

<delivery>Tuesday</delivery>

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 57: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Complex Type - Cardinalities

<!ENTITY % fullname "title?, firstname*, lastname"> <!ELEMENT name (%fullname;)> D

TD

<xsd:complexType name="fullname"> <xsd:sequence> <xsd:element name="title" minOccurs="0"/> <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="lastname"/> </xsd:sequence>

</xsd:complexType> <xsd:element name="name" type="fullname"/>

<name> <firstname>Michael</firstname> <firstname>Jason</firstname> <lastname>Porter</lastname>

</name>

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 58: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Complex Type – Derived Type by extension

<!ENTITY % name "title?, firstname*, lastname"> <!ELEMENT name (%name;, maidenname?)> D

TD

<xsd:complexType name="fullnameExt"> <xsd:complexContent> <xsd:extension base="fullname"> <xsd:sequence> <xsd:element name="maidenname" minOccurs="0"/> </xsd:sequence> </xsd:extension> </xsd:complexContent>

</xsd:complexType> <xsd:element name="name" type="fullnameExt"/>

<name> <firstname>Jane</firstname> <lastname>Porter</lastname> <maidenname>Hughes</maidenname>

</name>

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 59: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Complex Type – Derived Type by Restriction

<xsd:complexType name="simpleName"> <xsd:complexContent> <xsd:restriction base="fullname"> <xsd:sequence> <xsd:element name="title" maxOccurs="0"/> <xsd:element name="firstname" minOccurs="1"/> <xsd:element name="lastname"/> </xsd:sequence> </xsd:restriction> </xsd:complexContent>

</xsd:complexType>

<xsd:element name="name" type="simpleName"/> <name> <firstname>Jane</firstname> <lastname>Porter</lastname>

</name>

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 60: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Structure - Sequence

<!ELEMENT name (title?, firstname*, lastname)>

<xsd:complexType name="fullname"> <xsd:sequence> <xsd:element name="title" minOccurs="0"/> <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/> <xsd:element name="lastname"/> </xsd:sequence>

</xsd:complexType> <xsd:element name="name" type="fullname"/>

<name> <firstname>Michael</firstname> <firstname>Jason</firstname> <lastname>Porter</lastname>

</name>

DTD

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 61: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Structure - Choice

<!ELEMENT pay (product, number, (cash | cheque))>

<xsd:complexType name="payment"> <xsd:sequence> <xsd:element ref="product"/> <xsd:element ref="number"/> <xsd:choice> <xsd:element ref="cash"/> <xsd:element ref="cheque"/> </xsd:choice> </xsd:sequence>

</xsd:complexType> <xsd:element name="pay" type="payment"/>

<pay> <product>Ericsson Telefon MD110</product> <number>1544-198-J</number> <cash>IR£150</cash>

</pay>

DTD

XM

L Sch

ema

XM

L do

c. I

nst.

Page 62: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Attributes

<!ELEMENT greeting (#PCDATA)> <!ATTLIST greeting language CDATA "English">

<xsd:element name="greeting"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="language" type="xsd:string"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType>

</xsd:element>

<greeting language="German">Hello!</greeting>

DTD

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 63: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Attribute Groups

<!ELEMENT img EMPTY> <!ATTLIST img src CDATA #REQUIRED

width CDATA #IMPLIED height CDATA #IMPLIED>

<xsd:attributeGroup name="imgAttributes"> <xsd:attribute name="src" type="xsd:string" use="required"/> <xsd:attribute name="width" type="xsd:integer"/> <xsd:attribute name="height" type="xsd:integer"/>

</xsd:attributeGroup> <xsd:element name="img">

<xsd:complexType> <xsd:attributeGroup ref="imgAttributes"/> <xsd:complexType>

</xsd:element>

<img src="XMLmanager.gif" width="60"/>

DTD

XM

L Sch

ema

XM

L In

st.

Page 64: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Mixed Content

<!ELEMENT p (#PCDATA | b | i)*> <!ELEMENT b (#PCDATA)>

<xsd:complexType name="bolditalicText" mixed="true"> <xsd:choice minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="b" /> <xsd:element ref="i" /> </xsd:choice>

</xsd:complexType> <xsd:element name="p" type="bolditalicText"/>

<p>This is <b>bold</b> and <i>italic</i> text</p>

DTD

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 65: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Empty Element

<!ELEMENT img EMPTY> <!ATTLIST src CDATA #REQUIRED>

<xsd:element name="img"> <xsd:complexType> <xsd:attribute name="src" type="xsd:string"/> </xsd:complexType>

</xsd:element>

<img src="XMLmanager.gif"/>

DTD

XM

L Sch

ema

XM

L do

c. I

nsta

nce

Page 66: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

XML Schema Example <?xml version="1.0" encoding="utf-8"?> <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"> <xsd:element name="book"> <xsd:complexType> <xsd:sequence> <xsd:element name="title" type="xsd:string"/> <xsd:element name="author" type="xsd:string"/> <xsd:element name="character” type="xsd:string" minOccurs="0" maxOccurs="unbounded"> </xsd:element> </xsd:sequence> <xsd:attribute name="isbn" type="xsd:string"/> </xsd:complexType> </xsd:element> </xsd:schema>

Page 67: 4D2b - Fundamentals of XML 4D2b... · 2014. 1. 24. · document • An XML parser is not required to pass comments to higher-level applications

Summary •  XML Vocabularies are defined using

– DTD – XSD

•  DTDs/XSDs used to validate XML documents •  XSD – more powerful than DTDs

– Supports simple and complex data-types such as user-defined types

– Can validate documents containing multiple namespaces