What is Markup
• <Lecture>
• Sequence of characters within a text or word processing file to define
  – Print properties
  – Display properties
  – Document's logical structure
• Markup indicators are often called "tags"
  – Examples: RTF, EDIFACT, XML
[Figure: delimiter characters used by these formats: < > </ > \ { } ' " : +]
Mark Up: RTF
\li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel2\adjustright\rin0\lin0\itap0 \b\f1\fs26\lang2057\langfe1033\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity Relationship Diagram \par }\pard\plain \s1\ql \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \cbpat17 \b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity Type \par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang2057\langfe1033\cgrid\langnp2057\langfenp1033 {\b\fs20\ul\lang6153\langfe1033\langnp6153 Def.:}{\b\fs20\lang6153\langfe1033\langnp6153 }{ \fs20\lang6153\langfe1033\langnp6153 An object or concept that is identified by the enterprise as having an independent existence. \par }\pard\plain \s1\ql \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \cbpat17 \b\f1\fs24\lang2057\langfe1033\kerning32\cgrid\langnp2057\langfenp1033 {\lang6153\langfe1033\langnp6153 Entity \par }\pard\plain \ql
Mark Up: EDIFACT
'''ED2'''OPENET:1111111:OVT':003705655815:OVT'ABC1234567'0'TYP:ORDERS'NRQ:1''' UNA:+.? 'UNB+UNOC:2+003705655815:30+1111111:30+980729:2233+4++ORDERS911+++KKK KATE+1'UNH+ 1+ORDERS:001:911:UN:FI0030'BGM+640+1234567'DTM+4:19981201:102'DTM+2:19990101:102'DTM+2:9901:616'RFF+BC:123'RFF+VN:123456'NAD+BY+003705655815:100' NAD+SE+11111111::92'NAD+PL+53432::92++KAUPPA:KAUPUNKI+KATU 9+KAUPUNKI++00007'NAD+CN+-::ZZ++TERMINAALI+OVI 42+TOINEN KAUPUNKI++00069'UNS+D'LIN+1++23442423234 :EN'PIA+5+3244:MF'PIA+5+2341234324:ZBU'PIA+5+234243:ZCG'IMD+F+8+-::91:KUKKAPUR KKI:SAVI'QTY+21:8:KPL'FTX+AAA+++T.HARMAA:V[RI'FTX+AAA+++10:KOKO'PRI+NTP:7.23:+ RP:7.32:PE'TAX+7+VAT+++:::22.00'LIN+2++543434554345:EN'PIA+5+535:MF'PIA+5+45: PCE‘UNT+38+2'UNZ+2+4' '''EOF'''9'
Mark Up: XML
<fragment> <section> <title>Introduction</title> <para>Since the emergence of <acronym refid="xml">XML</acronym> in early 1998 and its subsequent adoption across diverse application domains, one of the key benefits it enabled was the separation of content and presentation <bibref refloc="Bos97"/>. <acronym refid="xml">XML</acronym> borrowed this model (along with other important concepts) from the <acronym.grp><acronym refid="sgml">SGML</acronym><expansion id="sgml">Standard Generalised Markup Language</expansion></acronym.grp>. An <acronym refid="sgml">SGML</acronym> document consists of logically structured content and uses a separate file (style sheet) to specify how the content should be formatted for [...] <figure id="img1"> <title>ePublishing Components</title> <graphic href="02-04-03-fig01.jpg" width="321" height="214"/> </figure> </section> </fragment>
What is SGML?
• Standard Generalised Mark-Up Language
• ISO standard since 1986
• Meta-language for defining document mark-up vocabularies
• Uses logical mark-up (structure, content) instead of physical mark-up (how the document looks on the printed page)
• Platform-, system-, vendor- and version-independent documents
• Very powerful, but contains a number of complex features
<!DOCTYPE anthology [
<!ELEMENT anthology - - (poem+)>
<!ELEMENT poem - - (title?, stanza+)>
<!ELEMENT title - O (#PCDATA)>
<!ELEMENT stanza - O (line+)>
<!ELEMENT line O O (#PCDATA)>
]>
<anthology> <poem> <title> The SICK ROSE </title> <stanza> <line>O Rose thou art sick.</line> <line>The invisible worm,</line> [...] </stanza> <stanza> <line>Has found out thy bed</line> <line>Of crimson joy:</line> [...] </stanza> </poem> </anthology>
What is HTML?
• HTML, the de facto standard for publishing Web content, is an SGML vocabulary
• Supporting full SGML on the Web was too difficult, so HTML made some simplifications
  – not extensible
  – limited structure
  – not content oriented
  – cannot be validated
• HTML is a simple language to understand and use
• Most of the content available on the Web has been created with HTML
<html> <head> <title>The SICK ROSE</title> </head> <body> <h1>The SICK ROSE</h1> <p> O Rose thou art sick.<br /> The invisible worm,<br /> [...] </p> <p> Has found out thy bed<br /> Of crimson joy:<br /> [...] </p> </body>
</html>
What is XML?
• eXtensible Markup Language
• XML is a simplified subset of SGML
• Can also be used to define document markup vocabularies (e.g. XHTML)
  – These can have a strictly defined structure (DTD)
• Retains the powerful features of SGML (extensibility, structure, validation)
• Ignores the complex features of SGML and is therefore easier to use and implement
• XML documents look similar to HTML documents
• Separates structure and presentation (like SGML)
Design of XML
• The design goals for XML as set out in the 1.0 specification are as follows:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
XML Example
<?xml version='1.0' encoding='ISO-8859-1' standalone='yes' ?>
<doc type="book" isbn="1-56592-796-9" xml:lang="en">
  <title>A Guide to XML</title>
  <author>Norman Walsh</author>
  <chapter>
    <title>What Do XML Documents Look Like?</title>
    <paragraph>If you are [...]</paragraph>
    <ol>
      <item>
        <paragraph>The document begins [...]</paragraph>
      </item>
      <item>
        <paragraph>Empty elements have [...]</paragraph>
        <paragraph>In a very [...]</paragraph>
      </item>
    </ol>
    <section>[...]</section>
    [...]
  </chapter>
  <chapter>[...]</chapter>
</doc>
Meta Language vs. Vocabulary
[Figure: meta languages (SGML, XML) and the vocabularies defined with them. Labels shown: HTML, XHTML, XTM; Presentation Vocabularies (XSL, SVG, SMIL); Electronic Patient Record Vocabularies (HL7 v3, CEN TC251, ASTM 31.25, SynExML)]
Why is the emergence of XML an important development?
• XML is a tool for defining languages
  – XML languages are easy to read
  – XML is self describing
    • Parse tree embedded in document
    • Grammar for language referenced via DTD/Schema
• XML languages are easy for computers to process, exchange and display
  – XML tools are ubiquitous, free and conform to established standards
  – Natural affinity with Object serialization
  – Data source neutral
XML technologies
• Presentation: CSS (Cascading Style Sheets), XSL (Extensible Stylesheet Language), XPath, XQuery
• Linking: XLink, XBase, XPointer
• Semantics: Topic Maps, Web Ontology Language, RDF (Resource Description Framework)
• Structure: XML Schema, RelaxNG, RDF Schema, Document Type Definition (DTD)
• Syntax: XML Namespaces, XML 1.0
XML 1.0
• The XML 1.0 specification describes the syntax for XML documents (elements and attributes) and DTDs
• An XML document is a hierarchical data structure using self-definable tags – e.g. <doc><author>[..]</author></doc>
• There are many other technologies related to XML
• XML is a simple common layer for tree structures in a character stream.
Physical parts of XML documents
• XML Declaration
• Elements
• Attributes
• Document Type Declaration
• Entities
• Processing Instructions
• Comments
• Character Data Sections
• XML Namespaces
XML Declaration
• Placed at the start of an XML document
• Informs XML software of
  – the version of XML the document conforms to
  – the character encoding scheme used in the document
  – whether or not a set of external declarations affect the interpretation of this document
<?xml version="1.0" ?>
<?xml version="1.0" encoding="UTF-8" ?>
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
Elements
• Define logical structure and sections of XML documents
• Four different content types:
  – Data content
  – Element content
  – Mixed content
  – Empty
• Each element must be completely enclosed by another element, except for the root
• Note
  – An XML name must start with a letter or an underscore; after that it can also include digits, full stops and hyphens
  – Don't start a name with a colon (colons are used for namespaces)
  – Don't include spaces
<?xml version="1.0" ?>
<doc>
  <title>Java Gently</title>
  <author>Judy Bishop</author>
  <publisher name='HH' />
  <chapter>
    <thetext> this is <bold> bold </bold> text </thetext>
    <paragraph/>
  </chapter>
</doc>
Attributes
• Provide additional information about an element
• Attributes are contained within the start-tag
• Each attribute consists of a name and an associated value separated by an equals sign
• The attribute value must always be enclosed by quotes
• The order of attributes is insignificant
<?xml version="1.0" ?> <doc type="book" isbn="0-201-71050-1"> <title>Java Gently</title> <author>Judy Bishop</author> <chapter> <paragraph type="abstract"> In this book ... </paragraph> </chapter> </doc>
ELEMENT vs. ATTRIBUTE
ELEMENT
• Constituent data
• Used for content
• White space can be ignored or preserved
• Nesting allowed (child elements)
• Convenient for large values, or binary entities
ATTRIBUTE
• Inherent data
• Used for meta-data
• No further nesting possible (atomic data)
• Default values
• Minimal datatypes
• Lexically little difference, • application specific, • no hard/fast rules available.
Entities
• Storage units for repeated text
  – Defined in a DTD
• Character entities are used to insert characters that cannot be typed directly
• XML contains a number of 'built-in' entities
  – &quot; for "
  – &apos; for '
  – &lt; for <
  – &gt; for >
  – &amp; for &
<math> 5 &lt; 6 and 6 &gt; 5 </math>
<copyright> &copyright-notice; </copyright>
<bullet> XML contains a number of 'built-in' entities
  <list>
    <item>&quot;</item>
    <item>&apos;</item>
    <item>&lt;</item>
    <item>&gt;</item>
    <item>&amp;</item>
  </list>
</bullet>
Character Data Sections
• Data which is to be parsed is called PCDATA
• An XML parser will not treat the contents of a CDATA section as markup
  – Used to simplify mark-up by escaping a selection of text
• Entity references are not resolved
• Useful for including source code in XML
<![CDATA[ You don't need to escape special characters in CDATA sections, such as <, >, &, ' and ". ]]>
<![CDATA[<<< STOP now >>>]]>
<![CDATA[<?xml version='1.0'?> <person> <name>Mike</name> <age>24</age> </person>]]>
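The difference between parsed character data and a CDATA section can be observed with a parser. The following is a minimal sketch using the JAXP DOM parser that ships with the JDK; the class name EntityCdataDemo and the helper rootText are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class (not part of the lecture material).
class EntityCdataDemo {
    /** Parses the XML text and returns the character data of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Built-in entity references are replaced by the characters they stand for.
        System.out.println(rootText("<math>5 &lt; 6 and 6 &gt; 5</math>"));
        // Inside a CDATA section nothing is treated as markup, so "&amp;" stays as typed.
        System.out.println(rootText("<note><![CDATA[<<< STOP now >>> &amp;]]></note>"));
    }
}
```

The first call yields the text 5 < 6 and 6 > 5, while the second returns the CDATA content unchanged, including the literal characters &amp;.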
Processing Instructions
• Pass additional information to application (e.g. parser)
• Application-specific instructions
• Consists of a PI Target and PI Value
• Processed by applications that recognise the PI Target
<?xml-stylesheet type='text/css' href='style.css'?>
<?xml-stylesheet type='text/xsl' href='style.xsl'?>
<?myapp filename='test.txt'?>
Comments
• Used to comment XML documents
• Not considered to be part of an XML document
• An XML parser is not required to pass comments to higher-level applications
<!-- one-line comment -->
<!-- This is a
     multi-line comment -->
Well formed XML
• XML declaration recommended (XML 1.0 does not strictly require it)
• At least one element
  – Exactly one root element
• Empty elements are written in one of two ways:
  – Closing tag (e.g. "<br></br>")
  – Special start tag (e.g. "<br />")
• For non-empty elements, closing tags are required
• Start tag must match closing tag (name & case)
• Correct nesting of elements
• Attribute values must always be quoted
• Attribute minimisation not allowed
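These rules can be tried out against a real parser. The sketch below uses the JAXP DOM parser bundled with the JDK; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper (not part of the lecture material).
class WellFormedCheck {
    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness violation is reported as a fatal error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<doc><br/><br></br></doc>"));     // both empty-element forms
        System.out.println(isWellFormed("<doc><b><i>text</b></i></doc>")); // incorrect nesting
        System.out.println(isWellFormed("<doc type=book/>"));              // unquoted attribute value
        System.out.println(isWellFormed("<Doc></doc>"));                   // case mismatch
        System.out.println(isWellFormed("<a/><b/>"));                      // two root elements
    }
}
```

Only the first document is accepted; each of the others breaks one of the rules listed above.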
Document Type Declaration
• Internal/embedded DTD
• External DTD
<?xml version='1.0' standalone='yes'?>
<!DOCTYPE person [
<!ELEMENT person (name, adult, nationality)>
…
]>
<?xml version='1.0'?>
<!DOCTYPE person SYSTEM 'person.dtd'>
What are XML Namespaces?
• W3C recommendation (January 1999)
• Each XML vocabulary is considered to own a namespace in which all elements (and attributes) are unique
• A single document can use elements and attributes from multiple namespaces
  – A prefix is declared for each namespace used within a document
  – The namespace is identified using a URI (Uniform Resource Identifier)
• An element or attribute can be associated with a namespace by placing the namespace prefix before its name (i.e. 'prefix:name')
  – Elements belonging to the default namespace do not require a prefix (the default namespace does not apply to unprefixed attributes)
© 2003 B. Jung
Example: XML Namespaces
<?xml version='1.0'?>
<AccidentReport xmlns:sjh="http://hospital/sjh" xmlns:dub="http://airport/dub">
  <sjh:Patient>
    <sjh:Name>
      <sjh:First>Mike</sjh:First>
      <sjh:Last>Murphy</sjh:Last>
    </sjh:Name>
    <sjh:DOB>12/12/1950</sjh:DOB>
  </sjh:Patient>
  <dub:Drug>
    <dub:Name>Nurofen</dub:Name>
    <dub:Code>IE-975-2</dub:Code>
  </dub:Drug>
  [...]
</AccidentReport>
St. James’s Hospital
<!ELEMENT Patient (Name, DOB)>
<!ELEMENT Name (First, Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT DOB (#PCDATA)>
Airport Pharmacy
<!ELEMENT Drug ((Name|Substance), Code)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Substance (#PCDATA)>
<!ELEMENT Code (#PCDATA)>
Why Namespaces?
• Important for creating XML documents containing different types of data
• An XML document can be assembled using elements (and attributes) from different XML vocabularies
• Must be able to – avoid conflicts between names – identify the vocabulary an element belongs to
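Both vocabularies in the accident report define an element called Name, which is exactly the conflict namespaces are meant to resolve. As a rough sketch, a namespace-aware DOM parser (here the JAXP parser from the JDK) can tell the two apart by URI. The class NamespaceDemo, the method nameIn and the condensed report text are assumptions made for this illustration; the root is written as AccidentReport because element names cannot contain spaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class (not part of the lecture material).
class NamespaceDemo {
    static final String SJH = "http://hospital/sjh";
    static final String DUB = "http://airport/dub";

    // Condensed copy of the accident report example.
    static final String REPORT =
        "<AccidentReport xmlns:sjh='" + SJH + "' xmlns:dub='" + DUB + "'>"
      + "<sjh:Patient><sjh:Name><sjh:First>Mike</sjh:First>"
      + "<sjh:Last>Murphy</sjh:Last></sjh:Name></sjh:Patient>"
      + "<dub:Drug><dub:Name>Nurofen</dub:Name></dub:Drug>"
      + "</AccidentReport>";

    /** Text content of the first element with local name "Name" in the given namespace. */
    static String nameIn(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(REPORT)));
            NodeList hits = doc.getElementsByTagNameNS(namespaceUri, "Name");
            return hits.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameIn(SJH)); // the patient's name element
        System.out.println(nameIn(DUB)); // the drug's name element
    }
}
```

The lookup is by namespace URI and local name, so the prefixes sjh and dub could be renamed in the document without changing the result.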
XML Processing: DOM Processing
[Figure: XML document (character stream) → processed into tree → DOM → navigation API → application]
• It views an XML tree as a data structure
• It is quite large and complex...
  – Level 1 Core: W3C Recommendation, October 1998
    • primitive navigation and manipulation of XML trees
    • other Level 1 parts: HTML
  – Level 2 Core: W3C Recommendation, November 2000
    • adds Namespace support and minor new features
    • other Level 2 parts: Events, Views, Style, Traversal and Range
  – Level 3 Core: W3C Working Draft, April 2002
    • adds minor new features
    • other Level 3 parts: Schemas, XPath, Load/Save
Example: A Recipe
<recipe>
<title>Zuppa Inglese</title>
<ingredient name="egg yolks" amount="4" />
<ingredient name="milk" amount="2.5" unit="cup" />
<ingredient name="Savoiardi biscuits" amount="21" />
<ingredient name="sugar" amount="0.75" unit="cup" />
<ingredient name="Alchermes liquor" amount="1" unit="cup" />
<ingredient name="lemon zest" amount="*" />
<ingredient name="flour" amount="0.5" unit="cup" />
<ingredient name="fresh whipping cream" amount="*" />
<preparation> <step>Warm up the milk in a nonstick sauce pan</step> <step>In a large bowl beat the egg yolks with the sugar, add the flour and combine the ingredients until well mixed.</step> <step>Add the milk, a little bit at the time to the egg mixture, mixing well.</step> <step>Put the mixture into the sauce pan and cook it on the stove at a medium low heat. Mix the cream continuously with a wooden spoon. When it starts to thicken remove it from the heat and pour it on a large plate to cool off.</step> <step>Stir the cream now and then so that the top doesn't harden.</step> <step>Dip quickly both sides of the lady fingers in the liquor. Layer them one at the time in a glass bowl large enough to contain 7 biscuits.</step> <step>Spread 1/3 of the cream and repeat the layer with lady fingers. Finish with the cream.</step>
</preparation>
<comment>Refrigerate for at least 4 hours better yet overnight. Before serving decorate the zuppa inglese with whipped cream.</comment>
<nutrition calories="612" fat="49" carbohydrates="45" protein="4" alcohol="2" />
</recipe>
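Example: Getting a Recipe
import java.io.*;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
public class FirstRecipeDOM {
  public static void main(String[] args) {
    try {
      // Parse the file named on the command line into a DOM tree.
      DOMParser p = new DOMParser();
      p.parse(args[0]);
      Document doc = p.getDocument();
      // Find the first <recipe> child of the root element.
      Node n = doc.getDocumentElement().getFirstChild();
      while (n != null && !n.getNodeName().equals("recipe"))
        n = n.getNextSibling();
      PrintStream out = System.out;
      out.println("<?xml version=\"1.0\"?>");
      out.println("<collection>");
      if (n != null)
        print(n, out);
      out.println("</collection>");
    } catch (Exception e) { e.printStackTrace(); }
  }
  // print() is not shown on this slide. This minimal stand-in is an assumption:
  // it writes elements, attributes and text without escaping special characters.
  static void print(Node n, PrintStream out) {
    if (n.getNodeType() == Node.TEXT_NODE) out.print(n.getNodeValue());
    if (n.getNodeType() != Node.ELEMENT_NODE) return;
    out.print("<" + n.getNodeName());
    NamedNodeMap a = n.getAttributes();
    for (int i = 0; i < a.getLength(); i++)
      out.print(" " + a.item(i).getNodeName() + "=\"" + a.item(i).getNodeValue() + "\"");
    out.print(">");
    for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) print(c, out);
    out.println("</" + n.getNodeName() + ">");
  }
}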
COPYRIGHT © 2000-2003 ANDERS MØLLER & MICHAEL I. SCHWARTZBACH
XML Processing: SAX Processing
[Figure: XML document (character stream) → SAX → events API → application]
• An XML tree is not viewed as a data structure, but as a stream of events generated by the parser.
• The kinds of events are:
  – the start of the document is encountered
  – the end of the document is encountered
  – the start tag of an element is encountered
  – the end tag of an element is encountered
  – character data is encountered
  – a processing instruction is encountered
• Scanning the XML file from start to end, each event invokes a corresponding callback method that the programmer writes
Example: Getting total amount of Flour
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;
public class Flour extends DefaultHandler {
  float amount = 0;
  // Called by the parser for every start tag; adds up the flour amounts.
  // Only matches documents that place <ingredient> in the http://recipes.org namespace.
  public void startElement(String namespaceURI, String localName,
                           String qName, Attributes atts) {
    if (namespaceURI.equals("http://recipes.org") && localName.equals("ingredient")) {
      String n = atts.getValue("", "name");
      if ("flour".equals(n)) {
        String a = atts.getValue("", "amount"); // assume 'amount' exists
        amount = amount + Float.valueOf(a).floatValue();
      }
    }
  }
  public static void main(String[] args) {
    Flour f = new Flour();
    SAXParser p = new SAXParser();
    p.setContentHandler(f);
    try { p.parse(args[0]); }
    catch (Exception e) { e.printStackTrace(); }
    System.out.println(f.amount);
  }
}
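To make the event stream itself visible, the following sketch records each callback as it fires. It uses the SAX parser bundled with the JDK (via SAXParserFactory) rather than Xerces directly; the class name SaxEventLog and the method trace are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of the lecture material).
class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }
    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        events.add("start:" + qName);
    }
    @Override public void endElement(String uri, String local, String qName) {
        events.add("end:" + qName);
    }
    @Override public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several chunks; blank chunks are skipped.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("text:" + text);
    }
    @Override public void processingInstruction(String target, String data) {
        events.add("pi:" + target);
    }

    /** Parses the XML text and returns the events in the order they were reported. */
    static List<String> trace(String xml) {
        try {
            SaxEventLog log = new SaxEventLog();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
            return log.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("<?myapp filename='test.txt'?><step>Warm <b>up</b></step>"))
            System.out.println(event);
    }
}
```

No tree is built at any point: the handler sees each event once, in document order, and keeps only what it chooses to store.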
Summary
• XML = eXtensible Markup Language
• An XML document is a hierarchical data structure using self-definable tags
• Physical parts of XML document
  – XML Declaration
  – Elements
  – Attributes
  – Document Type Declaration
  – Entities
  – Processing Instructions
  – Comments
  – Character Data Sections
  – XML Namespaces
• Two types of APIs popular for XML Processing: DOM & SAX
• </Lecture>
References • XML
– Home Page: http://www.w3.org/XML/
– Tutorial: http://www.w3schools.com/xml/default.asp
• XML Processing – Tutorial:
http://www.brics.dk/~amoeller/XML/programming/index.html
– Home Pages: http://sax.sourceforge.net/ http://www.w3.org/DOM/
© Declan O’Sullivan
University of Dublin Trinity College
4D2b - Defining XML Vocabularies DTDs and XML Schemas
What is an XML vocabulary?
• Synonyms
  – ‘Application of XML’
  – XML Language
• Set of elements and attributes for representing domain-specific information
• “Instance” of a Mark Up Language
• Defined by DTD or XML Schema
• Some are approved by standards organisations
  – E.g. ebXML, MathML, XSL etc.
Remember: XML is syntax!
What is a DTD?
• Document Type Definition
• Defines structure/model of XML documents
  – Elements and Cardinality
  – Attributes
  – Aggregation
• Defines default ATTRIBUTE values
• Defines ENTITIES
• Stored in a plain text file and referenced by an XML document (external)
• Alternatively a DTD can be placed in the XML document itself (internal)
• Used to validate an XML document
• “Is there a need for a DTD?”
Why use a DTD?
• Applications may require all documents to be consistent instances of a particular vocabulary
• Indicates what structures and names can be used in a document
• Documents are constructed and named in a conformant manner – Ease constructing (provide structure) – Ease parsing
• Validate documents in order to find inconsistencies
Valid XML
• Well-formed plus conforms to DTD
• All elements and attributes are declared within a DTD (internal or external)
• Elements and attributes match the declarations in the DTD
Element Type Declaration
• Define grouping of elements
  – "(", ")"
• Define sequence of elements
  – ",": followed-by (Sequence)
  – "|": logical or (Choice)
<!ELEMENT doc (title, author, editor, chapter, appendix)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name | synonym)>
<!ELEMENT image EMPTY>
<!ELEMENT paragraph (#PCDATA | bold | italic)*>
Element Type Declaration
• Define occurrences of elements
  – ?: zero-or-one
  – +: one-or-more
  – *: zero-or-more
<!ELEMENT doc (title, author+, editor?, chapter+, appendix*)>
<!ELEMENT chapter (title, (section+ | paragraph+))>
<!ELEMENT list (item?, item?, item)>
<!ENTITY % list "ordered | unordered | definition">
<!ELEMENT paragraph (#PCDATA | %list;)*>
Attribute List Declaration
• Define type of attribute
  – ID
  – IDREF
  – ENTITY
  – NMTOKEN
  – NOTATION
• Define default values of attributes
  – #REQUIRED
  – #IMPLIED
  – #FIXED
  – A list of values with default selection
<!ATTLIST person ssn ID #IMPLIED>
<!ATTLIST adult age CDATA #REQUIRED>
<!ATTLIST mml version CDATA #FIXED "1.0">
<!ATTLIST person sex (m | f) #REQUIRED>
<!ATTLIST day temperature (l | m | h) "l">
Entity Declaration
• Internal entities
  – Built-in
• External entities
  – References to a file (text, images etc.)
• Parameter entities
  – Used inside DTDs
<!ENTITY author "Norman Walsh, Sun Corp.">
<!ENTITY copyright SYSTEM "copyright.xml">
<!ENTITY % part "(title?, (paragraph | section)*)">
Simple DTD Example
<!ENTITY % part "(title?, (paragraph | section)*)">
<!ELEMENT doc (title, author+, chapter+, appendix*)>
<!ATTLIST doc type (book | article) "book"
              isbn CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapter %part;>
<!ELEMENT appendix %part;>
<!ELEMENT section %part;>
<!ELEMENT paragraph (#PCDATA | url | ol)*>
<!ATTLIST paragraph type CDATA #IMPLIED>
<!ELEMENT ol (item+)>
<!ELEMENT item (paragraph+)>
<!ELEMENT url (#PCDATA)>
Example XML and related DTD
<database>
  <person age='34'>
    <name>
      <title> Mr </title>
      <firstname> John </firstname>
      <firstname> Paul </firstname>
      <surname> Murphy </surname>
    </name>
    <hobby> Football </hobby>
    <hobby> Racing </hobby>
  </person>
  <person>
    <name>
      <firstname> Mary </firstname>
      <surname> Donnelly </surname>
    </name>
  </person>
</database>
<!DOCTYPE database [
<!ELEMENT database (person*)>
<!ELEMENT person (name,hobby*)>
<!ATTLIST person age CDATA #IMPLIED>
<!ELEMENT name (title?, firstname+, surname)>
<!ELEMENT hobby (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
]>
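A validating parser can check a document against this DTD. The sketch below is one possible way to do so with the JAXP DOM parser from the JDK, placing the DTD in an internal subset in front of the document body; the class name DtdValidationDemo and the method validate are hypothetical names used only here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class (not part of the lecture material).
class DtdValidationDemo {
    // The "database" DTD from the example, as an internal subset.
    static final String DTD =
        "<!DOCTYPE database [\n"
      + "<!ELEMENT database (person*)>\n"
      + "<!ELEMENT person (name,hobby*)>\n"
      + "<!ATTLIST person age CDATA #IMPLIED>\n"
      + "<!ELEMENT name (title?, firstname+, surname)>\n"
      + "<!ELEMENT hobby (#PCDATA)>\n"
      + "<!ELEMENT title (#PCDATA)>\n"
      + "<!ELEMENT firstname (#PCDATA)>\n"
      + "<!ELEMENT surname (#PCDATA)>\n"
      + "]>\n";

    /** Returns the validity problems found in the body; an empty list means valid. */
    static List<String> validate(String body) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations are reported as recoverable errors.
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (SAXException e) {
            problems.add("not well-formed: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String ok = "<database><person age='34'><name><firstname>John</firstname>"
                  + "<surname>Murphy</surname></name><hobby>Football</hobby></person></database>";
        String noSurname = "<database><person><name><firstname>Mary</firstname></name></person></database>";
        System.out.println(validate(ok));        // no problems expected
        System.out.println(validate(noSurname)); // <name> lacks the required <surname>
    }
}
```

Note the two error levels: a well-formedness violation stops the parse, whereas a validity violation is only reported, so the handler can collect several of them in one pass.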
What are XML Schemas?
• W3C Recommendation, 2 May 2001
  – Part 0: Primer
  – Part 1: Structures
  – Part 2: Datatypes
• DTDs use a non-XML syntax and have a number of limitations
  – no namespace support
  – lack of data-types
• XML Schemas are an alternative to DTDs
• Used to formally specify a "class" of XML documents (a document that conforms to a schema is an "instance document")
• Supports simple/complex data-types
Why use XML Schemas?
• Uses an XML syntax
• Supports simple and complex data-types such as user-defined types
• An XML document and its contents can be validated against a Schema
• Can validate documents containing multiple namespaces
• Schemas are more powerful than DTDs and will eventually replace DTDs
Named Types – simple
DTD:
<!ELEMENT firstname (#PCDATA)>
XML Schema:
<xsd:element name="firstname" type="xsd:string"/>
XML doc. instance:
<firstname>Michael</firstname>
Named Types – complex
DTD:
<!ELEMENT name (firstname, lastname)>
XML Schema:
<xsd:complexType name="namePerson">
  <xsd:sequence>
    <xsd:element name="firstname" type="xsd:string"/>
    <xsd:element name="lastname" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="namePerson"/>
XML doc. instance:
<name>
  <firstname>Michael</firstname>
  <lastname>Porter</lastname>
</name>
Primitive Datatypes • string • boolean • decimal • float • double • duration • dateTime • time • date
• gYearMonth • gYear • gMonthDay • gDay • gMonth • hexBinary • base64Binary • anyURI • QName • NOTATION
http://www.w3.org/TR/xmlschema-2/
Simple Type - Restriction
XML Schema:
<xsd:simpleType name="celsiusBodyTemp">
  <xsd:restriction base="xsd:decimal">
    <xsd:totalDigits value="4"/>
    <xsd:fractionDigits value="1"/>
    <xsd:minInclusive value="36.4"/>
    <xsd:maxInclusive value="40.5"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="temp" type="celsiusBodyTemp"/>
XML doc. instance:
<temp>37.2</temp>
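The effect of the facets on celsiusBodyTemp can be checked with a schema-aware validator. The following is a sketch using the javax.xml.validation API from the JDK; the class name BodyTempSchemaDemo and the method isValid are invented for this illustration, and the schema is wrapped in an xsd:schema element bound to the 2001 XML Schema namespace.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class (not part of the lecture material).
class BodyTempSchemaDemo {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:simpleType name='celsiusBodyTemp'>"
      + "<xsd:restriction base='xsd:decimal'>"
      + "<xsd:totalDigits value='4'/><xsd:fractionDigits value='1'/>"
      + "<xsd:minInclusive value='36.4'/><xsd:maxInclusive value='40.5'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "<xsd:element name='temp' type='celsiusBodyTemp'/>"
      + "</xsd:schema>";

    static final Schema SCHEMA = compile();

    /** Compiles the schema text once; a failure here means the schema itself is wrong. */
    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be compiled", e);
        }
    }

    /** Returns true if the instance document is valid against the schema. */
    static boolean isValid(String xml) {
        try {
            // Without a custom error handler, validate() throws on the first violation.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<temp>37.2</temp>"));  // within range, one fraction digit
        System.out.println(isValid("<temp>42.0</temp>"));  // above maxInclusive
        System.out.println(isValid("<temp>37.25</temp>")); // two fraction digits
        System.out.println(isValid("<temp>hot</temp>"));   // not a decimal at all
    }
}
```

Only the first instance passes. A DTD could declare temp as #PCDATA but has no way to express the numeric range, which is the data-type limitation mentioned earlier.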
Simple Type - Enumeration
XML Schema:
<xsd:simpleType name="weekday">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Sunday"/>
    <xsd:enumeration value="Monday"/>
    <xsd:enumeration value="Tuesday"/>
    [...]
  </xsd:restriction>
</xsd:simpleType>
<xsd:element name="delivery" type="weekday"/>
XML doc. instance:
<delivery>Tuesday</delivery>
Complex Type - Cardinalities
DTD:
<!ENTITY % fullname "title?, firstname*, lastname">
<!ELEMENT name (%fullname;)>
XML Schema:
<xsd:complexType name="fullname">
  <xsd:sequence>
    <xsd:element name="title" minOccurs="0"/>
    <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="lastname"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="fullname"/>
XML doc. instance:
<name>
  <firstname>Michael</firstname>
  <firstname>Jason</firstname>
  <lastname>Porter</lastname>
</name>
Complex Type – Derived Type by extension
DTD:
<!ENTITY % name "title?, firstname*, lastname">
<!ELEMENT name (%name;, maidenname?)>
XML Schema:
<xsd:complexType name="fullnameExt">
  <xsd:complexContent>
    <xsd:extension base="fullname">
      <xsd:sequence>
        <xsd:element name="maidenname" minOccurs="0"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
<xsd:element name="name" type="fullnameExt"/>
XML doc. instance:
<name>
  <firstname>Jane</firstname>
  <lastname>Porter</lastname>
  <maidenname>Hughes</maidenname>
</name>
Complex Type – Derived Type by Restriction
XML Schema:
<xsd:complexType name="simpleName">
  <xsd:complexContent>
    <xsd:restriction base="fullname">
      <xsd:sequence>
        <xsd:element name="title" minOccurs="0" maxOccurs="0"/>
        <xsd:element name="firstname" minOccurs="1"/>
        <xsd:element name="lastname"/>
      </xsd:sequence>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
<xsd:element name="name" type="simpleName"/>
XML doc. instance:
<name>
  <firstname>Jane</firstname>
  <lastname>Porter</lastname>
</name>
Structure - Sequence
DTD:
<!ELEMENT name (title?, firstname*, lastname)>
XML Schema:
<xsd:complexType name="fullname">
  <xsd:sequence>
    <xsd:element name="title" minOccurs="0"/>
    <xsd:element name="firstname" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="lastname"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="name" type="fullname"/>
XML doc. instance:
<name>
  <firstname>Michael</firstname>
  <firstname>Jason</firstname>
  <lastname>Porter</lastname>
</name>
Structure - Choice
DTD:
<!ELEMENT pay (product, number, (cash | cheque))>
XML Schema:
<xsd:complexType name="payment">
  <xsd:sequence>
    <xsd:element ref="product"/>
    <xsd:element ref="number"/>
    <xsd:choice>
      <xsd:element ref="cash"/>
      <xsd:element ref="cheque"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
<xsd:element name="pay" type="payment"/>
XML doc. instance:
<pay>
  <product>Ericsson Telefon MD110</product>
  <number>1544-198-J</number>
  <cash>IR£150</cash>
</pay>
Attributes
DTD:
<!ELEMENT greeting (#PCDATA)>
<!ATTLIST greeting language CDATA "English">
XML Schema:
<xsd:element name="greeting">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="language" type="xsd:string" default="English"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
XML doc. instance:
<greeting language="German">Hello!</greeting>
Attribute Groups
DTD:
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED
              width CDATA #IMPLIED
              height CDATA #IMPLIED>
XML Schema:
<xsd:attributeGroup name="imgAttributes">
  <xsd:attribute name="src" type="xsd:string" use="required"/>
  <xsd:attribute name="width" type="xsd:integer"/>
  <xsd:attribute name="height" type="xsd:integer"/>
</xsd:attributeGroup>
<xsd:element name="img">
  <xsd:complexType>
    <xsd:attributeGroup ref="imgAttributes"/>
  </xsd:complexType>
</xsd:element>
XML doc. instance:
<img src="XMLmanager.gif" width="60"/>
Mixed Content
DTD:
<!ELEMENT p (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>
XML Schema:
<xsd:complexType name="bolditalicText" mixed="true">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element ref="b"/>
    <xsd:element ref="i"/>
  </xsd:choice>
</xsd:complexType>
<xsd:element name="p" type="bolditalicText"/>
XML doc. instance:
<p>This is <b>bold</b> and <i>italic</i> text</p>
Empty Element
DTD:
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED>
XML Schema:
<xsd:element name="img">
  <xsd:complexType>
    <xsd:attribute name="src" type="xsd:string" use="required"/>
  </xsd:complexType>
</xsd:element>
XML doc. instance:
<img src="XMLmanager.gif"/>
XML Schema Example
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="character" type="xsd:string" minOccurs="0" maxOccurs="unbounded">
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="isbn" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Summary
• XML Vocabularies are defined using
  – DTD
  – XSD
• DTDs/XSDs used to validate XML documents
• XSD – more powerful than DTDs
  – Supports simple and complex data-types such as user-defined types
  – Can validate documents containing multiple namespaces