Introduction to DTD
Kristian Torp
Department of Computer Science, Aalborg University
people.cs.aau.dk/˜[email protected]
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to DTD November 3, 2015 1 / 37
Outline
1 Introduction
2 Elements
3 Attributes
4 DTD Find Errors
5 Putting it All Together
6 Summary
Learning Outcomes
Be able to read and understand a DTD
Be able to construct a DTD for a set of existing XML documents
Be able to validate an XML document against a DTD
Know the limitations of a DTD
Database Focus
All XML technologies are presented from a database perspective!
Example: Course Catalog XML Document
User Requirements
Make a DTD for the course catalog
Use the DTD to validate our course catalog XML document
Example (Current Courses)
<?xml version="1.0"?>
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
Example: Course Catalog DTD
Example (DTD for Course Catalog)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT coursecatalog (course)*>
<!ELEMENT course (name, semester, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT desc (#PCDATA)>
<!ATTLIST course cid ID #REQUIRED>
Informal Description
A course catalog consists of zero or more courses
A course consists of a name, a semester, and a description
It is identified by an ID that is required
A (course) name is a string (leaf in XML document)
A semester is a string (leaf in XML document)
A description is a string (leaf in XML document)
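The informal description above can be written out as a few hand-coded checks. This is only a sketch: Python's standard xml.etree.ElementTree parses XML but does not validate against a DTD, so the rules are re-stated by hand here. The function name check_catalog is hypothetical and does not come from the slides.

```python
import xml.etree.ElementTree as ET

# The course catalog document from the earlier slide.
CATALOG = """<?xml version="1.0"?>
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name><semester>3</semester><desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name><semester>7</semester><desc>Databases including SQL</desc>
  </course>
</coursecatalog>"""


def check_catalog(xml_text):
    """Return a list of rule violations; an empty list means every check passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "coursecatalog":
        problems.append("root element must be coursecatalog")
    # (course)*: zero or more children, all of them named course.
    for position, child in enumerate(root, start=1):
        if child.tag != "course":
            problems.append(f"child {position}: expected course, found {child.tag}")
            continue
        # cid ID #REQUIRED: the attribute must be present.
        if "cid" not in child.attrib:
            problems.append(f"course {position}: missing required attribute cid")
        # (name, semester, desc): exactly these children, in this order.
        if [c.tag for c in child] != ["name", "semester", "desc"]:
            problems.append(f"course {position}: children must be name, semester, desc")
    return problems
```

Calling check_catalog(CATALOG) returns an empty list for the document above. A validating parser performs these checks (and more, such as ID uniqueness) directly from the DTD, which is the reason to write the DTD instead of code like this.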
Overview
Purpose
Define the document structure
Legal elements and attributes
Serves the same purpose as a create table statement in SQL
Structure and type of data
Integrity constraints!
Left over from SGML
Is not written in XML
If this is a requirement then use XML Schema
Still very widely used
Because much simpler than XML Schema
Note
Many simple errors can be found using a DTD
A necessity if receiving XML documents from external sources
Simplest Entity
Example (Element Declaration)
<!ELEMENT name (#PCDATA)>
Example (Allowed Values)
<name>Hello Element</name>
<name/>
<name><![CDATA[ select * from emp where sal > 10]]></name>
Example (Illegal Values)
<name>< </name>
<name><</name>
<name><it>Hello</it></name>
Unknown element <it>; it must be declared in the DTD and allowed in the content model of name
Note
Root, internal nodes, and leaves in XML tree representation
Terminal and non-terminal in grammar terminology
Sequences of Child Elements
Example (Element Declaration)
<!ELEMENT course (name, semester, desc)>
Example (Allowed XML Fragment, Why?)
<course>
  <name>OOP</name>
  <semester>7</semester>
  <desc>Introduction to OOP</desc>
</course>
Example (Disallowed XML Fragment, Why?)
<course>
  <semester>7</semester>
  <name>OOP</name>
</course>
Example (Is this allowed?)
<course></course>
Choice Among Child Elements
Example (Element Declaration)
<!ELEMENT circle (x, y, (radius | diameter))>
Example (Allowed XML Fragment)
<circle>
  <x>5</x>
  <y>9</y>
  <diameter>7</diameter>
</circle>
Example (Illegal XML Fragment)
<circle>
  <x>4</x>
  <y>8</y>
  <radius>3.5</radius>
  <diameter>7</diameter>
</circle>
Symbols in a DTD
Symbols
Symbol  Example
*       <!ELEMENT coursecatalog (course)*>
+       <!ELEMENT coursecatalog (course)+>
?       <!ELEMENT coursecatalog (course)?>
,       <!ELEMENT course (name, semester, desc)>
|       <!ELEMENT course (name | semester | desc)>
Note
Symbols are mostly taken from regular expressions
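To illustrate the link to regular expressions, the sketch below rewrites three content models from the slides by hand as Python regular expressions and matches them against the sequence of child element names. The helper name matches and the trick of appending a space to each name are assumptions made for this illustration; a real validator derives the automaton from the DTD itself.

```python
import re


def matches(content_model_regex, child_names):
    """Match a list of child element names against a content model that has
    been rewritten by hand as a regular expression."""
    # Each name is followed by one space so that adjacent names cannot run together.
    text = "".join(name + " " for name in child_names)
    return re.fullmatch(content_model_regex, text) is not None


# <!ELEMENT coursecatalog (course)*>  : zero or more course children
COURSECATALOG = r"(course )*"
# <!ELEMENT course (name, semester, desc)>  : exactly this sequence
COURSE = r"name semester desc "
# <!ELEMENT circle (x, y, (radius | diameter))>  : a sequence ending in a choice
CIRCLE = r"x y (radius |diameter )"
```

For example, matches(COURSE, ["semester", "name"]) is False because the sequence operator fixes both the order and the number of children, while matches(COURSECATALOG, []) is True because * admits zero occurrences.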
Mixed Content
Example (Data Centric)
<!ELEMENT coor (x, y)>
Example (Allowed Fragment)
<coor>
  <x>5</x>
  <y>9</y>
</coor>
Example (Mixed Content)
<!ELEMENT coor (#PCDATA | x | y)*>
Example (Allowed Fragment)
<coor>
  This is the coordinate (<x>5</x>, <y>9</y>) where
  the treasure is hidden!
</coor>
Note
In mixed content, #PCDATA must come first, alternatives are separated by |, and the group ends with *
Data-centric documents are very table-like
Mixed content is also called a narrative document
Element Declarations using ANY
Example (Any)
<!ELEMENT coor ANY>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
Example (Allowed Fragments)
<coor/>
<coor>Hello World</coor>
<coor>Hello <x>1</x><x/>World<y>3</y><y>4</y></coor>
<coor>Hello <x>1</x><y>2</y>World<y>3</y><x>4</x></coor>
Example (Illegal Fragments)
<coor><z>1</z></coor>
<coor><x>1</x><y>1</y><z>1</z></coor>
Note
ANY is handy for narrative documents, e.g., HTML
Element Declarations using EMPTY
Example (Empty)
<!ELEMENT coor EMPTY>
Example (Allowed?)
<coor></coor>
<coor/>
<coor>Hello</coor>
<coor><x>Hello</x></coor>
<coor> </coor>
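One way to reason about the fragments above: an element declared EMPTY may have no content at all, neither child elements nor text, and not even white space. The sketch below expresses that rule with Python's standard xml.etree.ElementTree. The function name is_empty_element is hypothetical, and the check is an approximation because the default parser discards comments and processing instructions, which a validating parser would also reject inside an EMPTY element.

```python
import xml.etree.ElementTree as ET


def is_empty_element(xml_fragment):
    """Return True if the fragment's root element has no children and no text."""
    elem = ET.fromstring(xml_fragment)
    # len(elem) counts child elements; elem.text is None only when there is
    # no character data at all before the first child or the end tag.
    return len(elem) == 0 and elem.text is None
```

Under this check <coor></coor> and <coor/> are equivalent and both pass, while a single space between the tags is enough to fail.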
Summary: Elements
Repetition
Symbol   Explanation   Example
?        zero-or-one   <!ELEMENT person (address?)>
*        zero-or-more  <!ELEMENT person (address*)>
+        one-or-more   <!ELEMENT person (address+)>
(none)   once          <!ELEMENT person (address)>
Sequence or Choice
Symbol   Explanation   Example
,        Sequence      <!ELEMENT coor (x, y)>
|        Choice        <!ELEMENT coor (x | y)>
Data Type
Symbol   Explanation   Example
#PCDATA  String        <!ELEMENT name (#PCDATA)>
ANY      Whatever      <!ELEMENT coor ANY>
EMPTY    Empty         <!ELEMENT room EMPTY>
Attribute Declarations
Example (Circles)
<?xml version='1.0' encoding='utf-8'?>
<!ELEMENT drawing (circle)*>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle cid ID #REQUIRED
                 name CDATA #IMPLIED>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
<!ELEMENT radius (#PCDATA)>
<!ATTLIST radius unit (mm|cm|m) "m">  <!-- Enum with default -->
<!ELEMENT diameter (#PCDATA)>
<!ATTLIST diameter unit (mm|cm|m) #REQUIRED>  <!-- Enum no default -->
Note
Mandatory and optional attributes
One or more attributes
Enumeration with defaults
Example Document
Example (Circles)
<?xml version="1.0" encoding='UTF-8'?>
<!DOCTYPE drawing SYSTEM "circleatt.dtd">
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>4</radius>  <!-- default unit -->
  </circle>
  <circle cid='C2'>  <!-- name not required -->
    <x>5</x> <y>5</y>
    <radius unit="cm">4</radius>  <!-- explicit unit -->
  </circle>
</drawing>
Note
The unique value is not an integer
Uses the fact that the attribute name is optional in element circle
Uniqueness, Examples
Example (Circle/Points with IDs)
<?xml version='1.0' encoding='utf-8'?>
<!ELEMENT drawing (point | circle)*>
<!ELEMENT point (x, y)>
<!ELEMENT circle (x, y, (radius | diameter))>
<!ATTLIST circle did ID #REQUIRED>
<!ATTLIST point did ID #REQUIRED>
Example (Circles)
<drawing>
  <circle did='C1'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='P2'>
    <x>5</x> <y>5</y>
  </point>
</drawing>
Uniqueness, Errors
Example (Find the error 1!)
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid='C1'>
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 2!)
<drawing>
  <circle did='C11'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='C11'>
    <x>5</x> <y>5</y>
  </point>
</drawing>
Example (Find the error 3!)
<drawing>
  <circle cid='C1' name='forest'>
    <x>8</x> <y>8</y>
    <radius>5</radius>
  </circle>
  <circle cid='2C'>
    <x>5</x> <y>5</y>
    <radius unit="cm">8</radius>
  </circle>
</drawing>
Example (Find the error 4!)
<drawing>
  <circle did='C11'>
    <x>8</x> <y>8</y>
    <radius>4</radius>
  </circle>
  <point did='C1111111111111111111111111111'>
    <x>5</x> <y>5</y>
  </point>
</drawing>
Uniqueness
Limitations
Only attribute values can be unique, not element values
Cannot be an integer, e.g., <circle did='1'> is not allowed
Only unique within a single document
Uniqueness not guaranteed across multiple documents
Only single-attribute uniqueness (no composite keys)
Combination of x and y coordinates cannot be declared unique
Note
Uniqueness is quite restrictive compared to DBMS technology
XML Schema lifts most limitations on uniqueness
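The two ID rules above (a value must be a name, and it must be unique within the document) can be sketched in Python as follows. The function check_ids and the constant ID_ATTRIBUTES are hypothetical names for this illustration. The name pattern is a simplified ASCII-only approximation of the XML name rule, and the set of ID attributes is passed in by hand because the standard library does not read it from the DTD.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of an XML name: it starts with a letter,
# underscore, or colon. The real rule also admits many non-ASCII characters.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

# (element name, attribute name) pairs declared with type ID in the
# circle/point DTD from the slides.
ID_ATTRIBUTES = {("circle", "did"), ("point", "did")}


def check_ids(xml_text, id_attributes):
    """Return a list of problems with ID-typed attribute values."""
    seen = set()
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        for attr, value in elem.attrib.items():
            if (elem.tag, attr) not in id_attributes:
                continue
            if NAME.fullmatch(value) is None:
                problems.append(f"{value!r} is not a valid name")
            # All ID attributes share one value space per document.
            if value in seen:
                problems.append(f"{value!r} is used more than once")
            seen.add(value)
    return problems
```

Note that the seen set is shared across element types: a circle and a point may not carry the same ID value, even though they are declared in separate ATTLIST declarations.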
Empty Elements with Attributes
Example (Empty)
<!ELEMENT coor EMPTY>
<!ATTLIST coor cid ID #REQUIRED
               x CDATA #REQUIRED
               y CDATA #REQUIRED
               z CDATA #IMPLIED>
Example (Allowed?)
<coor/>
<coor cid='c1' x='1' y='1' z='1' />
<coor cid='c2' x='2' y='2'></coor>
<coor cid='c3' x='3' y='3'> </coor>
<coor cid='c4' z='4' y='4' x='4' />
<coor z='5' y='5' x='5' />
Is something Wrong?
Example (Case 1)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID
  x CDATA #REQUIRED>
Example (Case 2!)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID #IMPLIED
  x CDATA #REQUIRED>
Example (Case 3!)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  x CDATA #REQUIRED
  cid ID #REQUIRED>
Example (Case 4)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID '42'
  x CDATA #REQUIRED>
Example (Case 5)
<!ELEMENT coor (EMPTY)>
<!ATTLIST coor
  cid ID #REQUIRED
  x CDATA #REQUIRED>
Example (Case 6)
<!ELEMENT coor EMPTY>
<!ATTLIST coor
  cid ID #REQUIRED
  x ID #REQUIRED>
Summary: Attributes
General Syntax
<!ATTLIST element-name attribute-name type default>
Often used types
Type         Example
CDATA        <!ATTLIST course id CDATA #IMPLIED>
ID           <!ATTLIST course id ID #REQUIRED>
Enumeration  <!ATTLIST course id (OOP | DB) #IMPLIED>
Defaults
Default      Example
#REQUIRED    <!ATTLIST course id ID #REQUIRED>
#IMPLIED     <!ATTLIST course id CDATA #IMPLIED>
#FIXED       <!ATTLIST course id CDATA #FIXED "1">
A value      <!ATTLIST course id (OOP | DB) "DB">
A Buggy DTD
Example (DTD With Five Errors)
<?xml version='1.0'>
<!ELEMENT users user+>
<!ELEMENT user (firstname, lastname>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname>
Two-Minute Exercise
With your neighbor, identify the errors in the DTD
Example (The Corrected DTD)
<?xml version='1.0' encoding='utf-8'?>
<!ELEMENT users (user)+>
<!ELEMENT user (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
Uncertain About Content
Example (DTD for Courses with Flexible Description)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT courses (course)*>
<!ELEMENT course (name, desc)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT desc ANY>
Example (XML Document for Courses with Flexible Description)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE courses SYSTEM "course.dtd">
<courses>
  <course>
    <name>OOP</name>
    <desc><name>object-oriented</name><desc>programming</desc>.
    </desc>
  </course>
</courses>
A University Example, Setup
Example (DTD)
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT university (courses,
                      students,
                      follows)>
<!ELEMENT courses (course)+>
<!ELEMENT course (name)>
<!ATTLIST course cid ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT students (student)+>
<!ELEMENT student (fname)>
<!ATTLIST student sid ID #REQUIRED>
<!ELEMENT fname (#PCDATA)>
<!ELEMENT follows (takes)+>
<!ELEMENT takes EMPTY>
<!ATTLIST takes sid IDREF #REQUIRED>
<!ATTLIST takes cids IDREFS #REQUIRED>
Example (XML Fragment)
<university>
  <courses>
    <course cid='C111'><name>DB</name></course>
    <course cid='C222'><name>OOP</name></course>
  </courses>
  <students>
    <student sid='S11'><fname>Ann</fname></student>
    <student sid='S22'><fname>Bart</fname></student>
    <student sid='S33'><fname>Curt</fname></student>
  </students>
A University Example, Referencing
Example (XML Fragment)
<follows>
  <takes sid='S11' cids='C111 C222'/>
  <takes sid='S22' cids='C222'/>
  <takes sid='S33' cids='C111'/>
</follows>
Note
ID cannot start with digit
sid is a single ID
cids is a set of IDs
No overlap between IDs
Separator is space (not ,)
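The referencing rules in the note can be sketched as a small Python check that every IDREF and IDREFS value points at an ID that exists in the document. The function name dangling_references is hypothetical, and the attribute names are hard-coded from the university DTD because Python's standard xml.etree.ElementTree does not read them from the DTD.

```python
import xml.etree.ElementTree as ET

# The university document assembled from the two fragments on the slides.
UNIVERSITY = """<university>
  <courses>
    <course cid='C111'><name>DB</name></course>
    <course cid='C222'><name>OOP</name></course>
  </courses>
  <students>
    <student sid='S11'><fname>Ann</fname></student>
    <student sid='S22'><fname>Bart</fname></student>
    <student sid='S33'><fname>Curt</fname></student>
  </students>
  <follows>
    <takes sid='S11' cids='C111 C222'/>
    <takes sid='S22' cids='C222'/>
    <takes sid='S33' cids='C111'/>
  </follows>
</university>"""


def dangling_references(xml_text):
    """Return the referenced names that match no ID in the document."""
    root = ET.fromstring(xml_text)
    # ID attributes declared in the DTD: course/@cid and student/@sid.
    ids = {course.get("cid") for course in root.iter("course")}
    ids |= {student.get("sid") for student in root.iter("student")}
    missing = []
    for takes in root.iter("takes"):
        # sid is an IDREF (one name); cids is an IDREFS value, a list of
        # names separated by white space. split() handles both.
        refs = takes.get("sid", "").split() + takes.get("cids", "").split()
        missing.extend(ref for ref in refs if ref not in ids)
    return missing
```

The check is case-sensitive, so c111 does not match C111. It also reflects a DTD limitation: all IDs live in one pool, so a takes element with sid='C111' would pass, because an IDREF only requires that some element in the document carries that ID.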
Quiz: IDREFS
Example (Allowed One?)
<follows>
  <takes sid='S11' cids='C111 C222 C111'/>
</follows>
Example (Allowed Two?)
<follows>
  <takes sid='S11' cids='C333 C222 C111'/>
</follows>
Example (Allowed Three?)
<follows>
  <takes sid='S11' cids='C111'/>
  <takes sid='S11' cids='C222'/>
</follows>
Example (Allowed Four?)
<follows>
  <takes sid='S11' cids=''/>
  <takes sid='S22' cids='c111'/>
</follows>
Using an Internal DTD
Example (DTD for Courses with Flexible Description)
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE courses [
  <!ELEMENT courses (course)*>
  <!ELEMENT course (name, desc)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT desc ANY>
]>
<courses>
  <course>
    <name>OOP</name>
    <desc><name>object-oriented</name><desc>programming</desc>.
    </desc>
  </course>
</courses>
Note
Benefit: All information in one file
Drawback: DTD is not reused (maintenance nightmare)
Summary: DTD
Limitations
Only very basic data types supported
Only single-column keys (for uniqueness)
Uniqueness only guaranteed within a single document
Very limited support for integrity constraints
Note
DTD is widely used
DTD is being replaced by XML Schema when documents are complex
There are problems combining XML Namespaces and DTDs
Advice
Never build a new DTD if an existing (standard) one can be used!
RDBMS vs. XML
RDBMS vs. XML
      Query    Schema
SQL   DML      DDL
XML   XQuery   DTD/XML Schema
Summary: DTD versus XML Schema
DTD
Own format
Compact notation
Simple data types
From SGML
Supports entities
No support for namespaces
XML Schema
XML format
Very verbose
Advanced data types
Invented for XML
Does not support entities
Supports namespaces
Advice
Start with a DTD
Move on to XML Schema for later iterations