DTD (Document Type Definition): Imposing Structure on XML Documents (W3Schools on DTDs)

Upload: rocco-downen

Post on 14-Dec-2015



Motivation

• A DTD adds syntactical requirements in addition to the well-formed requirement

• It helps in eliminating errors when creating or editing XML documents

• It clarifies the intended semantics

• It simplifies the processing of XML documents


An Example

• In an address book, where can a phone number appear?

– Under <person>, under <name> or under both?

• If we have to check for all possibilities, processing takes longer and it may not be clear to whom a phone belongs


Document Type Definitions

• Document Type Definitions (DTDs) impose structure on XML documents

• There is some relationship between a DTD and a schema, but it is not close – hence the need for additional “typing” systems (XML schemas)

• The DTD is a syntactic specification


Example: An Address Book

<person>

<name> Homer Simpson </name>

<greet> Dr. H. Simpson </greet>

<addr>1234 Springwater Road </addr>

<addr> Springfield USA, 98765 </addr>

<tel> (321) 786 2543 </tel>

<fax> (321) 786 2544 </fax>

<tel> (321) 786 2544 </tel>

<email> [email protected] </email>

</person>

Mixed telephones and faxes

As many as needed

As many address lines as needed (in order)

At most one greeting

Exactly one name


Specifying the Structure

• name to specify a name element

• greet? to specify an optional (0 or 1) greet element

• name, greet? to specify a name followed by an optional greet


Specifying the Structure (cont’d)

• addr* to specify 0 or more address lines

• tel | fax a tel or a fax element

• (tel | fax)* 0 or more repeats of tel or fax

• email* 0 or more email elements


Specifying the Structure (cont’d)

• So the whole structure of a person entry is specified by

name, greet?, addr*, (tel | fax)*, email*

• This is known as a regular expression
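
The content model above can be tried out as an ordinary string regular expression. The following is only a sketch in Python, not how a validating parser is actually implemented: each child tag name is encoded as the tag followed by a space, and the names PERSON_MODEL and matches_person are made up for this illustration.

```python
import re

# Content model from the slide: name, greet?, addr*, (tel | fax)*, email*
# Each child tag is written as "tag " (tag plus a trailing space) so that
# a plain string regex can stand in for the content model.
PERSON_MODEL = re.compile(r"name (greet )?(addr )*(tel |fax )*(email )*")

def matches_person(child_tags):
    """Return True if the sequence of child tag names fits the model."""
    encoded = "".join(tag + " " for tag in child_tags)
    return PERSON_MODEL.fullmatch(encoded) is not None

# The children of the Homer Simpson entry, in document order.
homer = ["name", "greet", "addr", "addr", "tel", "fax", "tel", "email"]
print(matches_person(homer))
print(matches_person(["greet", "name"]))  # greet before name breaks the order
```

The first call accepts the example entry, while the second is rejected because the model fixes the order of name and greet.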


Element Type Definition

• For each element type E, a declaration of the form:

<!ELEMENT E P>

• where P is a regular expression, i.e.,

P ::= EMPTY | ANY | #PCDATA | E’ | P1, P2 | P1 | P2 | P? | P+ | P*

– E’: element type

– P1, P2: concatenation

– P1 | P2: disjunction

– P?: optional

– P+: one or more occurrences

– P*: the Kleene closure


Summary of Regular Expressions

• A: the tag (i.e., element) A occurs

• e1,e2: the expression e1 followed by e2

• e*: 0 or more occurrences of e

• e?: optional, 0 or 1 occurrences

• e+: 1 or more occurrences

• e1 | e2: either e1 or e2

• (e): grouping


The Definition of an Element Consists of Exactly One of the Following

• A regular expression (as defined earlier)

• EMPTY means that the element has no content

• ANY means that content can be any mixture of PCDATA and elements defined in the DTD

• Mixed content which is defined as described on the next slide

• (#PCDATA)


The Definition of Mixed Content

• Mixed content is described by a repeatable OR group (#PCDATA | element-name | …)*

– Inside the group, no regular expressions – just element names

– #PCDATA must be first followed by 0 or more element names, separated by |

– The group can be repeated 0 or more times
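
To see what mixed content looks like to a program, here is a small Python sketch using the standard xml.etree.ElementTree module. The elements p and b and the declaration in the comment are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration that this fragment would satisfy:
#   <!ELEMENT p (#PCDATA | b)*>
p = ET.fromstring("<p>Call <b>Homer</b> or <b>Marge</b> today</p>")

# Walk the content in document order: text, child, text, child, text.
pieces = [p.text]
for child in p:
    pieces.append((child.tag, child.text))  # a child element
    pieces.append(child.tail)               # character data after it
print(pieces)
```

Character data and b elements alternate freely here, which is exactly what the repeatable OR group permits.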


An Address-Book XML Document with an Internal DTD

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>

The name of the DTD is addressbook (it must match the root element of the document)

“Internal” means that the DTD and the XML Document are in the same file

The syntax of a DTD is not XML syntax


The Rest of the Address-Book XML Document

<addressbook>
  <person>
    <name> Jeff Cohen </name>
    <greet> Dr. Cohen </greet>
    <email> [email protected] </email>
  </person>
</addressbook>


Regular Expressions

• Each regular expression determines a corresponding finite-state automaton

• Let’s start with a simpler example:

name, addr*, email

[Figure: finite-state automaton with transitions labeled name, addr and email]

This suggests a simple parsing program

A double circle denotes an accepting state
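
One way such a parsing program might look is a table-driven automaton for name, addr*, email. This is a sketch in Python; the state numbers 0, 1 and 2 and the names TRANSITIONS, ACCEPTING and accepts are assumptions of this illustration, since the state labels of the diagram are not part of the text.

```python
# States: 0 = start, 1 = name seen (addr may repeat), 2 = email seen.
TRANSITIONS = {
    (0, "name"): 1,
    (1, "addr"): 1,   # self-loop for addr*
    (1, "email"): 2,
}
ACCEPTING = {2}       # the "double circle" state

def accepts(child_tags):
    """Run the automaton over a sequence of child tag names."""
    state = 0
    for tag in child_tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:      # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts(["name", "addr", "addr", "email"]))
print(accepts(["name", "addr"]))  # stops before the required email
```

The parser reads each child exactly once and never backtracks, which is why checking a content model is cheap.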


Another Example

name, address*, (tel | fax)*, email*

[Figure: finite-state automaton with transitions labeled name, address, tel, fax and email]


Some Things are Hard to Specify

Each employee element should contain name, age and ssn elements in some order

<!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | ... )>

Suppose that there were many more fields!


Some Things are Hard to Specify (cont’d)


There are n! different orders of n elements

The size of such a declaration is not even polynomial in n
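
The growth can be made concrete by generating the any-order declaration mechanically. The Python sketch below uses only the standard library; the helper name any_order_model is made up for this illustration.

```python
from itertools import permutations
from math import factorial

def any_order_model(fields):
    """Build a content model that lists every ordering of the fields."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(fields)]
    return "(" + " | ".join(alternatives) + ")"

model = any_order_model(["name", "age", "ssn"])
print("<!ELEMENT employee " + model + ">")

# Number of alternatives needed for 3 and for 10 fields.
print(factorial(3), factorial(10))
```

Three fields already need 6 alternatives and ten fields would need 3,628,800. The flat enumeration is only meant to show the growth: XML additionally requires content models to be deterministic, so a real declaration would have to factor out common prefixes, which does not remove the blow-up.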


Specifying Attributes in the DTD

<!ELEMENT height (#PCDATA)>
<!ATTLIST height
  dimension CDATA #REQUIRED
  accuracy CDATA #IMPLIED>

The dimension attribute is required

The accuracy attribute is optional

CDATA is the “type” of the attribute – it means “character data,” and may take any literal string as a value


The Format of an Attribute Definition

• <!ATTLIST element-name attr-name attr-type default-value>

• The default value is given inside quotes

• Attribute types:

– CDATA

– ID, IDREF, IDREFS

– …


Summary of Attribute Default Values

• #REQUIRED means that the attribute must be included in the element

• #IMPLIED means that the attribute is optional

• #FIXED “value”

– The given value (inside quotes) is the only possible one

• “value”

– The default value of the attribute if none is given
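
The four kinds of default can be summarized as one small function. This is a sketch in Python, not the code of any real parser: the function name resolve, the label "DEFAULT" for a plain quoted value, and the sample values "cm" and "1.0" are all assumptions of this illustration.

```python
def resolve(kind, declared, given):
    """Return the attribute value that results from a declaration.

    kind:     "#REQUIRED", "#IMPLIED", "#FIXED" or "DEFAULT" (a plain quoted value)
    declared: the value in the declaration (None for #REQUIRED and #IMPLIED)
    given:    the value written in the document, or None if the attribute is absent
    """
    if kind == "#REQUIRED":
        if given is None:
            raise ValueError("required attribute is missing")
        return given
    if kind == "#IMPLIED":
        return given                      # may stay None: optional, no default
    if kind == "#FIXED":
        if given is not None and given != declared:
            raise ValueError("attribute must equal the fixed value")
        return declared
    if kind == "DEFAULT":
        return declared if given is None else given
    raise ValueError("unknown default kind: " + kind)

print(resolve("#REQUIRED", None, "cm"))   # dimension="cm" was written
print(resolve("#IMPLIED", None, None))    # accuracy was left out
print(resolve("DEFAULT", "1.0", None))    # falls back to the declared default
```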


Recursive DTDs

<!DOCTYPE genealogy [
  <!ELEMENT genealogy (person*)>
  <!-- In person, the first nested person is the mother, the second the father -->
  <!ELEMENT person (name, dateOfBirth, person, person)>
  ...
]>

What is the problem with this? A parser does not notice it!

Each person should have a father and a mother. This leads to either infinite data or a person that is a descendant of herself.


Recursive DTDs (cont’d)

<!DOCTYPE genealogy [
  <!ELEMENT genealogy (person*)>
  <!-- In person, the first nested person is the mother, the second the father -->
  <!ELEMENT person (name, dateOfBirth, person?, person?)>
  ...
]>

What is now the problem with this?

If a person only has a father, how can you tell that he has a father and does not have a mother?


Using ID and IDREF Attributes

<!DOCTYPE family [
  <!ELEMENT family (person)*>
  <!ELEMENT person (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST person
    id ID #REQUIRED
    mother IDREF #IMPLIED
    father IDREF #IMPLIED
    children IDREFS #IMPLIED>
]>


IDs and IDREFs

• ID attribute: unique within the entire document.

– An element can have at most one ID attribute.

– Its value must be a valid XML name, so it cannot start with a digit.

– No default (fixed default) value is allowed.

• #REQUIRED: a value must be provided

• #IMPLIED: a value is optional

• IDREF attribute: its value must be some other element’s ID value in the document.

• IDREFS attribute: its value is a set; each element of the set is the ID value of some other element in the document.

<person id="p898" father="p332" mother="p336" children="p982 p984 p986">


Some Conforming Data

<family>
  <person id="lisa" mother="marge" father="homer">
    <name> Lisa Simpson </name>
  </person>
  <person id="bart" mother="marge" father="homer">
    <name> Bart Simpson </name>
  </person>
  <person id="marge" children="bart lisa">
    <name> Marge Simpson </name>
  </person>
  <person id="homer" children="bart lisa">
    <name> Homer Simpson </name>
  </person>
</family>


ID References do not Have Types

• The attributes mother and father are references to IDs of other elements

• However, those are not necessarily person elements!

• The mother attribute is not necessarily a reference to a female person


An Alternative Specification

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE family [
  <!ELEMENT family (person)*>
  <!ELEMENT person (name, mother?, father?, children?)>
  <!ATTLIST person id ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT mother EMPTY>
  <!ATTLIST mother idref IDREF #REQUIRED>
  <!ELEMENT father EMPTY>
  <!ATTLIST father idref IDREF #REQUIRED>
  <!ELEMENT children EMPTY>
  <!ATTLIST children idrefs IDREFS #REQUIRED>
]>


The Revised Data

<family>
  <person id="marge">
    <name> Marge Simpson </name>
    <children idrefs="bart lisa"/>
  </person>
  <person id="homer">
    <name> Homer Simpson </name>
    <children idrefs="bart lisa"/>
  </person>
  <person id="bart">
    <name> Bart Simpson </name>
    <mother idref="marge"/>
    <father idref="homer"/>
  </person>
  <person id="lisa">
    <name> Lisa Simpson </name>
    <mother idref="marge"/>
    <father idref="homer"/>
  </person>
</family>


Consistency of ID and IDREF Attribute Values

• If an attribute is declared as ID

– The associated value must be distinct, i.e., different elements (in the given document) must have different values for the ID attribute (no confusion)

• Even if the two elements have different element names

• If an attribute is declared as IDREF

– The associated value must exist as the value of some ID attribute (no dangling “pointers”)

• Similarly for all the values of an IDREFS attribute

• ID, IDREF and IDREFS attributes are not typed
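
These consistency rules can be checked by hand with a short program. The Python sketch below uses the standard xml.etree.ElementTree parser, which does not read or enforce the DTD, so the attribute types are copied into a table by hand. That table, the function name check_ids and the two-person sample document are assumptions of this illustration.

```python
import xml.etree.ElementTree as ET

# Attribute types taken by hand from the attribute-based family DTD.
ATTR_TYPES = {
    ("person", "id"): "ID",
    ("person", "mother"): "IDREF",
    ("person", "father"): "IDREF",
    ("person", "children"): "IDREFS",
}

def check_ids(root):
    """Return a list of ID/IDREF consistency errors (empty if none)."""
    ids, refs, errors = set(), [], []
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            kind = ATTR_TYPES.get((elem.tag, attr))
            if kind == "ID":
                if value in ids:
                    errors.append("duplicate ID: " + value)
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split())   # space-separated list of IDs
    errors += ["dangling reference: " + r for r in refs if r not in ids]
    return errors

# Sample data in which the person "homer" is deliberately missing.
family = ET.fromstring("""
<family>
  <person id="lisa" mother="marge" father="homer"><name>Lisa Simpson</name></person>
  <person id="marge" children="lisa"><name>Marge Simpson</name></person>
</family>
""")
print(check_ids(family))
```

The check only looks at whether each referenced ID exists somewhere in the document. It never asks which element carries that ID, which mirrors the point that these attributes are not typed.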


Adding a DTD to the Document

• A DTD can be internal

– The DTD is part of the document file

• or external

– The DTD and the document are in separate files

– An external DTD may reside

• In the local file system (where the document is)

• In a remote file system


Connecting a Document with its DTD

• An internal DTD:

<?xml version="1.0"?>
<!DOCTYPE db [<!ELEMENT ...> … ]>
<db> ... </db>

• A DTD from the local file system:

<!DOCTYPE db SYSTEM "schema.dtd">

• A DTD from a remote file system:

<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">


Well-Formed XML Documents

• An XML document (with or without a DTD) is well-formed if

– Tags are syntactically correct

– Every start tag has a matching end tag

– Tags are properly nested

– There is a root tag

– A start tag does not have two occurrences of the same attribute

An XML document must be well formed
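
Well-formedness can be tested with any XML parser, with or without a DTD. The Python sketch below relies on the standard xml.etree.ElementTree module, whose parser is non-validating: it rejects documents that are not well formed but does not check them against a DTD. The helper name is_well_formed and the db sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<db><a>1</a></db>"))   # properly nested
print(is_well_formed("<db><a>1</db></a>"))   # tags not properly nested
print(is_well_formed("<db><a>1</a>"))        # missing end tag
print(is_well_formed('<db x="1" x="2"/>'))   # same attribute twice
```

Only the first document passes; each of the others violates one of the conditions listed above.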


Valid Documents

• A well-formed XML document is valid if it conforms to its DTD, that is,

– The document conforms to the regular-expression grammar,

– The types of attributes are correct, and

– The constraints on references are satisfied