xmldtd transparency no. 1 xml document type definitions (dtds)

86
XMLDTD Transparency No. 1 XML Document Type Definitions (DTD s)

Post on 20-Dec-2015

226 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XMLDTD

Transparency No. 1

XML Document Type Definitions (DTDs)

Page 2: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 2

DTD - Table of Contents

Introduction to DTD An introduction to the XML Document Type Definition.

DTD - XML Building Blocks What XML building blocks are defined in a DTD. DTD Elements How to define the elements of an XML document

using DTD. DTD Attributes How to define the legal attributes of XML elements

using DTD. DTD Entities How to define XML entities using DTD.

Page 3: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 3

Introduction to DTD

The purpose of a DTD is to define the legal building blocks of an XML document.

It defines the document structure with a list of legal elements.

A DTD can be declared inline in your XML document, or as an external reference.

Page 4: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 4

Internal DTD

This is an XML document with a Document Type Definition: <?xml version="1.0"?> <!DOCTYPE note [ <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to>Tove</to> <from>Jani</from> <heading>Remind

er</heading> <body>Don't forget me this weekend!</body> </note>

The DTD is interpreted like this:!ELEMENT note (in line 2) defines the element "note" as having four elements: "to,from,heading,body". and so on.....

Page 5: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 5

External DTD

This is the same XML document with an external DTD: 

<?xml version="1.0"?><!DOCTYPE note SYSTEM "note.dtd"><note> <to>Tove</to> <from>Jani</from><heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>

Page 6: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 6

note.dtd

This is a copy of the file "note.dtd" containing the Document Type Definition:

<?xml version="1.0"?> <!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)>

Page 7: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 7

Why use a DTD?

XML provides an application independent way of sharing data.

With a DTD, independent groups of people can agree to use a common DTD for interchanging data.

Your application can use a standard DTD to verify that data that you receive from the outside world is valid. You can also use a DTD to verify your own data.

Page 8: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 8

2.8 Document Type Declaration (cont’d)

Document Type Definition

[28] doctypedecl ::= '<!DOCTYPE' S Name ( S ExternalID)?

S? ('[' (markupdecl | DeclSep)* ']' S?)? '>'

[ VC: Root Element Type ]

[28a] DeclSep   ::=   PEReference | S[29] markupdecl ::= elementdecl | AttlistDecl | EntityDe

cl

| NotationDecl | PI | Comment

[ VC: Proper Declaration/PE Nesting ]

[ WFC: PEs in Internal Subset ]

Page 9: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 9

WFC: PEs in Internal Subset

Well-formedness constraint: PEs in Internal Subset In the internal DTD subset, parameter-entity references ca

n occur only where markup declarations can occur, not within markup declarations.

(This does not apply to references that occur in external parameter entities or to the external subset.)

Ex: the following is not well-formed! <?xml version="1.0" ?> <!DOCTYPE test SYSTEM "test.dtd" [ <!ENTITY WhatISaid "I said %YN;"> ]> <test/>

PE reference cannot appear here

In internal subset even YN is defined in test.dtd

Page 10: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 10

2.8 External Subset

Portions of the contents of the external subset or of external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.

External Subset

[30] extSubset ::= TextDecl? extSubsetDecl

[31] extSubsetDecl ::= ( markupdecl | conditionalSect | DeclSep )*

Page 11: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 11

2.8 Example XML documents

An example of an XML document with a document type declaration:

<?xml version="1.0"?>

<!DOCTYPE greeting SYSTEM "hello.dtd">

<greeting>Hello, world!</greeting>The system identifier "hello.dtd" gives the URI of a DT

D for the document. The declarations can also be given locally, as in this e

xample:

<?xml version="1.0" encoding="UTF-8" ?>

<!DOCTYPE greeting [

<!ELEMENT greeting (#PCDATA)> ]>

<greeting>Hello, world!</greeting>

Page 12: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 12

2.9 Standalone Document Declaration

Standalone Document Declaration [32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no')

"'") | ('"' ('yes' | 'no') '"')) [ VC: Standalone Document Declaration ]

Example: <?xml version="1.0" standalone='yes'?>

Page 13: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 13

2.10 White Space and End-of_line Handling

White Space: special attribute xml:space used to indicate if (markup) s

paces should be preserved. <!ATTLIST poem xml:space (default | preserve) 'preserv

e'>

<e1 v1=“abc” v2 =“def” /> <e1 v1=“abc” v2 =“def”/>

Normalize End-of-line: #xD#xA --> #xA #D --> #xA before parsing

Page 14: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 14

2.12 Language Identification

A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document.

In valid documents, this attribute, like any other, must be declared if it is used.

The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages”.

Example:

xml:lang NMTOKEN #IMPLIED

<!ATTLIST poem xml:lang NMTOKEN 'fr'>

<!ATTLIST gloss xml:lang NMTOKEN 'en'>

<!ATTLIST note xml:lang NMTOKEN 'en'>

Page 15: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 15

2.12 Language Identifications

<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>

<p xml:lang="en-GB">What colour is it?</p>

<p xml:lang="en-US">What color is it?</p>

<sp who="Faust" desc='leise' xml:lang="de">

<l>Habe nun, ach! Philosophie,</l>

<l>Juristerei, und Medizin</l>

<l>und leider auch Theologie</l>

<l>durchaus studiert mit hei 絽 m Bem 'n.</l>

</sp>

Page 16: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 16

DTD - XML building blocks

The building blocks of XML documentsXML documents (and HTML documents) are made

up by the following building blocks: Elements, Tags, Attributes, Entities, PCDATA, and CDATA sections

This is a brief explanation of each of the building blocks:

Page 17: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 17

Elements

Elements are the main building blocks of both XML and HTML documents.

Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "m

essage".Elements can contain text, other elements, or be em

pty.Examples of empty HTML elements are "hr", "br" an

d "img".

Page 18: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 18

Tags

Tags are used to markup elements.A starting tag like <element_name> mark up the

beginning of an element, and an ending tag like </element_name>  mark up the

end of  an element.Examples:

A body element:

<body> body text in between</body>.A message element:

<message> some message in between</message>

Page 19: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 19

Attributes

Attributes provide extra information about elements.Attributes are placed inside the start tag of an elem

ent.Attributes come in name/value pairs. The following "img" element has an additional infor

mation about a source file:

<img src="computer.gif" />Notes:

The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".

Page 20: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 20

PCDATA

PCDATA means parsed character data.Think of character data as the text found between th

e start tag and the end tag of an XML element.PCDATA is text that will be parsed by a parser.

Tags inside the text will be treated as markup and entities will be expanded, hence they should not appear pcdata. 

Ex: <!ELEMENT section (#PCDATA)><section> abc <em> de </section>

Page 21: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 21

CDATA sections

CDATA also means character data.CDATA is text that will NOT be parsed by a parser.

Tags inside the text will NOT be treated as markup and entities will not be expanded.

Ex:

<section>abc <![CDATA[ &a; <em>def]]> g</section>

Page 22: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 22

Entities

Entities are used to define common text like macros. Entity references are references to entities.Most of you will know the HTML entity reference: "&nbsp;"  t

hat is used to insert an extra space in an HTML document. Entities are expanded when a document is parsed by an XML

parser.The following entities are predefined in XML:Entity References Character

&lt; <

&gt; >

&amp; &

&quot; "

&apos; '

Page 23: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 23

DTD - Elements

Declaring an ElementIn the DTD, XML elements are declared with an elem

ent declaration. An element declaration has the following syntax:

<!ELEMENT element-name element-content>

Types of element contents: EMPTY – no contents ANY -- no restriction on contents MIXED -- allow character data (character data only) or (character data + elements) ELEMENTs-ONLY -- allow elements only

Page 24: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 24

EMPTY elements

Elements with empty contentDeclared with the keyword EMPTY:

<!ELEMENT element-name EMPTY>

Example:<!ELEMENT img EMPTY>

Legal Instances: <img/> <img></img><img> </img>

Page 25: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 25

ANY Elements

Elements that can contain any combination of elements and text data.

Declared with the ‘ANY’ keyword<!ELEMENT name ANY >Example:<!ELEMENT E1 ANY>Legal instances:

<E1> <E2/> e2 <E3> fff </E3> … </E1> <E1> dddd <E1> <E1/>

Page 26: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 26

Elements with MIXED contents

Elements that can only contain text contents <!ELEMENT name (#PCDATA)> Elements allowing text as well as element contents <!ELEMENT E0 (#PCDATA | E1 | E2 … )* > Example: <!ELEMENT note (#PCDATA)>

<!ELEMENT em EMPTY> <!ELEMENT e1 (#PCDATA | note | em)* > Instances:

<e1> ddd <em/> cd <note>ttt</note> <em/> </e1>

Page 27: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 27

Elements that can contains element contents only

Issue: how to declare the possible sequences of content elements occurrences.

Solu: regular expressions over element namesDefinition:CP ::= (name | choice | seq ) (‘+’ | ‘*’ | ‘?’ )?choice ::= a list of two or more CPs separated

by ‘|’ and is enclosed by ‘(‘ and ‘)’.seq ::= a list of one or more CPs seprated by ‘,’ an

d is enclosed by ‘(‘ and ‘)’Element-Only elements:<!ELEMENT name CP – name (‘+’ | ‘*’ | ‘?’ )? >

Page 28: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 28

Recursive definition of CP, seq and choice:

Basis: if is a name, then , ?, +, * are CPs (content particl

e).closure:

if is an seq or a choice, then , ?, +, * are CPs. if 1, 2,… n (n > 1) are CPs, then (1 | 2 | … | n) is a choic

e. if 1, 2,… n (n > 0) are CPs, then (1 , 2 , … , n) is an seq.

is a children if is a CP but is not a name optionally followed by +, ? or *.

Examples of children: Illegal : <!ELEMENT e1 e2*>, <!ELEMENT e1 e2> Legal : <!ELEMENT e1 (e2)>,<… (e2+)>, <… (e2)?>

Page 29: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 29

More examples

<!ELEMENT note (to,from,heading,body)> <!ELEMENT note

(to, from, heading1 | heading2, body)> (X)

<!ELEMENT note (to, from, (heading1 | heading2), body)>

(0)<!ELEMENT E1 ( (E1, E2) | (E1, E3, E2)) > (x, 1-

ambiguous)Rewritten as

… (E1, (E2 | (E3,E2)))> (0)

Page 30: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 30

3.2 (cont’d)

Element Type Declaration

[45] elementdecl ::= '<!ELEMENT' S Name S

contentspec S? '>'

[ VC: Unique Element Type Declaration]

[46] contentspec ::= ‘EMPTY’ | ‘ANY’ | Mixed | children

Examples: <!ELEMENT br EMPTY> <!ELEMENT p (#PCDATA|emph)* > <!ELEMENT %name.para; %content.para; > <!ELEMENT container ANY>

Page 31: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 31

3.2.1 Element Content

An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S).

In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear.

The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

Page 32: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 32

3.2.1 (cont’d)

Element-content Models[47] children ::= (choice | seq) ('?' | '*' | '+')?[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'where each Name is the type of an element which m

ay appear as a child.Examples:<!ELEMENT spec (front, body, back?)><!ELEMENT div1 (head, (p | list | note)*, div2*)><!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>Note: (x) <!ELEMENT spec body> (0) <!ELEMENT spec (body)>

Page 33: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 33

3.2.2 Mixed Content

Mixed-content Declaration

[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'

| '(' S? '#PCDATA' S? ')'

Examples: <!ELEMENT p (#PCDATA|a|ul|b|i|em)*> <!ELEMENT p (#PCDATA | %font; | %phrase; | %special; |

%form;)* > <!ELEMENT b (#PCDATA)>

Page 34: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 34

Attribute Definition

Defined for the elements they belong to <!ELEMENT book (preface, toc, chapter+, index?) <!ATTLIST book title CDATA #REQUIRED> <!ATTLIST book isbn CDATA #IMPLIED>

Or <!ATTLIST book title CDATA #REQUIRED isbn CDATA #IMPLIED >

Format:<!ATTLIST elm-name (attr-name attr-type attr-default-value)+

> Atributes have a name, a type, a default-value and belong to an eleme

nt.

Page 35: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 35

Attribute types

type ExplanationCDATA The value is character data(eval|eval|..) The value must be an enumerated valu

eID The value is an unique id IDREF The value is the id of another elementIDREFS The value is a list of other idsNMTOKENThe value is a valid XML name tokenNMTOKENS The value is a list of valid XML name tokensENTITY The value is an entity ENTITIES The value is a list of entitiesNOTATION The value is a name of a notation

Page 36: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 36

Attribute-default value

Value Explanation“v1” The attribute has a default

value “v1”#REQUIRED The attribute value must be

included in the element#IMPLIED The attribute does not have

to be included#FIXED “value”The attribute value is fixed

Page 37: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 37

Attribute Examples

DTD example: <!ELEMENT square EMPTY> <!ATTLIST square width CDATA "0"> XML example: <square width="100"></square>

Default attribute value Syntax: <!ATTLIST elm-name attribute-name CDATA "default-va

lue"> DTD example: <!ATTLIST payment type CDATA "check"> XML example: <payment type="check"> equ.to. <payment >

Page 38: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 38

Implied attribute

Syntax: <!ATTLIST elm-name attribute-name

attribute-type #IMPLIED>example:

<!ATTLIST contact fax CDATA #IMPLIED>

instance: <contact fax="555-667788">

Page 39: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 39

Required attributeSyntax: <!ATTLIST elm-name attr-name attr-type #REQUIRE

D>DTD example: <!ATTLIST person number CDATA #REQUIRED> XML example: <person number="5677"> <person> (x)

Page 40: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 40

Fixed attribute valueSyntax: <!ATTLIST elm-name attr-name attr-type #FIXED "va

lue">

DTD example: <!ATTLIST sender company CDATA #FIXED "Microsof

t">

XML example: <sender company="Microsoft"> equ.to <sender>

Page 41: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 41

Enumerated attribute valuesSyntax: <!ATTLIST elm-name attr-name (v1|v2|..) def-valu

e>DTD example: <!ATTLIST payment type (check|cash) "cash"> <!ATTLIST light color (red | green |yellow) #IMPLIED>XML example: <payment type="check"> or <payment type="cas

h"><light color=‘red’> or <light>

Page 42: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 42

3.3 Attribute-List Declarations

Attribute-list Declaration

[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'

[53] AttDef ::= S Name S AttType S DefaultDecl

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types.

Page 43: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 43

3.3.1 Attribute Types

Attribute Types

[54] AttType ::= StringType | TokenizedType |

EnumeratedType

[55] StringType ::= 'CDATA'

[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS’

| 'ENTITY’ | 'ENTITIES' | 'NMTOKEN’ | 'NMTOKENS’ ID, IDREF and IDREFS for cross references ENTITY for referring to external unparsed objects NMTOKEN restrict attvalue to be a Nmtoken.

Page 44: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 44

3.3.1 (cont’d) Example of Entity type usage

<! DOCTYPE BookCategory [...

<!ATTLIST BOOK cover ENTITY #REQUIRED>

<!NOTATION PDF SYSTEM “http://www.adobe.com/…”>

<!ENTITY cover1 SYSTEM

http://www.host/thebookCover.pdf NDATA PDF>

]> ...

<BookCategory>

<BOOK cover=“cover1”>

… </BOOK>

Page 45: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 45

3.3.1 Enumerated Attribute Types

Enumerated Attribute Types

[57] EnumeratedType ::= NotationType | Enumeration

[58] NotationType ::= 'NOTATION' S

'(' S? Name (S? '|' S? Name)* S? ')'

[59] Enumeration ::= '(' S? Nmtoken

(S? '|' S? Nmtoken)* S? ')'

A NOTATION attribute identifies a notation, declare

d in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

Page 46: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 46

3.3.2 Attribute Defaults

An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.

Attribute Defaults[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)Examples:<!ATTLIST termdef id ID #REQUIRED name CDATA #IMPLIED> <!ATTLIST list type (bullets|ordered|glossary) "ordered"> <!ATTLIST file format NOTATION (ps | pdf | word ) #REQUIRED <!ATTLIST form method CDATA #FIXED "POST">

Page 47: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 47

3.3.3 Attribute-value normalization

When: after end-of-line processing but before passed to app. 0. End-of-line processing ( &#XA; &#XD; &#XD &#XA &#xA)

Steps: initially nv=“” // normalized value1. Repeat until end of input.

character reference => append the referenced character to the normalized value

entity reference => recursively apply step 1 to the replacement text of the entity.

white space character (#x20, #xD, #xA, #x9) => append a space character (#x20) to the normalized value.

O/w (other character ) =>append the character to the normalized value.

2. If not CDATA type => removing leading/trailing spaces and replace sequences of space (#x20) characters by a single space (#x20) character

Notes : 1. char and entity references are not treated equal.

2. White spaces are normalized to space.

Page 48: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 48

Examples

<!ENTITY d "&#xD;"> <!ENTITY a "&#xA;"> <!ENTITY da "&#xD;&#xA;">

Attribute specification

a is NMTOKENS A is CDATA

a=“

xyz”

xyz #x20 #x20 x y z

a="&d;&d;A&a;&a;B&da;"

A #x20 B #x20 #x20 A #x20 #x20 B #x20 #x20

a= "&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"

#xD #xD A #xA #xA B #xD #xA

#xD #xD A #xA #xA B #xD #xD

Page 49: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 49

DTD-Entities

Entities used to define shortcuts to common text, like macro

s in programming languages. Entity references are references to entities.

If name is an entity [name], then &name; (or %name; but not both) is its reference

Entities can be declared internal ( contents in the same doc as its declaration) or external (contents external to its declaration)

Page 50: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 50

Internal Entity Declaration

Syntax: <!ENTITY entity-name "entity-value"> DTD Example:<!ENTITY p1 “Peter"> <!ENTITY birthday “2/12/2000">XML example:<baby>&p1; &birthday;</baby>Equ. To.<baby> Peter 2/12/2000 </baby>

Page 51: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 51

External Entity Declaration

Syntax: <!ENTITY entity-name SYSTEM "URI/URL"> DTD Example:<!ENTITY writer SYSTEM "http://www.xml.com/enti

ties/writer.xml"> <!ENTITY copyright SYSTEM "http://www.xml.com/

entities/copyright.xml">XML example:<author>&writer;&copyright;</author>

Page 52: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 52

Structure of XML Documents

Logical Structure Elements Character data

Physical Structure Entities

Document

UnitSub-unit

Document entity External parsed entity

External unparsed entity

Page 53: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 53

4. Physical Structures

An XML document may consist of one or many storage units called entities; have content identified by name.

Each XML document has one entity called the document entity, the starting entity for the XML processor and may contain

the whole document. the only kind of entities without name.

Entities may be either parsed or unparsed. this text is considered an integral part of the document.

Page 54: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 54

Classification of entities

parsed v.s. unparsd entities

general v.s. parameter entities

external v.s. internal entities

Page 55: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 55

Parsed entity and unparsed entity

An unparsed entity is a resource whose contents are not to be processed by XML processor. has an associated notation, identified by name. must be an external entity (with publicId or SystemId) referenced by [entity] name (instead of entity reference) o

ccurring only in the value of ENTITY or ENTITIES attributes.

Parsed entities are entities whose contents need to be processed by XML Processor. reference by using entity references. contents are referred to as its replacement text;

Page 56: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 56

Examples

external general parsed entity. <!ENTITY legal SYSTEM "http://www.example.com/legal.xml">

internal general parsed entity <!ENTITY nccu “National Chengchi University”>

internal parameter parsed entity <!ENTITY % colorValues “(male | female)”>

external general unparsed entity. <!NOTATION PDF SYSTEM “http://www.adobe.com/pdf”> <!ENTITY cover1 SYSTEM

http://www.host/book1cover.pdf NDATA PDF>

Note: Notation and unparsed entity are rarely used in practice.

Page 57: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 57

General entity and parameter entity

Parameter entities are parsed entities for use within the DTD. referenced by the form: %name;

General entities are entities for use within the document content. sometimes simply called entity when this leads to no

ambiguity. reference by the form: &name;

Comparisons: use different syntax in DTD for definition. use different forms of references recognized in different contexts.

Page 58: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 58

Examples

external general parsed entity. <!ENTITY legal SYSTEM "http://www.example.com/legal.xml">

internal general parsed entity <!ENTITY nccu “National Chengchi University”>

internal parameter parsed entity <!ENTITY % colorValues “(male | female)”>

external parameter parsed entity. <!ENTITY % html40 SYSTEM "http://www.w3c.org/html40.dtd">

Notes: all parameter entities are parsed entities Parameter entities generally contain only grammar inform

ation.

Page 59: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 59

4.1 Character and Entity References

A character reference refers to a specific character in the ISO/IEC 10646 character set.

Character Reference

[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

Page 60: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 60

4.1 Character and Entity References (cont’d)

Entity Reference

[67] Reference ::= EntityRef | CharRef

[68] EntityRef ::= '&' Name ';'

[69] PEReference ::= '%' Name ';’

Page 61: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 61

4.2 Entity Declarations

Entity Declaration

[70] EntityDecl ::= GEDecl | PEDecl

[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'

[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'

[73] EntityDef ::= EntityValue[9] | ( ExternalID NDataDecl?)

[74] PEDef ::= EntityValue | ExternalID

notes:

1. General entities can only be referenced at non-DTD region

2. Parameter entities are referenced at DTD

Page 62: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 62

Literals[9] EntityValue ::= ‘”’ ([^%&”] | PEReference | Reference)* ‘”’ | “’” ([^%&'] | PEReference | Reference)* “’”

[10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'") [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

Page 63: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 63

4.2.1 Internal Entities

Entities defined by EntityValue is called an internal entity. the content of the entity is given in the declaration. no separate physical storage object, Some processing of entity and character references in the

literal entity value may be required to produce the correct replacement text.

An internal entity is always a parsed entity.

Example of an internal entity declaration:

<!ENTITY Pub-Status "This is a pre-release of the

specification.">

Page 64: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 64

4.2.2 External Entities

If the entity is not internal, it is an external entity. External Entity Declaration[75] ExternalID ::= 'SYSTEM' S SystemLiteral[9]

| 'PUBLIC' S PubidLiteral S SystemLiteral [76] NDataDecl ::= S 'NDATA' S Name [ VC: Notation Declared ]

If the NDataDecl is present, this is a general unparsed entity; otherwise it is a parsed entity.

[VC: Notation Declared]: The Name must match the declared name of a notation.

SystemLiteral is called the entity’ system identifier, which is a URI.

PubidLiteral is called the entity’s public identifier, which the XML processor may use to produce an alternative URI.

Page 65: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 65

Examples of external entity declaration

<!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">

<!ENTITY open-hatch PUBLIC

"-//Textuality//TEXT Standard open-hatch boilerplate//EN" "http://www.textuality.com/boilerplate/OpenHatch.xml” >

<!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif"

NDATA gif >

Page 66: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 66

4.3 Parsed Entities 4.3.1 The Text Declaration

External parsed entities may each begin with a text declaration.

Text Declaration

[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'

Notes: must appear at the beginning of an external parsed entity. The text declaration must be provided literally, not by refe

rence to a parsed entity.

Page 67: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 67

4.3.2 Well-formed Parsed Entities

The document entity is well-formed if it matches the production labeled document[1] .

An external general parsed entity is well-formed if it matches the production labeled extParsedEnt[78] .

All external parameter entities are well-formed by definition.

Well-Formed External Parsed Entity

[78] extParsedEnt ::= TextDecl? content

Page 68: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 68

4.3.2 Well-Formed Parsed Entities (cont’d)

An internal general parsed entity is well-formed if its replacement text matches the production labeled content[43].

All internal parameter entities are well-formed by definition.

A consequence of well-formedness in entities: the logical and physical structures in an XML document a

re properly nested; i.e., no start-tag, end-tag, empty-element tag, element, comme

nt, processing instruction, character reference, or entity reference can begin in one entity and end in another.

Page 69: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 69

4.3.3 Character Encoding in Entities

External parsed entities may use different encoding for their characters. All XML processors must support UTF-8 and UTF-16. must declare encoding in text declaration for encoding ot

her than UTF-8 or UTF-16. Encoding Declaration[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'"EncName "'" ) [81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */Examples: <?xml encoding='UTF-8'?> <?xml encoding=’Big-5'?>

Page 70: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 70

4.4 XML Processor Treatment of Entities and References

The contexts in which character references, entity references, and unparsed entity names might appear:

1. Reference in Content : as a reference in content. EX: <p>He said: &WhatHeSaid; </p>

2. Reference in Attribute Value : as a reference within either the value of an attribute in a start-tag, or a default value in an attribute declaration; corresponds to the nonterminal AttValue. ex: <A HREF='&home;/start.html'> ex: <!ATTLIST A HREF CDATA ‘&home;/index.html’>

3. Occurs as Attribute Value: as a Name, not a reference, appearing as the value of an attribute declared as type ENTITY, or ENTITIES.

Page 71: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 71

4.4 Context in which entities or character reference may occur ex: <!ENTITY Apicture SYSTEM "http://www.antarctica.net/mypic.gif” NDATA GIF> <!ATTLIST World src ENTITY #REQUIRED> … <World src=’Apicture'>

4. Reference in Entity Value : as a reference in rule EntityValue. ex: <!ENTITY PLX "Perl &heart; XML!">

5. Reference in DTD : as a reference in internal or external subsets of the DTD, but outside of an EntityValue or AttValue. ex: <!ELEMENT %Para; (#PCDATA|%ParaBits;)*> %manyElements; <!ATTLIST … >

Page 72: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 72

4.4 summary on entities

internal v.s. external: internal ==> content given in the declaration external ==> content obtained outside the declaration ex1: <!ENTITY Pub-Status “this is …”> ex2: <!ENTITY % book-format SYSTEM “http://…/book.dtd” > ex3: <!ENTITY book1 SYSTEM “bybook.doc” NDATA WORD>

general v.s. parameter entities: general ==> used in document instance parameter ==> used in document declaration(DTD) ex: ex1==> general; ex2=> PE

parsed v.s. unparsed entities: parsed => XML processor will parse it ==> ex1, ex2 unparsed => XML processopr need’t parse it. ==> ex3 note: unparsed entities must be general and external.

Page 73: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 73

possible types of entities:

There are only 5 kinds of entities Since unparsed entities must be external and general.

Internal parsed general entities

Internal parsed parameter entities

external parsed general entities

external parsed parameter entities

external unparsed general entities

internal unparsed ---- entities (x) All internal entities are parsed entities.

external unparsed parameter entities (x) All parameter entities are parsed.

Page 74: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 74

4.5 Construction of Internal Entity Replacement Text

Two forms of the entity's value of an internal entity. literal entity value : the quoted string actually present in th

e entity declaration, corresponding to the non-terminal EntityValue.

replacement text : the content of the entity, after replacement of character references and parameter-entity references.

Notes: 1. General-entity references in literal entity value are not ex

panded to produce replacement text . 2. It is the replacement text of the entity that is substituted

for every occurrence of its entity reference.

Page 75: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 75

4.5 Example

<!ENTITY % pub "&#xc9;ditions Gallimard" >

<!ENTITY rights "All rights reserved" >

<!ENTITY book "La Peste: Albert Camus,

&#xA9; 1947 %pub;. &rights;" >

=> Entity book has replacement text:

“La Peste: Albert Camus,

© 1947 Éditions Gallimard. &rights;”

Note: No forward reference for PE is permitted. Hence entity ‘book’ could not be put before ‘pub’ entity.

Page 76: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 76

4.4.2 IncludedAn entity is included when its replacement text is

retrieved and processed,in place of the reference itself, as though it were part of the document at the location the reference was recognized. The replacement text may contain both character data and

(except for parameter entities) markup, which must be recognized in the usual way,

ex: <!ENTITY AC "The &W3C; Advisory Council"> <!ENTITY W3C "WWW Consortium"> ==>”&AC;” ==> “The &W3C; Advisory Council” ==> “The WWW Consortium Advisory Council”.

Page 77: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 77

4.4.5 include in literal

Same as Included except that a single or double quote character in the replacement text

is always treated as a normal data character and will not terminate the literal.

Example: this is well-formed: <!ENTITY % YN '"Yes"' > <!ENTITY WhatHeSaid "He said %YN;" >

while this is not: <!ENTITY EndAttr "27'" > <element attribute='a-&EndAttr;>

Page 78: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 78

4.4.8 included as PE

same as ‘included as literal’ butthe replacement text is enlarged by the attachment

of one leading and one following space (#x20) character.

ex: pe1 => [red | gree | blue] <!ATTLIST light color (yellow%pe1;) “yellow” >

Page 79: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 79

Parameter

entity Internal general

External Parsed general

Unparsed Character

Reference in content

Not Rec. (N.R.)

Included Included if validating Forbidden Included

Ref in Attr value

N.R. Included in literal

Forbidden Forbidden Included

Occurs as Attr value

N.R. Forbidden Forbidden Notify N.R.

Ref in Entity value

Included in Literal

Bypassed Bypassed Forbidden Included

Ref. in DTD

Included as PE

Forbidden Forbidden Forbidden Forbidden

4.4 XML Processor Treatment of Entities and References

Page 80: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 80

4.6 Predefined Entities

Entity and character references can both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this p

urpose. Numeric character references may also be used; they are expanded i

mmediately when recognized and must be treated as character data, so the numeric character references "&#60;" and "&#38;" may be use

d to escape < and & when they occur in character data.

1. <!ENTITY lt "&#38;#60;"> // < double escaping required for

2. <!ENTITY amp "&#38;#38;"> // & well-formed replacement text

3. <!ENTITY gt "&#62;"> // > double escaping harmless but

4. <!ENTITY apos "&#39;"> // ‘ not needed

5. <!ENTITY quot "&#34;"> // “

ex: The string "AT&amp;T;” ==> "AT--&#38;T;" ==> “AT&--T;”.

If Define 2. as “&#38; => AT&amp;T;” ==> “AT--&T;” ==> err.

Page 81: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 81

4.7 Notation Declarations

Notations identify by name the format of unparsed entities e.g., GIF, JPEG, DOC,BMP,…

Notation Declarations

[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID |

PublicID) S? '>'

[83] PublicID ::= 'PUBLIC' S PubidLiteral

4.8 Document Entityserves as the root of the entity tree and a starting-point for a

n XML processor. unlike other entities, the document entity has no name and

might well appear on a processor input stream without any identification at all.

Page 82: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 82

6. Grammar Notation (EBNF)

#xN[a-zA-Z], [#xN-#xN], [acg][^a-z][^abc]“string”, ‘STRING’ [vc: …. ](expression) [wfc: …. ]A?A B /* Comment */A | BA-B A+A*

Page 83: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 83

Appendix D. Expansion of Entity and Character References

<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped numerically (&#38;#38;#38;) or with a general entity (&amp;amp;).</p>" >

==> ENTITY example has value(replacement text):

<p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;amp;).</p>

A reference in the document to “&example;” cause the text to be reparsed: ==>

An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).

Page 84: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 84

D. More complex example

1 <?xml version='1.0'?> 2 <!DOCTYPE test [ 3 <!ELEMENT test (#PCDATA) > 4 <!ENTITY % xx '&#37;zz;'> 5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' > 6 %xx; 7 ]> 8 <test>This sample shows a &tricky; method.</test>line4 => xx has value “%zz;”line5 => zz has value “<!ENTITY trickey “error-prone”>”line6 => %xx; => %zz; => <!ENTITY trickey “error-prone”> d

eclaredline 8 => element test has content: “This sample shows a error-prone method.”

Page 85: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 85

3.4 Conditional Sections

Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.

Conditional Section

[61] conditionalSect ::= includeSect | ignoreSect

[62] includeSect ::= '<![' S? 'INCLUDE' S? '['

extSubsetDecl ']]>'

[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[’

ignoreSectContents* ']]>'

[64] ignoreSectContents ::= Ignore ('<!['

ignoreSectContents ']]>' Ignore)*

[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)

Page 86: XMLDTD Transparency No. 1 XML Document Type Definitions (DTDs)

XML DTD

Transparency No. 86

3.4 Conditional Sections

Example:<!ENTITY % draft 'INCLUDE' ><!ENTITY % final 'IGNORE' >

<![%draft;[<!ELEMENT book (comments*, title, body,

supplements?)> ]]>

<![%final;[<!ELEMENT book (title, body, supplements?)> ]]>