Page 1: XML Validation I DTDs

XML Validation I: DTDs

Robin Burke

ECT 360

Winter 2004

Page 2: XML Validation I DTDs

Outline

History
Grammars / Regular expressions
DTDs
  elements
  attributes
  entities
Declarations

Page 3: XML Validation I DTDs

Validation

Why bother?

Page 4: XML Validation I DTDs

The idea

Language consists of terminals
  a, b, c
Set of productions
  beginning with non-terminals A, B, C
  rules specifying how to generate sequences of terminals

Page 5: XML Validation I DTDs

Example

A → aB
A → aBA
B → b
generates strings
  ab, abab, ababab, etc.
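To see the productions at work, here is a minimal sketch that expands the leftmost non-terminal breadth-first and collects every terminal string up to a length limit. The function name `generate` and the `max_len` cut-off are illustrative choices, not part of the slides.

```python
from collections import deque

# Productions from the slide: A -> aB, A -> aBA, B -> b
PRODUCTIONS = {"A": ["aB", "aBA"], "B": ["b"]}


def generate(start="A", max_len=6):
    """Return all terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal (upper-case symbol with a production).
        index = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if index is None:
            results.add(form)  # only terminals left: a finished string
            continue
        for rhs in PRODUCTIONS[form[index]]:
            expanded = form[:index] + rhs + form[index + 1:]
            # No production shortens a form, so anything too long can be dropped.
            if len(expanded) <= max_len and expanded not in seen:
                seen.add(expanded)
                queue.append(expanded)
    return sorted(results, key=lambda s: (len(s), s))


print(generate(max_len=6))  # ['ab', 'abab', 'ababab']
```

Raising `max_len` simply yields longer repetitions of "ab", which is the whole language of this grammar.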

Page 6: XML Validation I DTDs

Grammar

Can be used to efficiently parse a language
  basis of all modern programming language parsing since Algol-60
  Java Language Specification defines its syntax with an EBNF-style grammar

Page 7: XML Validation I DTDs

Grammar

XML
  grammar-based
  syntax adheres to EBNF
SGML
  SGML had a more complex language definition syntax
  HTML is defined the SGML way

Page 8: XML Validation I DTDs

Regular expressions

Language for expressing patterns
Basic components
  pattern elements
  optional element = ?
  repetition (1 or more) = +
  repetition (0 or more) = *
  choice = |
  grouping = ( )
  sequence = ,

Page 9: XML Validation I DTDs

Examples

(a, b)*
  all strings "ab" "abab" etc. (and the empty string)
(a | b | c)+, q, (b | c)*
  aaqb
  bq
  bqcccccccc
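These patterns can be tried out with Python's `re` module. This is only a sketch under one assumption: every symbol is a single letter, so the DTD-style comma (sequence) can be dropped, because Python regular expressions express sequence by juxtaposition. The helper names are made up for the illustration.

```python
import re


def dtd_pattern_to_regex(pattern: str) -> str:
    """Translate a DTD-style pattern over single-letter symbols to Python syntax.

    ? + * | ( ) mean the same in both notations; only the comma differs.
    """
    return re.sub(r"[,\s]+", "", pattern)


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text is described by the DTD-style pattern."""
    return re.fullmatch(dtd_pattern_to_regex(pattern), text) is not None


for candidate in ["", "ab", "abab", "aba"]:
    print(repr(candidate), matches("(a, b)*", candidate))

for candidate in ["aaqb", "bq", "bqcccccccc"]:
    print(candidate, matches("(a | b | c)+, q, (b | c)*", candidate))

# With a sequence in the tail, (b, c)*, only whole "bc" pairs may follow q.
print(matches("(a | b | c)+, q, (b, c)*", "aaqb"))    # False
print(matches("(a | b | c)+, q, (b, c)*", "aaqbcbc")) # True
```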

Page 10: XML Validation I DTDs

Note

Regular expressions are different in different applications
  Perl
  Javascript
  XML Schemas
DTDs only support ? + * | , ( )

Page 11: XML Validation I DTDs

EBNF

EBNF is a more compact version of BNF
  it uses regular expressions to simplify grammar expression
A → aB
A → aBA
turns into
A → aB(A)?
only one production per non-terminal allowed
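One production per non-terminal maps naturally onto one function per non-terminal. The sketch below is a hand-written recognizer for A → aB(A)? and B → b; the function names are illustrative, and it is not how any particular XML parser is built.

```python
def parse_B(text, pos):
    """B -> b. Return the position after the match, or None on failure."""
    if pos < len(text) and text[pos] == "b":
        return pos + 1
    return None


def parse_A(text, pos=0):
    """A -> aB(A)?. Return the position after the match, or None on failure."""
    if pos < len(text) and text[pos] == "a":
        after_b = parse_B(text, pos + 1)
        if after_b is not None:
            # (A)? : try the optional trailing A, fall back to stopping here.
            after_a = parse_A(text, after_b)
            return after_a if after_a is not None else after_b
    return None


def accepts(text):
    """True if the whole string can be derived from A."""
    return parse_A(text) == len(text)


for candidate in ["ab", "abab", "aba", ""]:
    print(repr(candidate), accepts(candidate))
```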

Page 12: XML Validation I DTDs

DTDs

Use EBNF to specify structure of XML documents
Plus
  attributes
  entities
Syntax holdover from SGML
  Ugly

Page 13: XML Validation I DTDs

DTD Syntax

<!ELEMENT element-name content_model>

Content model contains the RHS of the production rule

Example
  <!ELEMENT name (firstName, lastName)>
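A parser reads such a declaration into a small tree. As a sketch, Python's bundled expat parser (non-validating: it reads the DTD but does not enforce it) reports each `<!ELEMENT>` through `ElementDeclHandler` as a nested tuple of (type, quantifier, name, children). The sample document and the `firstName`/`lastName` declarations are assumptions added to make the example complete.

```python
from xml.parsers import expat

DOC = b"""<!DOCTYPE name [
  <!ELEMENT name (firstName, lastName)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>
]>
<name><firstName>Robin</firstName><lastName>Burke</lastName></name>"""


def read_element_declarations(doc: bytes) -> dict:
    """Collect the content model of every <!ELEMENT> declaration in doc."""
    declarations = {}
    parser = expat.ParserCreate()

    def on_element_decl(element_name, content_model):
        declarations[element_name] = content_model

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(doc, True)
    return declarations


decls = read_element_declarations(DOC)
kind, quantifier, _, children = decls["name"]
print(kind == expat.model.XML_CTYPE_SEQ)   # the comma made this a sequence
print([child[2] for child in children])    # names on the right-hand side
```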

Page 14: XML Validation I DTDs

DTD Syntax cont'd

Not XML
  <! begins a declaration
  No "content"
  Empty elements not indicated with />

Page 15: XML Validation I DTDs

Simple content models

Content can be any text
  #PCDATA
Content can be anything at all (useful for debugging)
  ANY
Element has no content
  EMPTY

Page 16: XML Validation I DTDs

Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
</grades>

Page 17: XML Validation I DTDs

Example

<grades>
  <grade>
    <student>Jane Doe</student>
    <assigned-grade>A</assigned-grade>
  </grade>
  <grade>
    <student>John Doe</student>
    <assigned-grade>A-</assigned-grade>
  </grade>
  <grade>
    <student>Wayne Doe</student>
    <assigned-grade>I</assigned-grade>
    <reason>Alien abduction</reason>
  </grade>
</grades>
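Both documents fit one content model if `reason` is optional. The slides do not give the DTD, so the models below, `(grade+)` and `(student, assigned-grade, reason?)`, are assumptions. The sketch checks them by turning each content model into a regular expression over the sequence of child element names. This shows the idea only; it is not a real validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content models (not given in the slides).
CONTENT_MODELS = {
    "grades": "(grade+)",
    "grade": "(student, assigned-grade, reason?)",
}


def model_to_regex(content_model):
    """Compile a content model into a regex over 'name;name;...' strings."""
    # Each element name becomes one token, e.g. student -> (?:student;)
    tokens = re.sub(r"[\w.-]+",
                    lambda m: "(?:" + re.escape(m.group()) + ";)",
                    content_model)
    # Sequence is juxtaposition in Python regexes, so drop commas and spaces.
    return re.compile(re.sub(r"[,\s]+", "", tokens))


def invalid_elements(xml_text):
    """Return the tags of elements whose children break their content model."""
    failures = []
    for element in ET.fromstring(xml_text).iter():
        model = CONTENT_MODELS.get(element.tag)
        if model is None:
            continue  # undeclared here: treated as plain text content
        children = "".join(child.tag + ";" for child in element)
        if not model_to_regex(model).fullmatch(children):
            failures.append(element.tag)
    return failures


with_reason = ("<grades><grade><student>Wayne Doe</student>"
               "<assigned-grade>I</assigned-grade>"
               "<reason>Alien abduction</reason></grade></grades>")
missing_grade = ("<grades><grade><student>Wayne Doe</student>"
                 "<reason>Alien abduction</reason></grade></grades>")
print(invalid_elements(with_reason))    # []
print(invalid_elements(missing_grade))  # ['grade']
```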

Page 18: XML Validation I DTDs

Mixed content
Legal to have a content model with text and element data

<story category="national" byline="Karen Wheatley">
  <headline>President Meets with Congress</headline>
  <![CDATA[ The President met with Congressional leaders today in an
  effort to jump-start faltering budget negotiations. Sources described the mood
  of the meeting as "cordial". ]]>
  <full_text ref="news801" />
  <image src="img2071.jpg" />
  <image src="img2072.jpg" />
  <image src="img2073.jpg" />
</story>

Page 19: XML Validation I DTDs

CDATA?

Forgot to mention last week
Content that appears here will not be parsed
  Can include arbitrary text including <, &, etc.
Only restriction
  termination sequence ]]>
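A quick way to confirm that a CDATA section is passed through unparsed, using Python's standard `xml.etree.ElementTree`. The `note` element and its content are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The < and & characters and the <b> "tag" would be errors or markup
# outside a CDATA section; inside it they are ordinary characters.
doc = "<note><![CDATA[1 < 2 & <b>not markup</b>]]></note>"

note = ET.fromstring(doc)
print(note.text)   # the raw characters, exactly as written
print(len(note))   # number of child elements
```

The text comes back verbatim and the element has no children, since `<b>` was never treated as a tag.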

Page 20: XML Validation I DTDs

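Mixed content, cont'd

<!ELEMENT story (#PCDATA | headline | full_text | image)*>

In a DTD, mixed content must take the form (#PCDATA | a | b)*
  order and number of the child elements cannot be constrained
Mixed content
  makes handling XML complex
  necessary for many applications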
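One concrete reason mixed content complicates handling: the text is scattered between child elements. In Python's standard `xml.etree.ElementTree`, text that follows a child is stored on that child as its `tail`, not on the parent. The shortened `story` document below is an assumption based on the earlier example.

```python
import xml.etree.ElementTree as ET

doc = ('<story><headline>President Meets with Congress</headline>'
       'The President met with Congressional leaders today.'
       '<image src="img2071.jpg" /></story>')

story = ET.fromstring(doc)
headline = story[0]

print(story.text)      # None: no text before the first child
print(headline.text)   # text inside <headline>
print(headline.tail)   # story text that follows </headline>

# itertext() walks text and tails in document order.
print("".join(story.itertext()))
```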

Page 21: XML Validation I DTDs

Recursion

Unlike grammars
  recursive formulation ≠ repetition
Difference between
  <!ELEMENT students (student+)>
  <!ELEMENT students (student, students?)>
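The two declarations describe differently shaped documents: the first gives a flat list, the second forces each further student into a nested `students` element. A small sketch with hand-written sample documents (assumed, not from the slides) makes the difference measurable.

```python
import xml.etree.ElementTree as ET

# Shape allowed by (student+): all students are siblings.
FLAT = "<students><student/><student/><student/></students>"

# Shape required by (student, students?): one student, then a nested list.
NESTED = ("<students><student/>"
          "<students><student/>"
          "<students><student/></students>"
          "</students></students>")


def depth(element):
    """Number of element levels from this element down to its deepest leaf."""
    return 1 + max((depth(child) for child in element), default=0)


for label, doc in [("flat", FLAT), ("nested", NESTED)]:
    root = ET.fromstring(doc)
    print(label, "students:", len(root.findall(".//student")), "depth:", depth(root))
```

Both hold three students, but the nested form grows one level deeper per student.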

Page 22: XML Validation I DTDs

Restriction

The grammar cannot be ambiguous
  A → (a, b) | (a, c)
  this makes the parser implementation difficult
Usually easy to make non-ambiguous
  A → a, (b | c)
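Factoring out the common prefix does not change what is accepted. As a sanity check, the sketch below enumerates all short strings over a, b, c and compares the two patterns with Python's `re`. It again assumes single-letter symbols, so the comma can be dropped. Python's backtracking matcher accepts both forms, so this shows equivalence only, not the determinism requirement itself.

```python
import re
from itertools import product


def accepted(pattern, alphabet="abc", max_len=3):
    """Set of strings up to max_len that the DTD-style pattern describes."""
    regex = re.compile(re.sub(r"[,\s]+", "", pattern))
    words = ("".join(letters)
             for length in range(max_len + 1)
             for letters in product(alphabet, repeat=length))
    return {word for word in words if regex.fullmatch(word)}


ambiguous = accepted("(a, b) | (a, c)")
factored = accepted("a, (b | c)")
print(sorted(ambiguous), sorted(factored), ambiguous == factored)
```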

Page 23: XML Validation I DTDs

Attribute lists

Declared separately from elements
  can be anywhere in the DTD
Specification includes
  name of the element
  name of the attribute
  attribute type
  default

Page 24: XML Validation I DTDs

Attribute types

Character data
  CDATA
  different from XML CDATA section!
Enumerated
  (yes|no)
ID
  must be unique in the document
IDREF
  must refer to an id in the document
NMTOKEN
  a restriction of CDATA to single "word"
Also IDREFS and NMTOKENS

Page 25: XML Validation I DTDs

Default declaration

#REQUIRED
  attribute must be given
#IMPLIED
  means optional
Value
  this becomes the default
#FIXED
  value provided, and the document may not change it

Page 26: XML Validation I DTDs

Examples

<!ATTLIST img

src CDATA #REQUIRED

alt CDATA #REQUIRED

align (left|right|center) "left"

id ID #IMPLIED

>

<!ATTLIST timestamp

time-zone NMTOKEN #IMPLIED>
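To watch a default being applied, the `img` declaration can be placed in an internal DTD subset and read with Python's bundled expat parser. Expat does not validate, but it does report `<!ATTLIST>` declarations and fills in declared defaults. The sample `<img>` element and the `<!ELEMENT img EMPTY>` line are assumptions added for the sketch.

```python
from xml.parsers import expat

DOC = b"""<!DOCTYPE img [
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    src   CDATA               #REQUIRED
    alt   CDATA               #REQUIRED
    align (left|right|center) "left"
    id    ID                  #IMPLIED
  >
]>
<img src="photo.jpg" alt="A photo"/>"""


def parse(doc: bytes):
    """Return (declared attributes, attributes seen on each element)."""
    declared, seen = {}, {}
    parser = expat.ParserCreate()

    def on_attlist_decl(element, attribute, attr_type, default, required):
        declared[attribute] = (attr_type, default, bool(required))

    def on_start_element(name, attributes):
        seen[name] = dict(attributes)

    parser.AttlistDeclHandler = on_attlist_decl
    parser.StartElementHandler = on_start_element
    parser.Parse(doc, True)
    return declared, seen


declared, seen = parse(DOC)
for name, (attr_type, default, required) in declared.items():
    print(name, attr_type, default, required)
print(seen["img"])  # align appears although the document never wrote it
```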

Page 27: XML Validation I DTDs

Entities

Like macros
  content to be inserted
  indicated with &name;
Predefined general entities
  &amp; &lt;
  essential part of XML
User-defined general entities
  &disclaimer;

Page 28: XML Validation I DTDs

Entities, cont'd

Parameter entities
  can also be used to simplify DTD creation or to combine DTDs
  indicated with a %

More on this next week

Page 29: XML Validation I DTDs

Defining general entities

<!ENTITY name "content">
Example
  <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">

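A short check of the macro-like behaviour, using Python's standard `xml.etree.ElementTree`, which expands general entities declared in the internal DTD subset. The surrounding `book`, `notice` and `title` elements are made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE book [
  <!ENTITY disclaimer "This is a work of fiction. Any resemblance to persons living or dead is unintentional.">
]>
<book>
  <notice>&disclaimer;</notice>
  <title>Fish &amp; Chips</title>
</book>"""

book = ET.fromstring(DOC)
print(book.find("notice").text)  # user-defined entity, replaced by its content
print(book.find("title").text)   # predefined entity &amp; becomes &
```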
Page 30: XML Validation I DTDs

Unparsed data

What about non-text data?
  images, audio files
In XML we define a notation
  create a name and associate an application
  suggestion to the application how to interpret the unparsed data
  not part of parsing operation

Page 31: XML Validation I DTDs

Using Notation

<!NOTATION name SYSTEM "url">
Example
  <!NOTATION jpeg SYSTEM "IExplore.exe">
  declares the jpeg notation
Example
  <!ENTITY photo53 SYSTEM "photo53.jpg" NDATA jpeg>
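The parser only reports these declarations; what to do with the data is left to the application. A sketch with Python's bundled expat parser shows what is handed over. The `album` element and its `cover` attribute of type ENTITY are hypothetical, added so the unparsed entity has something referring to it.

```python
from xml.parsers import expat

DOC = b"""<!DOCTYPE album [
  <!ELEMENT album EMPTY>
  <!ATTLIST album cover ENTITY #REQUIRED>
  <!NOTATION jpeg SYSTEM "IExplore.exe">
  <!ENTITY photo53 SYSTEM "photo53.jpg" NDATA jpeg>
]>
<album cover="photo53"/>"""

notations = {}   # notation name -> system identifier
unparsed = {}    # entity name -> (system identifier, notation name)
elements = {}    # element name -> attributes

parser = expat.ParserCreate()


def on_notation(notation_name, base, system_id, public_id):
    notations[notation_name] = system_id


def on_unparsed_entity(entity_name, base, system_id, public_id, notation_name):
    unparsed[entity_name] = (system_id, notation_name)


def on_start_element(name, attributes):
    elements[name] = dict(attributes)


parser.NotationDeclHandler = on_notation
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.StartElementHandler = on_start_element
parser.Parse(DOC, True)

# The attribute carries only the entity name; the application resolves it.
entity = elements["album"]["cover"]
print(entity, unparsed[entity], notations[unparsed[entity][1]])
```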

Page 32: XML Validation I DTDs

Notation, cont'd

Note that the content is defined in the DTD, not the document
  the XML document refers to the binary data by entity name
Not that useful in practice
  more likely to use URLs

Page 33: XML Validation I DTDs

Typical Example

<story category="national" byline="Karen Wheatley">

...

<full_text ref="news801" />

<image src="img2071.jpg" />

<image src="img2072.jpg" />

<image src="img2073.jpg" />

</story>

Now it is up to the application to do something appropriate with the src attribute
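On the application side this usually means pulling the attribute values out and deciding what to fetch or display. A minimal sketch with Python's standard `xml.etree.ElementTree`; the function name `image_sources` is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<story category="national" byline="Karen Wheatley">
  <full_text ref="news801" />
  <image src="img2071.jpg" />
  <image src="img2072.jpg" />
  <image src="img2073.jpg" />
</story>"""


def image_sources(xml_text):
    """Return the src value of every <image> element, in document order."""
    return [image.get("src") for image in ET.fromstring(xml_text).iter("image")]


print(image_sources(DOC))
```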

Page 34: XML Validation I DTDs

A better solution

Use XLink
  We'll talk about this later

Page 35: XML Validation I DTDs

DTD limitations

Not in XML
  need a special parser for the DTD
No content type restrictions
  #PCDATA can be anything
Element names must be globally unique
  cannot reuse a common term at different places in the document
  course-name
  professor-name

Page 36: XML Validation I DTDs

DTD benefits

Relatively easy to write and understand
  wait until you see XML Schema!
Possible to modularize and combine DTDs
  more next week

Page 37: XML Validation I DTDs

Next week

More DTDs
  Modularization and parameterization
  on-line reading
Beginning Schemas
  4.1-4.30

Page 38: XML Validation I DTDs

Lab