More xml chpt 6 DTD Document Type Definition

Post on 21-Dec-2015

Page 1: More xml chpt 6 DTD Document Type Definition. DTD: document type definition A DTD is defined using EBNF (extended BNF) and can be used to specify allowable

More xml chpt 6 DTD

Document Type Definition

DTD: document type definition

• A DTD is defined using EBNF (extended BNF) and can be used to specify allowable elements and attributes for an XML document.

• There is a move away from DTD currently, toward Schema. Schema documents have XML (not BNF) syntax.

• Some parsers can check an XML document against its DTD and determine whether it is valid. These are called validating parsers. A document that is syntactically correct but does not conform to its DTD is well-formed but not valid. Non-validating parsers cannot check documents against their DTD and can thus only determine whether the document is well-formed.

Document Type Declaration

<!DOCTYPE ...> in an XML document prolog is used to specify a DTD appearing within or outside the document. These declarations are referred to as the internal subset and the external subset, respectively.

<!DOCTYPE thingy [ <!ELEMENT thingy (#PCDATA)> ]>

Declares a document type whose root element is thingy, with one element declared in the internal subset. PCDATA stands for "parsed character data", meaning that the reserved characters < and & within the PCDATA are treated as markup and must be written as &lt; and &amp; when they are meant literally. The parentheses contain the content specification for the element.

MS XML validator

• We can check an xml document for adherence to an external DTD using MS XML validator. Here’s the xml:

<?xml version = "1.0"?>
<!-- Fig. 6.1: intro.xml -->
<!-- Using an external subset -->
<!DOCTYPE myMessage SYSTEM "intro.dtd">
<myMessage>
   <message>Welcome to XML!</message>
</myMessage>

And here’s the DTD:

<!-- Fig. 6.2: intro.dtd -->
<!-- External declarations -->
<!ELEMENT myMessage ( message )>
<!ELEMENT message ( #PCDATA )>

MS Validating parser can validate against schema or dtd

Invalid xml

• In the next slide we use the MS XML validator to check an XML document (shown below) that is like intro.xml but is missing the message element:

<?xml version = "1.0"?>
<!-- Fig. 6.3 : intro-invalid.xml -->
<!-- Simple introduction to XML markup -->
<!DOCTYPE myMessage SYSTEM "intro.dtd">
<!-- Root element missing child element message -->
<myMessage>
</myMessage>
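The difference between "valid", "well-formed but not valid" and "not well-formed" can also be probed from Java with the JAXP parser that ships with the JDK. This is only a sketch under assumptions: the class name ValidityCheck and the method classify are made up for illustration, and the declarations of intro.dtd are inlined as an internal subset so that no external file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidityCheck {

    // Assumption: intro.dtd inlined as an internal subset (hypothetical setup)
    public static final String DOCTYPE =
        "<!DOCTYPE myMessage ["
        + "<!ELEMENT myMessage ( message )>"
        + "<!ELEMENT message ( #PCDATA )>"
        + "]>";

    /** Returns "valid", "well-formed only" or "not well-formed". */
    public static String classify(String body) {
        final boolean[] invalid = { false };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // ask for DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // validity violations are reported as recoverable errors
                public void error(SAXParseException e) { invalid[0] = true; }
                // well-formedness violations are fatal
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (SAXException e) {
            return "not well-formed";
        } catch (Exception e) {
            throw new RuntimeException(e);   // I/O or parser configuration problem
        }
        return invalid[0] ? "well-formed only" : "valid";
    }

    public static void main(String[] args) {
        System.out.println(classify("<myMessage><message>Welcome to XML!</message></myMessage>"));
        System.out.println(classify("<myMessage></myMessage>"));           // message is missing
        System.out.println(classify("<myMessage><message></myMessage>"));  // tags do not nest
    }
}
```

A non-validating run (setValidating(false)) would accept the second document, since only the DTD makes the missing message element an error.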

If xml doc does not match dtd/schema

Sequences, pipes and occurrences

• The comma can be used to indicate a sequence in which elements must appear.

<!ELEMENT class (prof, student)>

• Indicates the order and number of elements making up a class: one prof and one student, in that order. A sequence may list any number of elements.

<!ELEMENT sidedish (coleslaw|chips)>

• Indicates that exactly one of the choices must appear.

• +, * and ? indicate how often an element may occur.

• + means 1 or more occurrences, * means 0 or more occurrences, and ? means 0 or 1 occurrence.

<!ELEMENT class (prof, student+)>

• Might be appropriate for a class DTD, meaning exactly one professor followed by one or more students.

example

<!ELEMENT donuts (jelly?,lemon*,((creme|sugar)+|glazed))>

Specifies that donuts consists of 0 or 1 jelly, then 0 or more lemon, then either 1 or more creme or sugar elements (in any mix) or exactly one glazed. A legal markup for this would be

<donuts>

<jelly>grape</jelly>

<lemon>sour</lemon>

<lemon>real sour</lemon>

<glazed>chocolate</glazed>

</donuts>
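A content model reads much like a regular expression over the names of the child elements. The sketch below rewrites the donuts model as a java.util.regex pattern to try out which child sequences it accepts. It is an analogy only, not how a validating parser is implemented; the class name ContentModelRegex and the method matches are made up for illustration.

```java
import java.util.regex.Pattern;

public class ContentModelRegex {

    // (jelly?,lemon*,((creme|sugar)+|glazed)) rewritten as a regex over
    // child names, where every name is followed by one space
    static final Pattern DONUTS =
        Pattern.compile("(jelly )?(lemon )*((creme |sugar )+|glazed )");

    /** True if the child element names, in document order, satisfy the model. */
    public static boolean matches(String... children) {
        StringBuilder sequence = new StringBuilder();
        for (String child : children) {
            sequence.append(child).append(' ');
        }
        return DONUTS.matcher(sequence).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("jelly", "lemon", "lemon", "glazed")); // the markup above
        System.out.println(matches("creme", "sugar", "creme"));           // a mix of creme and sugar
        System.out.println(matches("glazed", "glazed"));                  // only one glazed is allowed
        System.out.println(matches("jelly", "lemon"));                    // the last group is missing
    }
}
```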

The DTD and XML

pastry.dtd:

<!ELEMENT jelly (#PCDATA)>
<!ELEMENT glazed (#PCDATA)>
<!ELEMENT lemon (#PCDATA)>
<!ELEMENT creme (#PCDATA)>
<!ELEMENT sugar (#PCDATA)>
<!ELEMENT donuts (jelly?,lemon*,((creme|sugar)+|glazed))>

pastry.xml:

<?xml version = "1.0"?>
<!-- pastry.xml -->
<!-- Using an external subset -->
<!DOCTYPE donuts SYSTEM "pastry.dtd">
<donuts>
   <jelly>grape</jelly>
   <lemon>sour</lemon>
   <lemon>real sour</lemon>
   <glazed>chocolate</glazed>
</donuts>

In validator: files are in myexamples directory

Pastry.xml in xml validator

content specification

• An element may contain one or more child elements as content.

• Content specification types describe non-element content.

• These consist of ANY, EMPTY and mixed content.

• Empty elements do not contain character data or child elements. An empty element specification like <!ELEMENT nest EMPTY> could be marked up as <nest/>. Recall that the shorthand /> may be used to close an empty element.

• The comma, + and ? cannot be used in mixed content. A declaration containing only PCDATA is written (#PCDATA). If mixed content also lists child elements, #PCDATA must be listed first, the items are separated by |, and the group must end with *.

• An element of type ANY may contain any content, including PCDATA, declared elements, or combinations of elements and PCDATA. It may also be empty.

Mixed content

• <!ELEMENT mymessage (#PCDATA|message)*>

• Declares mymessage to have mixed content. PCDATA must be listed first in mixed content. The * means mymessage may contain nothing, or any number of occurrences of PCDATA and message elements in any order. This would be legal markup:

<mymessage>here is an example of the dtd above

<message>this is a message</message>

<message>and another</message>

</mymessage>

Internal dtd

• An xml document is standalone if it does not reference an external subset.

<?xml version = "1.0" standalone = "yes"?>

<!-- Fig. 6.5: mixed.xml -->
<!-- Mixed content type elements -->

<!DOCTYPE format [
   <!ELEMENT format ( #PCDATA | bold | italic )*>
   <!ELEMENT bold ( #PCDATA )>
   <!ELEMENT italic ( #PCDATA )>
]>

<format>
   This is a simple formatted sentence.
   <bold>I have tried bold.</bold>
   <italic>I have tried italic.</italic>
   Now what?
</format>
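To see what "mixed content" looks like to a program, the following sketch parses the format document with the JDK's DOM parser and lists the direct children of the root element. Text runs and elements come back interleaved, in document order. The class name MixedContentWalk and the method childNames are made up for illustration, and the document is written on one line here for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentWalk {

    public static final String XML =
        "<!DOCTYPE format ["
        + "<!ELEMENT format ( #PCDATA | bold | italic )*>"
        + "<!ELEMENT bold ( #PCDATA )>"
        + "<!ELEMENT italic ( #PCDATA )>"
        + "]>"
        + "<format>This is a simple formatted sentence. "
        + "<bold>I have tried bold.</bold> "
        + "<italic>I have tried italic.</italic> Now what?</format>";

    /** Node names of the root element's direct children, in document order. */
    public static List<String> childNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList kids = doc.getDocumentElement().getChildNodes();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < kids.getLength(); i++) {
                names.add(kids.item(i).getNodeName());   // "#text" for character data
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Text nodes and element nodes alternate, including the single
        // space between </bold> and <italic>
        System.out.println(childNames(XML));
    }
}
```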

In ms xml validator

Element group

<!ELEMENT courselist (department, (coursenumber, coursedescription)+)>

• Above, a courselist contains a single department followed by one or more coursenumber, coursedescription pairs.

• What does the following mean?

<!ELEMENT course (coursenumber, (sectionnumber, instructor, roomnumber)+)>

Attribute specification

• An attribute specification specifies an attribute list for an element via an ATTLIST declaration:

<!ELEMENT x EMPTY>

<!ATTLIST x y CDATA #REQUIRED>

• Here, y is a required attribute of element x. y may contain any character data, except that < and & (and the quote character that delimits the value) must be written as entity references.

• CDATA in an attribute declaration has a different meaning than a CDATA section in an XML document, which is a block of unparsed text in element content that may not contain its own end marker ]]>.

Using attributes

<?xml version = "1.0"?>

<!-- Fig. 6.7: intro2.xml -->
<!-- Declaring attributes -->

<!DOCTYPE myMessage [
   <!ELEMENT myMessage ( message )>
   <!ELEMENT message ( #PCDATA )>
   <!ATTLIST message id CDATA #REQUIRED>
]>

<myMessage>

<message id = "445"> Welcome to XML! </message>

</myMessage>

Document with attributes in MS validator

Attribute defaults

• Page authors can specify default values for attributes.

• The keywords are #IMPLIED, #REQUIRED and #FIXED.

– An implied attribute, if missing, can be given any value the application using the document wishes.

– A required attribute must appear or the document is not valid.

– A fixed attribute must have the specific value provided.

• <message>number</message> does not conform to <!ATTLIST message number CDATA #REQUIRED>, because number must appear as an attribute of message, not as its content.

• <!ATTLIST address zip CDATA #FIXED "13820"> specifies that zip can only have the value "13820". An application processing an XML document whose address element is missing the zip attribute would be passed this default value.
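The effect of a default on the application side can be checked with a short DOM sketch. It assumes the JDK's built-in parser, which reads the internal subset and supplies declared defaults. The class name AttributeDefaults, the method root and the extra #IMPLIED attribute apt are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeDefaults {

    // address carries no attributes in the markup itself
    public static final String XML =
        "<!DOCTYPE address ["
        + "<!ELEMENT address EMPTY>"
        + "<!ATTLIST address zip CDATA #FIXED \"13820\""
        + "                  apt CDATA #IMPLIED>"   // apt is a hypothetical attribute
        + "]>"
        + "<address/>";

    /** Parses the document and returns its root element. */
    public static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element address = root(XML);
        // The #FIXED value is passed to the application although it was not typed
        System.out.println("zip = " + address.getAttribute("zip"));
        // getSpecified() is false when the value came from the DTD
        System.out.println("typed in document: "
            + address.getAttributeNode("zip").getSpecified());
        // An #IMPLIED attribute that is missing is simply absent
        System.out.println("has apt: " + address.hasAttribute("apt"));
    }
}
```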

Attributes

• Attribute types may be CDATA (Strings), tokenized or enumerated.

• Strings have no constraints beyond prohibiting a literal < or &, and the quote character (' or ") that delimits the value. Entity references must be used for these.

• Tokenization imposes constraints on attribute values, such as which characters are permitted in the value.

• An enumerated attribute has a restricted value range: It can only take on one of the values listed in the attribute declaration.

tokenized attribute

• 4 tokenized types exist (IDREF, ENTITY and NMTOKEN also have the plural forms IDREFS, ENTITIES and NMTOKENS):

– ID

– IDREF

– ENTITY

– NMTOKEN

• ID uniquely identifies an element.

• IDREF attributes point to elements with an ID attribute.

• A validating parser verifies that each ID value referenced by an IDREF is in the document.

• Using the same value for multiple ID attributes is an error.

• Declaring attributes of type ID to be #FIXED is an error.

Using ID and IDREF attributes

<?xml version = "1.0"?>
<!-- Fig. 6.8: IDExample.xml -->
<!-- Example for ID and IDREF values of attributes -->
<!DOCTYPE bookstore [
   <!ELEMENT bookstore ( shipping+, book+ )>
   <!ELEMENT shipping ( duration )>
   <!ATTLIST shipping shipID ID #REQUIRED>
   <!ELEMENT book ( #PCDATA )>
   <!ATTLIST book shippedBy IDREF #IMPLIED>
   <!ELEMENT duration ( #PCDATA )>
]>
<bookstore>
   <shipping shipID = "s1">
      <duration>2 to 4 days</duration>
   </shipping>
   <shipping shipID = "s2">
      <duration>1 day</duration>
   </shipping>
   <book shippedBy = "s2">
      Java How to Program 3rd edition.
   </book>
   <book shippedBy = "s2">
      C How to Program 3rd edition.
   </book>
   <book shippedBy = "s1">
      C++ How to Program 3rd edition.
   </book>
</bookstore>

In MS Validator

Use URL:

http://employees.oneonta.edu/higgindm/internet%20programming/validate_js.htm

with file examples\ch06\IDExample.xml

ID example

remarks

• It is an error if the value of an attribute of type ID does not begin with a letter, underscore or colon.

• Declaring more than one attribute of type ID for an element is an error.

• Referencing a value that is not defined as an ID is an error.

IDExample2.xml (note s3 shippedBy value)

<bookstore>
   <shipping shipID = "s1">
      <duration>2 to 4 days</duration>
   </shipping>
   <shipping shipID = "s2">
      <duration>1 day</duration>
   </shipping>
   <book shippedBy = "s2">
      Java How to Program 3rd edition.
   </book>
   <book shippedBy = "s2">
      C How to Program 3rd edition.
   </book>
   <book shippedBy = "s3">
      C++ How to Program 3rd edition.
   </book>
</bookstore>
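The check that fails for s3 amounts to this: collect every ID value, then look for IDREF values that match none of them. The sketch below redoes that check by hand with the DOM to make the rule concrete. It is not how a validating parser is implemented, it hard-codes the attribute names shipID and shippedBy instead of reading their types from the DTD, and the class name DanglingIdrefs and the method dangling are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DanglingIdrefs {

    /** Returns the shippedBy values that match no shipID in the document. */
    public static List<String> dangling(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Step 1: gather all ID values
            Set<String> ids = new HashSet<>();
            NodeList shipping = doc.getElementsByTagName("shipping");
            for (int i = 0; i < shipping.getLength(); i++) {
                ids.add(((Element) shipping.item(i)).getAttribute("shipID"));
            }

            // Step 2: every IDREF must name one of them
            List<String> missing = new ArrayList<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                String ref = ((Element) books.item(i)).getAttribute("shippedBy");
                if (!ref.isEmpty() && !ids.contains(ref)) {
                    missing.add(ref);
                }
            }
            return missing;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<bookstore>"
            + "<shipping shipID=\"s1\"><duration>2 to 4 days</duration></shipping>"
            + "<shipping shipID=\"s2\"><duration>1 day</duration></shipping>"
            + "<book shippedBy=\"s2\">Java How to Program 3rd edition.</book>"
            + "<book shippedBy=\"s3\">C++ How to Program 3rd edition.</book>"
            + "</bookstore>";
        System.out.println("IDREF values with no matching ID: " + dangling(xml));
    }
}
```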

IDExample2.xml in Validator

Entities

• As we saw in chapter 5 entity references in an xml document are replaced by the entity values found in the dtd.

• We saw this for lang.xml and lang.dtd where assoc and text entities were replaced with Arabic script.

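• Here is another example. The entity city is an unparsed entity, so it is not expanded in place; it is named as the value of an attribute of type ENTITY.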

entityexample.xml

<?xml version = "1.0"?>
<!-- Fig. 6.10: entityExample.xml -->
<!-- ENTITY and ENTITY attribute types -->
<!DOCTYPE database [
   <!NOTATION html SYSTEM "iexplorer">
   <!ENTITY city SYSTEM "tour.html" NDATA html>
   <!ELEMENT database ( company+ )>
   <!ELEMENT company ( name )>
   <!ATTLIST company tour ENTITY #REQUIRED>
   <!ELEMENT name ( #PCDATA )>
]>
<database>
   <company tour = "city">
      <name>Deitel &amp; Associates, Inc.</name>
   </company>
</database>

entityexample.xml

entityexample.xml

• Here the <!NOTATION html SYSTEM "iexplorer"> declaration indicates that an application may wish to run IE and load tour.html to handle unparsed entities.

• The <!ENTITY city SYSTEM "tour.html" NDATA html> declaration declares an entity named city which refers to the external document tour.html.

• NDATA in this declaration indicates that the content of this entity is not xml and supplies the name of the notation (html) for this entity.
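The parser never opens tour.html itself; it only reports the declarations and leaves the rest to the application. With SAX these reports arrive through the DTDHandler callbacks notationDecl and unparsedEntityDecl, which DefaultHandler already implements as no-ops. The sketch below records them; the class name UnparsedEntities and the method declarations are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UnparsedEntities {

    public static final String XML =
        "<!DOCTYPE database ["
        + "<!NOTATION html SYSTEM \"iexplorer\">"
        + "<!ENTITY city SYSTEM \"tour.html\" NDATA html>"
        + "<!ELEMENT database ( company+ )>"
        + "<!ELEMENT company ( name )>"
        + "<!ATTLIST company tour ENTITY #REQUIRED>"
        + "<!ELEMENT name ( #PCDATA )>"
        + "]>"
        + "<database><company tour=\"city\">"
        + "<name>Deitel &amp; Associates, Inc.</name></company></database>";

    /** Returns one line per notation or unparsed-entity declaration seen. */
    public static List<String> declarations(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                seen.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                // systemId may be reported as an absolute URI, so it is not recorded
                seen.add("entity " + name + " uses notation " + notationName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String line : declarations(XML)) {
            System.out.println(line);
        }
    }
}
```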

ENTITIES

• The ENTITIES keyword can be used in a dtd to indicate that an attribute has multiple entities for its value.

• <!ATTLIST directory file ENTITIES #REQUIRED>

• Specifies that file holds one or more entity names separated by spaces. Conforming markup is

• <directory file = "animations graphics tables">

• animations, graphics and tables are unparsed entities declared in a dtd.

• The NMTOKEN type is more restrictive: its value may contain only name characters such as letters, digits, periods, underscores, hyphens and colons.

• <!ATTLIST mathdept phonenum NMTOKEN #REQUIRED> might have conforming markup

• <mathdept phonenum = "607-436-3708">

• <mathdept phonenum = "607 436 3708"> does not conform because spaces are not allowed.

• The NMTOKENS attribute type would allow multiple tokens separated by blanks.
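The NMTOKEN rule can be approximated with a regular expression. The sketch below is an ASCII-only simplification (XML also admits many non-ASCII name characters), and the class name NmtokenCheck and both method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class NmtokenCheck {

    // Simplification: letters, digits, period, underscore, colon and hyphen only
    static final Pattern NMTOKEN = Pattern.compile("[A-Za-z0-9._:-]+");

    /** True if value is a single name token. */
    public static boolean isNmtoken(String value) {
        return NMTOKEN.matcher(value).matches();
    }

    /** True if value is one or more name tokens separated by blanks. */
    public static boolean isNmtokens(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        for (String token : trimmed.split("\\s+")) {
            if (!isNmtoken(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNmtoken("607-436-3708"));   // hyphens are name characters
        System.out.println(isNmtoken("607 436 3708"));   // spaces are not
        System.out.println(isNmtokens("607 436 3708"));  // but NMTOKENS splits on blanks
    }
}
```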

Enumerated attribute types

• An enumerated attribute type declares a list of possible values. Attributes must be assigned a value from this list in order to conform to the dtd. Enumerated values are separated with a pipe (|).

• <!ATTLIST person gender (M|F) "F"> allows a person to have gender M or F with default "F".

• <!ATTLIST person gender (M|F) #IMPLIED> does not supply a default and would permit an application to process a person with no gender in whatever way it liked.

Enumerated attribute types

• NOTATION is also an enumerated attribute type.

<!ATTLIST CSCI116 language NOTATION (Java|C) "C">

Specifies that language takes the value Java or C, with C as the default when the attribute is omitted. Each listed value must be a declared notation. The notation for C might be specified as

<!NOTATION C SYSTEM "http://....html">

conditional.xml

• Conditional sections provide the flexibility of including or excluding declarations.

• These enable us to check xml documents against different sets of dtd requirements.

• Keywords INCLUDE and IGNORE specify included and excluded declarations:

<![INCLUDE[

<!ELEMENT name (#PCDATA)>

]]>

Directs the parser to include the declaration of element name.

Conditionals may also be used with entities.

Conditional.dtd

<!-- conditional.dtd -->
<!-- DTD for conditional section example -->

<!ENTITY % reject "IGNORE">
<!ENTITY % accept "INCLUDE">

<![ %accept; [
   <!ELEMENT message ( approved, signature )>
]]>

<![ %reject; [
   <!ELEMENT message ( approved, reason, signature )>
]]>

<!ELEMENT approved EMPTY>
<!ATTLIST approved flag ( true | false ) "false">

<!ELEMENT reason ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>

Conditional.xml

<?xml version = "1.0" standalone = "no"?>

<!-- Fig. 6.13: conditional.xml -->

<!-- Using conditional sections -->

<!DOCTYPE message SYSTEM "conditional.dtd">

<message>

<approved flag = "true"/>

<signature>Chairman</signature>

</message>

discussion

• Entities %accept; and %reject; have values "INCLUDE" and "IGNORE", respectively.

• The percent symbol indicates that they are parameter entities, which may only be used inside the DTD, not in the document content. Conditional sections, and parameter-entity references inside a declaration, may only appear in the external subset.

• Thus the author may create entities specific to the dtd rather than to the xml document.
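Swapping the two entity values switches which declaration of message is in force. The sketch below tries this with the JDK's validating DOM parser. Because conditional sections belong in the external subset, the DTD text is handed to the parser through an EntityResolver instead of a real conditional.dtd file; that setup, the class name ConditionalSections and the methods dtd and isValid are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ConditionalSections {

    /** conditional.dtd with the two parameter-entity values chosen by the caller. */
    public static String dtd(String accept, String reject) {
        return "<!ENTITY % reject \"" + reject + "\">"
             + "<!ENTITY % accept \"" + accept + "\">"
             + "<![ %accept; [ <!ELEMENT message ( approved, signature )> ]]>"
             + "<![ %reject; [ <!ELEMENT message ( approved, reason, signature )> ]]>"
             + "<!ELEMENT approved EMPTY>"
             + "<!ATTLIST approved flag ( true | false ) \"false\">"
             + "<!ELEMENT reason ( #PCDATA )>"
             + "<!ELEMENT signature ( #PCDATA )>";
    }

    /** Validates a message element against the given external-subset text. */
    public static boolean isValid(String messageBody, final String dtdText) {
        final boolean[] ok = { true };
        String xml = "<!DOCTYPE message SYSTEM \"conditional.dtd\">" + messageBody;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the DTD from memory whenever the parser asks for an external entity
            builder.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    return new InputSource(new StringReader(dtdText));
                }
            });
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { ok[0] = false; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return false;   // not even well-formed
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return ok[0];
    }

    public static void main(String[] args) {
        String approved = "<message><approved flag=\"true\"/>"
                        + "<signature>Chairman</signature></message>";
        String rejected = "<message><approved flag=\"false\"/><reason>Late</reason>"
                        + "<signature>Chairman</signature></message>";
        System.out.println(isValid(approved, dtd("INCLUDE", "IGNORE")));
        System.out.println(isValid(rejected, dtd("INCLUDE", "IGNORE")));
        System.out.println(isValid(rejected, dtd("IGNORE", "INCLUDE")));
    }
}
```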

conditional.xml

Whitespace

• Whitespace is preserved or normalized depending on the context in which it appears.

• A text example (whitespace.xml) uses a java program (Tree.java from chapt 9) to demonstrate when whitespace is preserved or normalized.

• File can be got from classdir\examples\ch09\tree.java

running Tree.java on whitespace.xml... java src in notes

C:\Java\j2sdk1.4.1_01\bin>java Tree yes whitespace.xml
URL: file:C:/Java/j2sdk1.4.1_01/bin/whitespace.xml
[ document root ]
+-[ element : whitespace ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : hasCDATA ]
  +-[ attribute : cdata ] " simple cdata "
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : hasID ]
  +-[ attribute : id ] "i20"
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : hasNMTOKEN ]
  +-[ attribute : nmtoken ] "hello"
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ ignorable ]

Java tree output continued

 +-[ element : hasEnumeration ]
  +-[ attribute : enumeration ] "true"
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ ignorable ]
 +-[ element : hasMixed ]
  +-[ text ] ""
  +-[ text ] " This is text."
  +-[ text ] ""
  +-[ text ] " "
  +-[ element : hasCDATA ]
   +-[ attribute : cdata ] " simple cdata"
  +-[ text ] ""
  +-[ text ] " This is some additional text."
  +-[ text ] ""
  +-[ text ] " "
 +-[ ignorable ]
 +-[ ignorable ]
[ document end ]

C:\Java\j2sdk1.4.1_01\bin>

whitespace.xml: dtd and content

<?xml version = "1.0"?>

<!-- whitespace.xml -->
<!-- Demonstrating whitespace parsing -->

<!DOCTYPE whitespace [
   <!ELEMENT whitespace ( hasCDATA, hasID, hasNMTOKEN, hasEnumeration, hasMixed )>

   <!ELEMENT hasCDATA EMPTY>
   <!ATTLIST hasCDATA cdata CDATA #REQUIRED>

   <!ELEMENT hasID EMPTY>
   <!ATTLIST hasID id ID #REQUIRED>

   <!ELEMENT hasNMTOKEN EMPTY>
   <!ATTLIST hasNMTOKEN nmtoken NMTOKEN #REQUIRED>

   <!ELEMENT hasEnumeration EMPTY>
   <!ATTLIST hasEnumeration enumeration ( true | false ) #REQUIRED>

   <!ELEMENT hasMixed ( #PCDATA | hasCDATA )*>
]>

whitespace.xml continued

<whitespace>

<hasCDATA cdata = " simple cdata "/>

<hasID id = " i20"/>

<hasNMTOKEN nmtoken = " hello"/>

<hasEnumeration enumeration = " true"/>

<hasMixed> This is text. <hasCDATA cdata = " simple cdata"/> This is some additional text. </hasMixed>

</whitespace>
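The Tree output shows the id value as "i20" although the markup has a leading space: values of tokenized and enumerated attributes are normalized (trimmed and collapsed), while CDATA values keep their spaces. The sketch below checks this on a smaller document with the JDK's validating DOM parser. The one-element document, the class name AttributeNormalization and the method attribute are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class AttributeNormalization {

    // Hypothetical one-element document; every value has extra spaces
    public static final String XML =
        "<!DOCTYPE sample ["
        + "<!ELEMENT sample EMPTY>"
        + "<!ATTLIST sample cdata   CDATA   #REQUIRED"
        + "                 id      ID      #REQUIRED"
        + "                 nmtoken NMTOKEN #REQUIRED>"
        + "]>"
        + "<sample cdata='  simple cdata  ' id='  i20' nmtoken='  hello'/>";

    /** Returns the named attribute of the root element as the parser reports it. */
    public static String attribute(String xml, String name) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // same setting as "java Tree yes"
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getAttribute(name);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Brackets make leading and trailing spaces visible
        System.out.println("cdata   [" + attribute(XML, "cdata") + "]");    // spaces kept
        System.out.println("id      [" + attribute(XML, "id") + "]");       // trimmed
        System.out.println("nmtoken [" + attribute(XML, "nmtoken") + "]");  // trimmed
    }
}
```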

Tree.java slide 1

import java.io.*;
import org.xml.sax.*;  // for HandlerBase class
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class Tree extends HandlerBase {
   private int indent = 0;  // indentation counter

   // returns the spaces needed for indenting
   private String spacer( int count )
   {
      String temp = "";

      for ( int i = 0; i < count; i++ )
         temp += " ";

      return temp;
   }

   // method called before parsing
   // it provides the document location
   public void setDocumentLocator( Locator loc )
   {
      System.out.println( "URL: " + loc.getSystemId() );
   }

Tree.java slide 2

   // method called at the beginning of a document
   public void startDocument() throws SAXException
   {
      System.out.println( "[ document root ]" );
   }

   // method called at the end of the document
   public void endDocument() throws SAXException
   {
      System.out.println( "[ document end ]" );
   }

   // method called at the start tag of an element
   public void startElement( String name, AttributeList attributes )
      throws SAXException
   {
      System.out.println( spacer( indent++ ) + "+-[ element : " + name + " ]" );

      if ( attributes != null )
         for ( int i = 0; i < attributes.getLength(); i++ )
            System.out.println( spacer( indent ) + "+-[ attribute : " +
               attributes.getName( i ) + " ] \"" +
               attributes.getValue( i ) + "\"" );
   }

Tree.java slide 3

   // method called at the end tag of an element
   public void endElement( String name ) throws SAXException
   {
      indent--;
   }

   // method called when a processing instruction is found
   public void processingInstruction( String target, String value )
      throws SAXException
   {
      System.out.println( spacer( indent ) + "+-[ proc-inst : " +
         target + " ] \"" + value + "\"" );
   }

   // method called when characters are found
   public void characters( char buffer[], int offset, int length )
      throws SAXException
   {
      if ( length > 0 ) {
         String temp = new String( buffer, offset, length );

         System.out.println( spacer( indent ) + "+-[ text ] \"" + temp + "\"" );
      }
   }

   // method called when ignorable whitespace is found
   public void ignorableWhitespace( char buffer[], int offset, int length )
   {
      if ( length > 0 ) {
         System.out.println( spacer( indent ) + "+-[ ignorable ]" );
      }
   }

Tree slide 4

   // method called on a non-fatal (validation) error
   public void error( SAXParseException spe ) throws SAXParseException
   {
      // treat non-fatal errors as fatal errors
      throw spe;
   }

   // method called on a parsing warning
   public void warning( SAXParseException spe ) throws SAXParseException
   {
      System.err.println( "Warning: " + spe.getMessage() );
   }

Tree.java slide 5

   // main method
   public static void main( String args[] )
   {
      boolean validate = false;

      if ( args.length != 2 ) {
         System.err.println( "Usage: java Tree [validate] " + "[filename]\n" );
         System.err.println( "Options:" );
         System.err.println( " validate [yes|no] : " + "DTD validation" );
         System.exit( 1 );
      }

      if ( args[ 0 ].equals( "yes" ) )
         validate = true;

      SAXParserFactory saxFactory = SAXParserFactory.newInstance();
      saxFactory.setValidating( validate );

      try {
         SAXParser saxParser = saxFactory.newSAXParser();
         saxParser.parse( new File( args[ 1 ] ), new Tree() );
      }
      catch ( SAXParseException spe ) {
         System.err.println( "Parse Error: " + spe.getMessage() );
      }
      catch ( SAXException se ) {
         se.printStackTrace();
      }
      catch ( ParserConfigurationException pce ) {
         pce.printStackTrace();
      }
      catch ( IOException ioe ) {
         ioe.printStackTrace();
      }

      System.exit( 0 );
   }
}
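HandlerBase and AttributeList belong to SAX1 and are marked deprecated in current JDKs. A SAX2 version would extend DefaultHandler and receive an Attributes object instead. The sketch below is a cut-down port under that assumption, not the text's program: the class name Sax2Tree and the events helper are made up, indentation is left out, and the events are collected in a list rather than printed as they occur.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Sax2Tree extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    // SAX2 start-tag callback: qName plays the role of SAX1's name argument
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        events.add("element : " + qName);
        for (int i = 0; i < attributes.getLength(); i++) {
            events.add("attribute : " + attributes.getQName(i)
                + " = " + attributes.getValue(i));
        }
    }

    @Override
    public void characters(char[] buffer, int offset, int length) {
        if (length > 0) {
            events.add("text : " + new String(buffer, offset, length));
        }
    }

    @Override
    public void ignorableWhitespace(char[] buffer, int offset, int length) {
        if (length > 0) {
            events.add("ignorable");
        }
    }

    /** Parses xml and returns the events in the order they were reported. */
    public static List<String> events(String xml, boolean validate) {
        Sax2Tree handler = new Sax2Tree();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validate);
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b (#PCDATA)>]>"
                   + "<a>\n  <b>hi</b>\n</a>";
        for (String event : events(xml, true)) {
            System.out.println(event);
        }
    }
}
```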

Day planner example continued

planner.xml

<?xml version = "1.0"?>

<!-- : planner.xml Day Planner XML document -->

<!DOCTYPE planner SYSTEM "planner.dtd">

<planner>

<year value = "2000">

<date month = "7" day = "15">

<note time = "1430">Doctor's appointment</note>

<note time = "1620">Physics class at BH291C</note>

</date>

<date month = "7" day = "4">

<note>Independence Day</note>

</date>

<date month = "7" day = "20">

<note time = "0900">General Meeting in room 32-A</note>

</date>

<date month = "7" day = "20">

<note time = "1900">Party at Joe's</note>

</date>

<date month = "7" day = "20">

<note time = "1300">Financial Meeting in room 14-C</note>

</date>

</year>

</planner>

planner.dtd

<!-- DTD for day planner -->

<!ELEMENT planner ( year* )>

<!ELEMENT year ( date+ )>
<!ATTLIST year value CDATA #REQUIRED>

<!ELEMENT date ( note+ )>
<!ATTLIST date month CDATA #REQUIRED>
<!ATTLIST date day CDATA #REQUIRED>

<!ELEMENT note ( #PCDATA )>
<!ATTLIST note time CDATA #IMPLIED>

HW this section

1. Make a dtd and a conforming xml file. Make your example non-trivial (that means elements should have attributes, etc.), but feel free to copy and modify examples given in class or your text. Check your work in the MS Validator.

2. You may also need to download the Xerces parser (you’ll need it at some point this semester) and install it as per the documentation that accompanies it.

3. Save tree.java to your java directory. Make sure it compiles and runs. See step 4 below.

4. For step 3, you will need to download JAXP from http://java.sun.com/xml/download.html