
Upload: micah-ryno

Post on 11-Dec-2015


Hands-on XML 1 ® GvdS Palstar 2001

Understanding XML

Gert van der Steen

Palstar bv, University of Utrecht


Understanding XML

• Elements

• Entities

• Attributes

• Miscellaneous


Elements

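Syntax:

<!ELEMENT book ( title, ( chapter, notes )+ ) >

• Keyword: ELEMENT

• Element name: book

• Content model: ( title, ( chapter, notes )+ )

• Model group: a parenthesized group, e.g. ( chapter, notes )

XML name:

• any length

• case sensitive

• contains letters, digits and punctuation ‘.’ ‘-’ ‘_’ ‘:’

• starts with a letter, ‘_’ or ‘:’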
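The name rules can be tried out with a small check. The sketch below is an ASCII-only approximation (real XML names also admit many non-ASCII letters), and it assumes the XML 1.0 rule that a name may begin with a letter, ‘_’ or ‘:’. The function name `is_xml_name` is made up for this illustration.

```python
import re

# ASCII-only approximation of an XML name:
# first character: letter, '_' or ':'; then letters, digits, '.', '-', '_', ':'
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def is_xml_name(name: str) -> bool:
    """Return True if `name` fits the simplified (ASCII-only) name rule."""
    return NAME_RE.fullmatch(name) is not None

print(is_xml_name("book"))       # a plain name
print(is_xml_name("Book-2.1"))   # punctuation after the first character
print(is_xml_name("2book"))      # rejected: starts with a digit
```

Because names are case sensitive, `book` and `Book` pass the check as two different names.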


Element declarations: operators

If A and B are model groups:

A , B     A followed by B
A | B     either A or B
A?        optional A: zero or one
A+        one or more A
A*        zero or more A
( A )     grouping of A
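These operators behave like the operators of regular expressions, which gives a quick way to experiment without a validating parser. The sketch below is only an illustration for element-only content models (no #PCDATA, no determinism check). The helper names `model_to_regex` and `accepts` are invented here.

```python
import re

NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def model_to_regex(model: str) -> str:
    """Translate an element-only content model into a regular expression.

    Each element name becomes the token "name;" so that names cannot run
    into each other. ',' (sequence) becomes plain concatenation, while
    '|', '?', '+', '*' and parentheses keep their meaning.
    """
    tokens = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    return re.sub(r"[\s,]", "", tokens)

def accepts(model: str, children: list) -> bool:
    """Check a sequence of child element names against a content model."""
    text = "".join(name + ";" for name in children)
    return re.fullmatch(model_to_regex(model), text) is not None

book = "( title, ( chapter, notes )+ )"
print(accepts(book, ["title", "chapter", "notes"]))
print(accepts(book, ["title"]))
```

With the `book` model, a title followed by one or more chapter/notes pairs is accepted, while a title on its own is rejected because the group carries `+`.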


(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR'

Why: Alternative Sequences

ex. 1 a. DTD: <!ELEMENT a ( #PCDATA ) >
              <!ELEMENT b ( #PCDATA ) >
              <!ELEMENT c ( #PCDATA ) >

c. DOC: <a>Text for a.</a><b>..b..</b><c>..c1..</c>


(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR'

Why: Alternative Sequences

ex. 1 a. DTD: <!ELEMENT scoc1 ( a, ( b | c ) ) >

b. Syntax diagram: scoc1 = a - ( b | c )

c. DOC: <scoc1><a>..a1..</a><c>..c1..</c></scoc1>



(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR' (continued)

ex. 2 a. DTD: <!ELEMENT scoc2 ( ( a | b ), c ) >

b. Syntax diagram: scoc2 = ( a | b ) - c

c. DOC: <scoc2><a>..a1..</a><c>..c1..</c></scoc2>


(SCOC) 'SEQUENCE CONNECTOR' AND 'OR CONNECTOR' (continued)

ex. 3 a. DTD: <!ELEMENT scoc3 ( ( a | b ), ( b | c ) ) >

b. Syntax diagram: scoc3 = ( a | b ) - ( b | c )

c. DOC: <scoc3><b>..b1..</b><b>..b2..</b></scoc3>


(OI) 'OCCURRENCE INDICATOR'

Why: To Repeat Elements

ex. 1 a DTD: <!ELEMENT oi1 ( a+, b*, c?, d ) >

b1 Syntax diagram: oi1 = a+ - b* - c? - d

c1a DOC: <oi1><a>..a1..</a><a>..a2..</a>
         <d>....d1....</d></oi1>

c1b DOC: <oi1><a>..a1..</a><b>..b1..</b><b>..b2..</b>
         <c>..c1..</c><d>..d1..</d></oi1>

c2 DOC: Correct, minimally, the following input:
        <b>.. b1..</b><c>..c1..</c><c>..c2..></c>
        <d>..d1..</d>

c3 Construct some input and parse


(OI) 'OCCURRENCE INDICATOR' (CONTINUED)

ex. 2 a1 DTD: <!ELEMENT oi2 ( a | b )+ >

a2 DTD: <!ELEMENT oi3 ( a+ | b+ ) >

b2 Construct Syntax Diagrams for oi2 and oi3

Will oi2 and oi3 accept the same input?
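One way to explore this question is to write both content models as regular expressions over the child names and try a few inputs. This is a sketch for experimenting, not a replacement for parsing with the editor. Each child element is abbreviated to its one-letter name.

```python
import re

OI2 = re.compile(r"(a|b)+")    # content model ( a | b )+
OI3 = re.compile(r"(a+|b+)")   # content model ( a+ | b+ )

for children in ["a", "bbb", "ab", "abba"]:
    print(children, bool(OI2.fullmatch(children)), bool(OI3.fullmatch(children)))
```

Sequences that contain only a's or only b's pass both models. A sequence that mixes a and b passes only the first one, because in the second model the choice between a and b is made once for the whole content.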


(OI) 'OCCURRENCE INDICATOR' (CONTINUED)

ex. 3 a DTD: <!ELEMENT oi4 ( ( a?, b+ )*, ( c, d? )+ ) >

b Syntax Diagram: oi4 = ( a? - b+ )* - ( c - d? )+

b3 Which elements are permissible before and after b? And for c? (Use the next sheet and check with the editor)

c1 DOC: <oi4><c>..c1..</c></oi4>

DOC: <oi4><b>..b1..</b><b>..b2..</b><c>..c1..</c>
     <d>..d1..</d><c>..c2..</c></oi4>

DOC: <oi4><a>..a1..</a><b>..b1..</b>
     <c>..c1..</c></oi4>

c2 DOC: Correct, minimally, the following input: <a>..a1..</a><c>..c1..</c> and parse

c3 Construct some more input and parse


Exercise sheet

            <oi4>   a   b   c   d   </oi4>
before b
after b
before c
after c


(OI) 'OCCURRENCE INDICATOR' (CONTINUED)

ex. 4 a1 DTD: <!ELEMENT x1 ( a, ( b, a )+ , b ) >

a2 DTD: <!ELEMENT x2 ( ( a, b )+ ) >

b2 Construct Syntax Diagrams for x1 and x2

Will x1 and x2 accept the same input?
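A brute-force comparison can support the answer found with the syntax diagrams. The sketch below treats both content models as regular expressions over one-letter child names and lists every sequence of a's and b's, up to a chosen length, that only one of the two accepts. The function name `differing_inputs` is invented for this illustration.

```python
import re
from itertools import product

X1 = re.compile(r"a(ba)+b")   # content model ( a, ( b, a )+ , b )
X2 = re.compile(r"(ab)+")     # content model ( ( a, b )+ )

def differing_inputs(max_len: int) -> list:
    """List child sequences over 'a' and 'b' accepted by exactly one model."""
    result = []
    for n in range(1, max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if bool(X1.fullmatch(s)) != bool(X2.fullmatch(s)):
                result.append(s)
    return result

print(differing_inputs(6))
```

Up to length 6 the only difference is the single pair `ab`: x2 accepts it, while x1 needs at least two a, b pairs.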


(OIOC) 'OCCURRENCE INDICATOR' AND 'OR CONNECTOR'

ex. 1 a DTD: <!ELEMENT oioc1 ( ( a* | ( b?, c+ ) )* ) >

b1 Syntax Diagram: oioc1 = ( a* | ( b? - c+ ) )*

b3 Which elements are permissible before and after a, b, and c? (Use the next sheet and check with the editor)

c3 Construct some input and parse


Exercise sheet

immediately   <oioc1>   a   b   c   </oioc1>
before a
after a
before b
after b
before c
after c


(OIOC) 'OCCURRENCE INDICATOR' AND 'OR CONNECTOR' (CONT)

ex. 2 a DTD: <!ELEMENT oioc2 ( ( ( a* | b? )* | c+ ) | ( d?, e ) ) >

b2 Construct the Syntax Diagram

c3 Construct some input and parse


(NE) 'NESTING ELEMENTS'

ex. a DTD: <!ELEMENT ne1 ( a, l1 ) >

DTD: <!ELEMENT l1 ( b ) >

b1 Syntax Diagram: ne1 = a - l1

l1 = b

c1 DOC: <ne1><a>..a1..</a><l1><b>..b1..</b></l1></ne1>


(RE) 'RECURSIVE ELEMENTS'

ex. 1 a DTD: <!ELEMENT re1 ( ( a, re1, b ) | c ) >

b Syntax Diagram: re1 = ( a - re1 - b ) | c

c1 DOC: <re1><a>..a1..</a>
          <re1><a>..a2..</a>
            <re1><c>..c1..</c></re1>
          <b>..b2..</b></re1>
        <b>..b1..</b></re1>

c2 DOC: <re1><c>..c1..</c></re1>
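A recursive declaration calls for a recursive check. The following sketch is an illustration only, not a validating parser: it walks a parsed document and tests the re1 content model ( ( a, re1, b ) | c ) level by level, looking at child element names only. The function name `is_valid_re1` is invented here.

```python
import xml.etree.ElementTree as ET

def is_valid_re1(elem) -> bool:
    """Check an element against ( ( a, re1, b ) | c ), recursing into nested re1."""
    if elem.tag != "re1":
        return False
    tags = [child.tag for child in elem]
    if tags == ["c"]:
        return True                       # the non-recursive alternative
    if tags == ["a", "re1", "b"]:
        return is_valid_re1(elem[1])      # check the nested re1 the same way
    return False

doc = ET.fromstring(
    "<re1><a>..a1..</a>"
    "<re1><a>..a2..</a><re1><c>..c1..</c></re1><b>..b2..</b></re1>"
    "<b>..b1..</b></re1>"
)
print(is_valid_re1(doc))
```

The alternative c is what ends the recursion: without it no finite document could satisfy the declaration.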


(RE) 'RECURSIVE ELEMENTS' (CONTINUED)

ex.2 a DTD: <!ELEMENT l ( lh, li+ ) >
            <!ELEMENT lh ( #PCDATA ) >
            <!ELEMENT li ( #PCDATA | l )* >

c1 DOC: <l><lh>Europe</lh>
          <li><l><lh>Netherlands</lh>
                <li>Amsterdam</li>
                <li>Zeist</li>
          </l></li>
          <li><l><lh>England</lh>
                <li>Swindon</li>
          </l></li>
        </l>

c3 Add streets by another nesting of l and try out


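'DETERMINISTIC'

Why: A content model must be deterministic (SGML: “not ambiguous”)

ex. 1 a DTD: <!ELEMENT am1 ( ( a, b ) | ( a, c ) ) >

DTD: <!ELEMENT am2 ( a, ( b | c ) ) >

b1 Syntax diagram: am1 = ( a - b ) | ( a - c )

Syntax diagram: am2 = a - ( b | c )

In XML, am2 is deterministic, am1 is not: after reading an a, a parser working on am1 cannot tell which of the two branches it is in without looking ahead.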
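That am1 can be rewritten as am2 without changing the set of accepted documents can be checked by enumeration. The sketch below writes both models as regular expressions over one-letter child names and lists what each accepts. It says nothing about determinism itself, only that the two models describe the same inputs. The function name `accepted` is made up for this example.

```python
import re
from itertools import product

AM1 = re.compile(r"(ab)|(ac)")   # content model ( ( a, b ) | ( a, c ) )
AM2 = re.compile(r"a(b|c)")      # content model ( a, ( b | c ) )

def accepted(pattern, max_len=3):
    """All child sequences over a, b, c (up to max_len) that the pattern accepts."""
    return sorted(
        "".join(letters)
        for n in range(1, max_len + 1)
        for letters in product("abc", repeat=n)
        if pattern.fullmatch("".join(letters))
    )

print(accepted(AM1))
print(accepted(AM2))
```

Both lists contain exactly the sequences `ab` and `ac`, so factoring out the common a makes the model deterministic without restricting the documents.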


'DETERMINISTIC' (CONTINUED)

ex. 2 a DTD: <!ELEMENT am3 ( ( a?, b* )*, a*, c ) >

b1 Syntax Diagram: am3 = ( a1? - b* )* - a2* - c   (a1 and a2 mark the two occurrences of a)

b3 Is the content model deterministic?


COMMENT DECLARATION

Between other declarations in the DTD

<!-- text of the comment -->

Between other text in the DOC

<!-- text of the comment -->

Can be any length


#PCDATA (PARSED CHARACTER DATA)

Why: to process text for entity references and to be sensitive to the appearance of tags

ex. a DTD: <!ELEMENT pcd ( #PCDATA ) >

c1 DOC: <pcd> the character entity &Auml; is
        processed within pcdata, also
        "<!-- comment -->" will be treated as a comment </pcd>


CDATA (CHARACTER DATA)

Why: to keep text literally

ex. DTD: no special declaration required

DOC: <p>The character entity <![CDATA[&Auml;
     is not processed within CDATA, also
     "<!-- comment -->"]]> will not be
     treated.</p>
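The difference between parsed character data and a CDATA section shows up directly in what a parser hands to the application. The sketch below uses Python's standard `xml.etree.ElementTree`, and it uses the built-in references `&lt;` and `&amp;` rather than `&Auml;`, since `&Auml;` would first need a declaration.

```python
import xml.etree.ElementTree as ET

pcd = ET.fromstring("<pcd>5 &lt; 7 &amp; more</pcd>")
cdata = ET.fromstring("<p><![CDATA[5 &lt; 7 &amp; more]]></p>")

print(pcd.text)    # references are replaced by the characters they stand for
print(cdata.text)  # the same characters are kept literally
```

In both cases the application receives plain text. Only the way the parser interprets `&` and `<` on the way in differs.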


Mixed content model

Why: Allows for “floating” or “in-line” elements in between text

Syntax in DTD: restricted to #PCDATA alternating with subelements; #PCDATA comes first and the group is closed with ‘*’

• DTD:

– <!ELEMENT par ( #PCDATA | warning )* >

• DOC:

– <par>Clean with an alcoholic <warning>flammable</warning> substance</par>
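How an application sees mixed content depends on its API. As one example, Python's `xml.etree.ElementTree` stores the text before the first subelement in `text` and the text following a subelement in that subelement's `tail`. The sketch parses the `par` example without a DTD.

```python
import xml.etree.ElementTree as ET

par = ET.fromstring(
    "<par>Clean with an alcoholic <warning>flammable</warning> substance</par>"
)
warning = par.find("warning")

print(repr(par.text))            # text before the in-line element
print(repr(warning.text))        # content of the in-line element
print(repr(warning.tail))        # text after the in-line element
print("".join(par.itertext()))   # the running text with the markup removed
```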


Preserved space

• By default, applications may normalize white space within #PCDATA, e.g. to a single space character

• The reserved attribute ‘xml:space’ with the value ‘preserve’ signals that white space should be preserved

• DOC:

<par xml:space="preserve">
   O
 --I--
   I
  / \
</par>


Language attribute

• With the reserved attribute ‘xml:lang’ the natural language of the contained #PCDATA is encoded

• The values of the attribute are language identifiers as defined by [RFC1766], “Tags for the Identification of Languages”

• DOC:

<par xml:lang="en-GB">What colour is it?</par>

<par xml:lang="en-US">What color is it?</par>
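An application can use the language attribute to select text. In Python's `xml.etree.ElementTree` the reserved `xml:` prefix is reported under the fixed namespace `http://www.w3.org/XML/1998/namespace`. The wrapper element `doc` in this sketch is an assumption added only to obtain a single root element.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = ET.fromstring(
    "<doc>"
    '<par xml:lang="en-GB">What colour is it?</par>'
    '<par xml:lang="en-US">What color is it?</par>'
    "</doc>"
)

# Map each language identifier to the text written in that language.
by_lang = {par.get(XML_LANG): par.text for par in doc}
print(by_lang["en-GB"])
print(by_lang["en-US"])
```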


EMPTY

Why: 1. to refer to objects which are internal or external to the document

2. To trigger special processing

DTD: <!ELEMENT pagebreak EMPTY >

DOC: ...The page will break here.<pagebreak/>..

or: ...The page will break here.<pagebreak></pagebreak>..


Entities

Syntax:

for DTD: <!ENTITY % nwc "note | warning | caution" >

for DOC: <!ENTITY XML "Extensible Markup Language" >

• Keyword: ENTITY

• Entity name: nwc, XML

• Replacement: the quoted text


Entities

• Entity references are requests for data to be embedded at the point of reference

• In a Document:

– Internal text entities: simple text replacement

– External text entities: inclusion of an external document

– Binary entities: reference to multimedia files

– Character defining entities: for characters outside the default character set

– Built-in entities: for characters used in markup

– Character entities: the number of a character in the default character set

• In a DTD:

– Parameter entities: simple text replacement


Internal text entities

• Purpose: simple text replacement; text stored in entity

• DTD:

– <!ENTITY gca "Graphics Communications Association" >

• DOC:

– ... the &gca; sponsor meetings ...

– ==> ... the Graphics Communications Association sponsor meetings ...
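The replacement can be observed with a non-validating parser as long as the entity is declared in the internal subset of the document. In this sketch the wrapper element `p` and the DOCTYPE around the declaration are assumptions made to obtain a complete document. The parser is Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

source = """<!DOCTYPE p [
  <!ENTITY gca "Graphics Communications Association">
]>
<p>... the &gca; sponsor meetings ...</p>"""

p = ET.fromstring(source)
print(p.text)   # the reference has been replaced by the stored text
```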


External text entities

• Purpose: inclusion of an external document; reference stored in entity

• DTD:

– <!ENTITY ch1 SYSTEM "http://www.../ch1.xml">

• DOC:

– <book>a book about xml &ch1; ... more content ... </book>


Binary entities

• Purpose: reference to multimedia files (“Non-XML data”)

• Syntax in DTD:

– <!NOTATION Name PUBLIC Datatype>

– <!ENTITY Name SYSTEM URL NDATA Datatype>

• DTD:

– <!NOTATION EPS PUBLIC "+//ISBN 0-7923-1::Graphic Notation//NOTATION Adobe Systems Encapsulated Postscript//EN">

– <!ENTITY figure1 SYSTEM "c:\graphics\figure1.pic" NDATA EPS>

– <!ELEMENT graphics EMPTY>

– <!ATTLIST graphics filename ENTITY #IMPLIED>

– <!ELEMENT p (#PCDATA | graphics)* >

• DOC:

– <p>As is shown in the following diagram: <graphics filename="figure1"/></p>

– wrong: <p>As is shown in the following diagram: &figure1;</p> (a binary entity may only be named in an attribute of type ENTITY or ENTITIES, never referenced with &...;)


Character defining entities

• Purpose: for characters outside the default characterset

• DTD:

– <!ENTITY % ISOnum PUBLIC "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN" "/ents/isonum.ent">

– %ISOnum;

• file isonum.ent:

– <!ENTITY frac34 "&#190;" > <!-- fraction three-quarters -->

– <!ENTITY plusmn "&#177;" > <!-- plus-or-minus sign -->

– ...

• DOC:

– <p>..about &frac34; of the height..</p>

– => <p>..about ¾ of the height..</p>


Character entities

• Purpose: hard coding of characters, e.g. for UNICODE "&#xA9;" <=> ©

• DTD:

– <!ENTITY Copyright "&#xA9;" >

• DOC: &Copyright;

– Resolution by parser of "&Copyright;": "&#xA9;"

– Resolution by printer/browser of "&#xA9;": "©"

• Ranges:

– &#0; .. &#255; -- extended ASCII set: ISO 8859/1, used under Windows, Sun Unix and as the Web default

– &#256; .. &#65535; -- Unicode/ISO10646

– larger -- any Unicode character

– alternative to decimal: hexadecimal, like &#xA9; or &#xFFF8;
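Decimal and hexadecimal notation are two spellings of the same character number. The sketch below checks this for the copyright sign with Python's standard `xml.etree.ElementTree`. The wrapper element `p` is an assumption made to obtain a complete document.

```python
import xml.etree.ElementTree as ET

# &#xA9; (hexadecimal) and &#169; (decimal) both denote U+00A9, the copyright sign.
p = ET.fromstring("<p>&#xA9; &#169;</p>")
first, second = p.text.split(" ")

print(first == second == "\u00a9")
print(ord(first), hex(ord(first)))
```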


XML built-in character entities

• Purpose: for characters used in markup

• DTD: no declaration required

• DOC:

– &lt; for ‘<’

– &gt; for ‘>’

– &amp; for ‘&’

– &apos; for ‘'’

– &quot; for ‘"’
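When text is written out by a program, the five built-in entities are normally produced by a library function rather than by hand. The sketch uses `escape` and `unescape` from Python's standard `xml.sax.saxutils`. These handle `&`, `<` and `>` on their own, while the two quote characters are passed in as an extra mapping. The names `QUOTES` and `UNQUOTES` are made up for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; the quote characters are supplied extra.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = 'if a < b & c > "d" then \'e\''
encoded = escape(raw, QUOTES)

print(encoded)
print(unescape(encoded, UNQUOTES) == raw)   # the round trip restores the text
```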


Parameter entities

• Purpose: simple text replacement in a DTD

– DTD: <!ENTITY % subelems "(para | list | table | note)" >

– DTD: <!ELEMENT body (things, %subelems;) >

– ==> DTD: <!ELEMENT body (things, (para | list | table | note)) >

• Purpose: to keep text literally

– DOC: <!ENTITY % subelems "(para | list | table | note)" >
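The effect of a parameter entity reference can be imitated with plain text substitution. The sketch below is only an illustration of that idea and not how a real XML parser is implemented (a parser applies additional rules, for instance about where such references may occur). The function name `expand_parameter_entities` is invented here.

```python
import re

def expand_parameter_entities(declaration: str, entities: dict) -> str:
    """Replace every %name; reference by the text stored for that entity."""
    return re.sub(
        r"%([A-Za-z_][\w.\-]*);",
        lambda m: entities[m.group(1)],
        declaration,
    )

entities = {"subelems": "(para | list | table | note)"}
expanded = expand_parameter_entities("<!ELEMENT body (things, %subelems;) >", entities)
print(expanded)
```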


ATTRIBUTE DECLARATION

Why: to associate information with an Element: metadata, hypertext, multimedia, layout (!?), ...

Syntax:

DTD: <!ELEMENT el_name (............) >

<!ATTLIST el_name att_name1 type1 default1
                  att_name2 type2 default2 >

• el_name: element name

• att_name1, att_name2: attribute names

• type1, type2: types of the attributes

• default1, default2: default values

Spelling of attribute name: as an XML name (~ element name)

Allowed: more than one <!ATTLIST for an element


TYPES FOR ATTRIBUTE DECLARED VALUES

Type:               Attribute value is:

CDATA               character data
ENTITY / ENTITIES   (list of) name(s) of binary (unparsed) entities
ID                  unique identifier for the element
IDREF / IDREFS      (list of) reference(s) to an ID declared in the document
NMTOKEN / NMTOKENS  (list of) name token(s)
NOTATION            member of a list of notations
Name group          one of a finite set of values


ATTRIBUTE DECLARATION, declared values 1/3

DTD:

<!ELEMENT memo ( idinfo, body ) >
<!ATTLIST memo
    rev       CDATA     #REQUIRED
    size      NMTOKEN   #REQUIRED
    projects  NMTOKENS  #REQUIRED >

(columns: attribute name, type, default value)

Allowed values per type:

CDATA      [any character]+
NMTOKEN    [ letter | 0..9 | - | . | _ | : ]+
NMTOKENS   NMTOKEN, [" ", NMTOKEN]*

DOC:

<memo rev="27/1/96 - 3.2a" size=".17-.19" projects="2-a 3-b" >..... </memo>


ATTRIBUTE DECLARATION, declared values 2/3

DTD:

<!NOTATION tex PUBLIC "-//local//NOTATION TeX Formula//EN" "c:\programs\show_tex" >
<!ENTITY pic1 SYSTEM "c:\proj3\file12" NDATA tex >
<!ENTITY pic2 SYSTEM "c:\proj4\file15" NDATA tex >
<!ELEMENT fig EMPTY >
<!ELEMENT figr EMPTY >
<!ELEMENT figrs EMPTY >
<!ATTLIST fig    id      ID      #REQUIRED
                 file    ENTITY  #REQUIRED >
<!ATTLIST figr   refid   IDREF   #REQUIRED >
<!ATTLIST figrs  refids  IDREFS  #REQUIRED >

DOC:

<fig id="oor" file="pic1"/> <fig id="neus" file="pic2"/>
<figr refid="neus"/> <figrs refids="neus oor"/>


ATTRIBUTE DECLARATION, declared values 3/3

DTD:

<!NOTATION eqn SYSTEM "c:\eqn.exe">
<!NOTATION tex SYSTEM "c:\tex.exe" >
<!ELEMENT memo ( idinfo, body ) >
<!ELEMENT formula ( #PCDATA ) >
<!ATTLIST memo security ( ts | sec | unc ) #REQUIRED >
<!ATTLIST formula data NOTATION ( eqn | tex ) #REQUIRED >

DOC:

<memo security="sec">...</memo>

<formula data="eqn"> 3 over 4 </formula>


DEFAULT VALUES FOR ATTRIBUTE DECLARATIONS

Reserved Words:

#FIXED - used for attributes with constant values

#REQUIRED - demands a user-entered value (always the case when there is no DTD)

#IMPLIED - value supplied by application if not entered explicitly

Example of a default value (used when the DOC omits the attribute):

<!ATTLIST memo security ( ts | sec | unc ) "unc" >
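Whether a default value reaches the application can be observed with a parser that reads the declaration. The sketch below uses Python's `xml.parsers.expat` and places the ATTLIST in an internal subset, which is an assumption made so the example fits in one string. The helper name `root_attributes` is invented. By default this parser reports defaulted attributes together with the ones written in the document.

```python
from xml.parsers import expat

DOCTYPE = '<!DOCTYPE memo [<!ATTLIST memo security ( ts | sec | unc ) "unc">]>'

def root_attributes(document: str) -> dict:
    """Return the attributes the parser reports for the first start tag."""
    found = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: found.append(dict(attrs))
    parser.Parse(document, True)
    return found[0]

print(root_attributes(DOCTYPE + "<memo/>"))                 # default is supplied
print(root_attributes(DOCTYPE + '<memo security="sec"/>'))  # explicit value wins
```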


ATTRIBUTE EXERCISES

Experiment in ex.inp with attributes according to the attribute declarations in ex.dtd of:

- memo
- memo1
- fig
- figr
- figrs


CONDITIONAL SECTION in DTD

Why: to indicate which parts of a DTD should be selected

Example in DTD:

<!ENTITY % standard "INCLUDE">

<!ENTITY % variant "IGNORE" >

......

<![ %standard; [<!ENTITY % Text "#PCDATA | emph1"> ]]>

<![ %variant; [<!ENTITY % Text "#PCDATA | emph2"> ]]>

......


Processing instructions (“PI”)

Why: to contain information that is not part of the document, e.g. to trigger processor functions

Can be (mis)used for many purposes.

DTD: not required

DOC:

<p> Here follows a pagebreak <?newpage?></p>
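A processing instruction is passed on to the application as a node of its own, separate from the text. The sketch below uses Python's standard `xml.dom.minidom` to list the children of the `p` element. What an application then does with the target `newpage` is entirely up to that application.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<p> Here follows a pagebreak <?newpage?></p>")

for node in doc.documentElement.childNodes:
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        print("PI target:", node.target)
    elif node.nodeType == Node.TEXT_NODE:
        print("text:", repr(node.data))
```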


Parsing: Well-formed versus Valid

• Well-formed

– XML declaration recommended (it is optional in XML 1.0)

– Tags must be balanced or be an EMPTY tag

– All attribute values must be quoted

– No markup characters (< or &) allowed in the character data, unless escaped

– Properly nested elements

– Attributes are treated as type CDATA (if no DTD is used)

• Valid

– Well-formed plus conforms to DTD
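Well-formedness can be tested with any XML parser, since a parser must reject input that breaks these rules. The sketch below wraps Python's standard `xml.etree.ElementTree` in a helper named `is_well_formed` (a name made up for this example). It does not check validity, because this parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (no DTD validation is done)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>text</b></a>"))   # balanced and properly nested
print(is_well_formed("<a><b>text</a></b>"))   # improperly nested
print(is_well_formed("<a x=1/>"))             # attribute value not quoted
print(is_well_formed("<a>1 < 2</a>"))         # bare '<' in character data
```

Only the first input is accepted. Each of the other three violates one of the rules listed above.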


Parsing sequence

Parsing path through an XML document:

• Prolog

– XML Declaration

– Document Type Declaration: internal subset, then external subset

• Text + Markup


Concise XML Syntax

Example.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE example SYSTEM "Example.dtd" [
<!ENTITY XML "eXtensible Markup Language">
<!ENTITY history SYSTEM "History.XML">
<!ENTITY wheelchair SYSTEM "c:/Wheelchair.tif" NDATA TIFF>
<!ENTITY % figs "INCLUDE">
]>
<example>
<par>The &XML; format is a very important move to bringing the benefits of structured markup to the masses.</par>
&history;
<par>The following figure shows a wheelchair:</par>
<fig filename="wheelchair" />
<par>The tags <![CDATA[<example>, <par> and <fig../> are used in this document]]>.</par>
</example>

History.xml:

<par>Superficially it looks like HTML because the tags have the same delimiters, &#60; and &#62; </par>
<par xml:space='preserve' xml:lang="en-GB">
   --- XML ---
   |         |
  SGML      HTML
</par>

Example.dtd:

<!-- The example DTD -->
<!NOTATION TIFF SYSTEM "Showtiff.exe" >
<!ENTITY % figs "IGNORE" >
<![%figs;[
<!ENTITY % ExampleContent "par | fig">
]]>
<!ENTITY % ExampleContent "par">
<!ELEMENT example (%ExampleContent;)+>
<!ELEMENT par (#PCDATA)>
<!ELEMENT fig EMPTY>
<!ATTLIST fig filename ENTITY #REQUIRED>

In Example.xml the prolog consists of the XML Declaration (first line) and the Document Type Declaration. The declarations between [ and ] form the internal subset, Example.dtd is the external subset, and the wheelchair entity refers to c:/Wheelchair.tif.

With thanks to Neil Bradley: “The XML Companion”, Addison Wesley Longman, 2nd ed., ISBN 0-201-342855


HOW TO WRITE DOCUMENT TYPE DEFINITIONS

• Left brain:

- Makes subdivisions

- Results in numbering and hierarchy

• Right brain:

- Makes associations

- Results in relations

• In XML:

- Hierarchical structure by rewriting elements in components

- Necessary: document analysis

- Associative structure by writing attributes to elements

- Necessary: inventory of useful relations


DESIGNER OF STRUCTURED DOCUMENTS

authors publishers

document analyst / designer

hypertext designer

database designer

information types designer


Exercises in adapting documents and DTDs

• Extend B.xml, among other things with a list

• Make modifications to EX.DTD and Ex.xml:

– create floating elements “fn” and “fnr” with matching IDs and use them inside another element, e.g. “Z”, that contains #PCDATA

– shorten a particular construction in EX.DTD by means of a parameter entity

– replace a piece of text in the input by a general entity, to be defined in the DTD

• Make/generate a document schema of B.DTD, analogous to the schema for the Memo and the Workshop Manual


Exercises in making small DTDs yourself

Describe the regularity in the given patterns in a content model; test it by means of an extension of EX.DTD and Ex.xml.

To keep the document short we use the following element declarations:

<!ELEMENT p EMPTY>

<!ELEMENT q EMPTY> etcetera.

The document can then contain: <p/>, <q/> etcetera.

A. three documents: ppqrss pqqs pppqrsss

content model: use "(", ")", ",", "+" and "?"

B. four documents: p pq pqr pqrs

content model: use "(", ")", "," and "?"

C. five documents: pqp pqr qrp qqp prr

content model: use "(", ")", "," and "|"