extensible markup language lecture notes/xml 03.pdf · 2 document type definition (dtd) •a dtd is...

XML (3): Extensible Markup Language

Acknowledgements and copyrights: these slides are the result of combining notes and slides with contributions from: Michael Kiffer, Arthur Bernstein, Philip Lewis, Hanspeter Mössenböck, Wolfgang Beer, Dietrich Birngruber, Albrecht Wöss, Mark Sapossnek, Bill Andreopoulos, Divakaran Liginlal, Anestis Toptsis, Addison Wesley, Microsoft AA. They serve teaching purposes only, solely for students registered in CSE4413, and should not be published as a book or in any form of commercial product unless written permission is obtained from each of the above-listed names and/or organizations.


Post on 19-Jul-2020




Document Type Definition (DTD)

• A DTD is a grammar specification for an XML document

• DTDs are optional; they do not need to be specified

• If specified, a DTD can be part of the document (at the top), or it can be referenced by URL

• A document that conforms to (i.e., parses with respect to) its DTD is said to be valid


XML Data + DTD

DTD:
  <!ELEMENT a (b+, c?) >
  <!ELEMENT b (#PCDATA) >
  <!ELEMENT c (#PCDATA) >

Not valid (c? allows at most one c):
  <!-- XML Data -->
  <a>
    <b> Some </b>
    <c> 100 </c>
    <c> 101 </c>
  </a>

Valid (b+ allows one or more b, and c is optional):
  <!-- XML Data -->
  <a>
    <b> Some </b>
    <b> Thing </b>
  </a>


What is a DTD ?

• Document Type Definition (DTD)
• Defines the syntax, grammar and semantics
• Defines the document structure
  – What Elements, Attributes, Entities, etc. are permitted?
  – How are the document elements related and structured?

• Referenced by or defined in XML documents, but it's not XML!

• Enables validation of XML documents using an XML parser

• Can be referenced by more than one XML document

• DTDs may reference other DTDs


Schemas: DTD Example

• DTD schema:

  <!DOCTYPE BOOK [
    <!ELEMENT BOOK (TITLE+, AUTHOR) >
    <!ELEMENT TITLE (#PCDATA) >
    <!ELEMENT AUTHOR (#PCDATA) >
  ]>

• XML document (that conforms to the DTD above):

  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>


DTD By Diagram

[Diagram: tree view of a CustomerOrder document. Nodes shown: CustomerOrder; Customer with Person (FName, LName) and three Address nodes; several Orders nodes, each with an OrderNo and one or more ProductNo nodes.]


DTD By Example

http://www.myco.com/dtd/order.dtd

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE CustomerOrder [
  <!ELEMENT CustomerOrder (Customer, Orders*) >
  <!ELEMENT Customer (Person, Address+) >
  <!ELEMENT Person (FName, LName) >
  <!ELEMENT FName (#PCDATA) >
  <!ELEMENT LName (#PCDATA) >
  <!ELEMENT Address (#PCDATA) >
  <!ATTLIST Address
    AddrType ( billing | shipping | home ) "shipping" >
  <!ELEMENT Orders (OrderNo, ProductNo+) >
  <!ELEMENT OrderNo (#PCDATA) >
  <!ELEMENT ProductNo (#PCDATA) >
]>

Annotations:
• Customer (no suffix): exactly 1 time
• Orders*: 0 or more times
• Address+: 1 or more times
• #PCDATA: Parsed Character Data
• | : or (choice)
• "shipping": default value


DTD syntax rules

• , (comma) indicates strict sequence. For example,
    <!ELEMENT CustomerOrder (Customer, Orders) >
• | (pipe) indicates a choice between alternatives. For example,
    AddrType ( billing | shipping | home )
• Cardinality: how many instances are allowed
  – (no suffix) Exactly one
  – ? Optional (may or may not appear)
  – * Zero or more
  – + One or more
  For example,
    <!ELEMENT Category (subcategory+ ) >
• #PCDATA stands for parsed character data
• Mixed content can be indicated using #PCDATA. For example,
    <!ELEMENT Idea (#PCDATA | product | service)* >
  E.g.
    <Idea>
      <product> … </product>
      Some descriptive text included as pcdata
      <service> …. </service>
      <product> …. </product>
    </Idea>
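The occurrence operators in a content model behave much like regular-expression operators, so the order and number of child elements can be checked with an ordinary regex. The sketch below is a minimal illustration under stated assumptions, not a real validator: the helper names `model_to_regex` and `children_match` are hypothetical, only element content is handled (no #PCDATA, no attributes), and only the direct children of the root element are tested.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate an element-content model such as "(b+, c?)" into a regex
    that matches a string of child tag names, each followed by one space."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|?*+,]", model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token in ("(", ")", "|", "?", "*", "+"):
            parts.append(token)  # same meaning in DTDs and in regexes
        else:
            # An element name; the trailing space keeps "Order" and
            # "Orders" from being confused with each other.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(xml_text: str, model: str) -> bool:
    """Check the root element's direct children against the content model."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return model_to_regex(model).fullmatch(names) is not None


# The two documents checked against <!ELEMENT a (b+, c?)>
print(children_match("<a><b> Some </b><c> 100 </c><c> 101 </c></a>", "(b+, c?)"))  # False
print(children_match("<a><b> Some </b><b> Thing </b></a>", "(b+, c?)"))  # True
```

A full validator would also have to check every nested element, mixed content, and attributes, which is why dedicated tools are normally used instead.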


DTDs (cont’d)

• A DTD can be specified as part of an XML document:

  <?xml version="1.0" ?>
  <!DOCTYPE Report [
    … … …
  ]>
  <Report> … … … </Report>

• A DTD can be specified as a standalone file and referenced from an XML document:

  <?xml version="1.0" ?>
  <!DOCTYPE Report SYSTEM "http://foo.org/Report.dtd">
  <Report> … … … </Report>


DTD Example

<!DOCTYPE Report [
  <!ELEMENT Report (Students, Classes, Courses)>
  <!ELEMENT Students (Student*)>
  <!ELEMENT Classes (Class*)>
  <!ELEMENT Courses (Course*)>
  <!ELEMENT Student (Name, Status, CrsTaken*)>
  <!ELEMENT Name (First,Last)>
  <!ELEMENT First (#PCDATA)>
  … … …
  <!ELEMENT CrsTaken EMPTY>
  <!ELEMENT Class (CrsCode,Semester,ClassRoster)>
  <!ELEMENT Course (CrsName)>
  … … …
  <!ATTLIST Report Date CDATA #IMPLIED>
  <!ATTLIST Student StudId ID #REQUIRED>
  <!ATTLIST Course CrsCode ID #REQUIRED>
  <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
  <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
]>

Annotations:
• (Student*): zero or more
• (#PCDATA): has text content (PCDATA stands for Parsed Character Data)
• EMPTY: empty element, no content; neither text nor child elements. Like <BR/> in HTML (but it may have attributes).
• CrsCode is declared for both Course and CrsTaken: same attribute in different elements

Exercise: Use the above DTD to write a conforming XML document of your choice.


DTD - ELEMENT content specifications

<!ELEMENT Tagname content_specification>

• Content_Specification:
  – EMPTY: Neither text nor child elements
    • like <BR/> or <Link url='http://abc.d'/> in HTML.
  – ANY: Any content that does not violate well-formed syntax
  – MIXED: Mix of child elements and text (#PCDATA)


DTD - Attribute Declarations

<!ATTLIST elementname attrname Type DEFAULT>

Default: Meaning
• #REQUIRED: Always required in an instance of this element
• #IMPLIED: Optional in an instance of this element
• #FIXED default value: Attribute has a fixed value. If the attribute is absent, the fixed value is assumed by the parser
• default value only: The default value of the attribute. If the attribute is absent, the default value is assumed by the parser

Divakaran Liginlal


Examples: DTD - Attribute Declaration

<!ATTLIST product Name CDATA #REQUIRED>
• A product must have a Name, and it is a character data string (CDATA: the type of the attribute is character string)

<!ATTLIST product Life CDATA #IMPLIED>
• A product may have a specified Life

<!ATTLIST product Life CDATA #FIXED "Not Known">
• The product's Life is always 'Not Known' (a constant). If the attribute Life is absent, it is still assumed to have this value.

<!ATTLIST product Life CDATA "Not Known">
• By default the product's Life = 'Not Known'. If a value is specified, the specified value is used.

Divakaran Liginlal
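The four kinds of attribute default can be read as a small decision procedure. The sketch below assumes a hypothetical helper, `effective_value`, that models the value a validating parser would report for a single attribute; it is an illustration of the rules above and not part of any XML library.

```python
from typing import Optional


def effective_value(kind: str, declared: Optional[str],
                    specified: Optional[str]) -> Optional[str]:
    """Return the attribute value seen by the application.

    kind      -- "#REQUIRED", "#IMPLIED", "#FIXED", or "default"
    declared  -- the value written in the ATTLIST (None if there is none)
    specified -- the value written in the document (None if absent)
    """
    if kind == "#REQUIRED":
        if specified is None:
            raise ValueError("required attribute is missing")
        return specified
    if kind == "#IMPLIED":
        return specified  # optional: stays None when absent
    if kind == "#FIXED":
        if specified is not None and specified != declared:
            raise ValueError("fixed attribute must equal its declared value")
        return declared
    if kind == "default":
        return declared if specified is None else specified
    raise ValueError(f"unknown default kind: {kind}")


# <!ATTLIST product Life CDATA "Not Known"> with and without a value
print(effective_value("default", "Not Known", None))       # Not Known
print(effective_value("default", "Not Known", "5 years"))  # 5 years
```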


Attribute Types - CDATA & ID

<!ATTLIST elementname attrname Type DEFAULT>

Type: Meaning

• CDATA: Character string. CDATA indicates that an attribute contains a simple character string of text.
  E.g.: <!ATTLIST Idea name CDATA #REQUIRED>
        <Idea name='Bug Killer'>
        <!ATTLIST education school CDATA #REQUIRED>
        <education school="York University">

• ID: Unique name (identifier) inside the document. An element type can have at most one attribute of type ID. The value of the attribute (i.e., the ID) must be unique throughout the XML document, i.e., it uniquely identifies an element in the document.
  Format of an ID value = (Letter | '_' | ':') (Char)*
  E.g.: <!ATTLIST Idea id ID #REQUIRED>
        <!ATTLIST school id ID #REQUIRED>
        <Idea id='L1013'>
        <!ATTLIST Course CrsCode ID #REQUIRED>
        <Course CrsCode="CSE4413">
  An ID value may not start with a digit, so "4413" on its own would not be a valid ID.


Attribute Types - IDREF

<!ATTLIST elementname attrname Type DEFAULT>

Type: Meaning

• IDREF: Indirect reference to an ID. That is, a reference to an element with an ID attribute having the same value (but not necessarily the same name!) as this IDREF attribute. The value of the IDREF attribute must match the value of an ID attribute elsewhere in the same XML document.
  E.g.: <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
        <CrsTaken CrsCode='CSE4413'/>
  CrsCode refers to an element (Course) that has an ID attribute CrsCode with value 'CSE4413'. (ID values must start with a letter, '_' or ':', so the course code is written CSE4413 rather than 4413.)


Attribute Types - IDREFS

<!ATTLIST elementname attrname Type DEFAULT>

Type: Meaning

• IDREFS: Series of IDREFs delimited by whitespace
  E.g.: <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
        <ClassRoster Members="cs123456 cs234567 cs345678">
  The value lists the ids of the students who are registered in some class. For the document to be valid, each id must match the ID of some element in the document, here Student elements declared with
        <!ATTLIST Student StudId ID #REQUIRED>
  e.g.,
        <Student StudId="cs123456"> <Student StudId="cs234567"> <Student StudId="cs345678">

Divakaran Liginlal
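The ID, IDREF and IDREFS rules amount to two checks: every ID value is unique, and every reference matches some ID in the same document. A minimal sketch of those checks follows. It assumes a hypothetical table `ATTR_TYPES` that restates the ATTLIST declarations by hand (the standard-library parser used here does not expose them), and the sample document is a simplified fragment that ignores the element structure of the Report DTD.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the ATTLIST declarations:
# (element name, attribute name) -> attribute type
ATTR_TYPES = {
    ("Student", "StudId"): "ID",
    ("Course", "CrsCode"): "ID",
    ("CrsTaken", "CrsCode"): "IDREF",
    ("ClassRoster", "Members"): "IDREFS",
}

SAMPLE = """
<Report>
  <Student StudId="cs123456"><CrsTaken CrsCode="CSE4413"/></Student>
  <Student StudId="cs234567"/>
  <Course CrsCode="CSE4413"/>
  <ClassRoster Members="cs123456 cs234567 cs345678"/>
</Report>
"""


def check_references(xml_text: str) -> list:
    """Return a list of ID/IDREF problems found in the document."""
    root = ET.fromstring(xml_text)
    ids, refs, errors = set(), [], []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            kind = ATTR_TYPES.get((elem.tag, name))
            if kind == "ID":
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split())  # whitespace-delimited list
    # References are resolved only after all IDs have been collected,
    # because a reference may point forward in the document.
    errors.extend(f"dangling reference {ref!r}" for ref in refs if ref not in ids)
    return errors


print(check_references(SAMPLE))  # ["dangling reference 'cs345678'"]
```

In the sample, cs345678 appears in the roster but no Student element carries that ID, so the fragment would not be valid.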


Limitations of DTDs

• DTDs do not support namespaces. All element names are global: you can't have one Name type for people and another for companies:

    <!ELEMENT Name (Last, First)>
    <!ELEMENT Name (#PCDATA)>

  Both cannot be in the same DTD.

• Very limited assortment of data types (just strings)

• Cannot express unordered contents conveniently. For example,

    <!ELEMENT Report (Students, Classes, Courses)>

  requires that Students, Classes, Courses appear in the order specified and in no other order.


DTD validation

• Once you have a DTD, you can create an XML document that follows that DTD.

• Then you (may) want to validate the document against the DTD.

• To do so, you can write a program that parses the document and tries to match it against the DTD (difficult!), or

• you can use a DTD validation tool.


DTD validation - tools

• XSV validator (W3C):
  – Free
  – http://www.w3.org/2001/03/webdata/xsv

• Brown University's STG (Scholarly Technology Group) validator
  – Free
  – http://www.stg.brown.edu/service/xmlvalid/

• XMLStarlet
  – Free
  – http://xmlstar.sourceforge.net/

• Search the web for more tools (there are many)…


XML Schema (XSD)

• http://www.w3.org/2001/XMLSchema
• Introduced to rectify some of the problems with DTDs
• Advantages:
  – Integrated with namespaces
  – Many built-in types
  – User-defined types
  – Powerful key and referential constraints
  – The schema itself is an XML document.
• Disadvantages:
  – Much more complex than DTDs


XML Documents + XML Schema

<!-- Some XML Schema -->
<element name="a">
  <complexType>
    <sequence>
      <element name="b" type="string" minOccurs="1" maxOccurs="unbounded"/>
      <element name="c" type="integer" minOccurs="0" maxOccurs="1"/>
    </sequence>
  </complexType>
</element>

Not valid (at most one c is allowed):
  <!-- XML Data -->
  <a>
    <b> Some </b>
    <c> 100 </c>
    <c> 101 </c>
  </a>

Valid (b may repeat, c is optional):
  <!-- XML Data -->
  <a>
    <b> Some </b>
    <b> Thing </b>
  </a>


Example

XML:        <Age>20009</Age>

XML Schema: <element name="Age" type="integer"/>

DTD:        <!ELEMENT Age (#PCDATA)>


Motivation for XML Schemas

• Datatype capability
  – For example, you can define a <price> element to hold an integer in the range 0 to 12,000

• Datatypes compatible with those in databases
  – XML Schema supports a relatively large number of datatypes

• Can create your own datatypes
  – Example: "This is a new type based on the string type and elements of this type must follow this pattern: ddd-dddd, where 'd' represents a digit".
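To make the two examples concrete, the checks an XSD-aware validator would perform can be imitated in a few lines. This is only a sketch of the idea in Python, not XSD itself: the function names `is_valid_price` and `matches_ddd_dddd` are hypothetical, and the integer check covers just the basic lexical form (optional sign followed by digits).

```python
import re

# Pattern "ddd-dddd", where d is a digit
DDD_DDDD = re.compile(r"[0-9]{3}-[0-9]{4}")


def is_valid_price(text: str) -> bool:
    """Accept an integer between 0 and 12000 inclusive."""
    text = text.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False  # not an integer at all, e.g. "12.5" or "cheap"
    return 0 <= int(text) <= 12000


def matches_ddd_dddd(text: str) -> bool:
    """Accept strings that follow the ddd-dddd pattern exactly."""
    return DDD_DDDD.fullmatch(text) is not None


print(is_valid_price("12000"))       # True
print(is_valid_price("12001"))       # False
print(matches_ddd_dddd("736-2100"))  # True
print(matches_ddd_dddd("7362100"))   # False
```

A DTD cannot express either rule, since both values would simply be #PCDATA.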


Types

• Primitive (built-in) types: decimal, integer, boolean, string, etc. (defined in the XMLSchema namespace, http://www.w3.org/TR/xmlschema-2/#built-in-datatypes)
  – string: string type
  – boolean: boolean type
  – integer, decimal, float, double: number types
  – time, date, dateTime, gYear, gMonth, etc.: date and time types

All of the above are used as xsd:type, e.g., xsd:integer.
E.g.: <xsd:element name="name" type="xsd:string"/>
      <xsd:attribute name="retired" type="xsd:boolean"/>


E.g.
  <name>Clive Lloyd</name>
  <birthday>1950-03-29</birthday>

XML Schema definition
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="birthday" type="xsd:date"/>

Annotations:
• name="…": the name of this element
• type="…": the type of this element


Custom types (user defined)

• XSD allows you to create your own custom primitive (simple) datatypes.

• Example:

    <xsd:simpleType name="TenToTwentyType">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="10"/>
        <xsd:maxInclusive value="20"/>
      </xsd:restriction>
    </xsd:simpleType>

• Besides minInclusive and maxInclusive, you can also use minExclusive and maxExclusive, and minLength and maxLength (for strings). These are usually called "facets" in XSD. There are also the facets totalDigits and fractionDigits (named precision and scale in early drafts), which control how many digits are allowed in decimal numbers.

• restriction is used to restrict the range of values.
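The effect of the range facets can be imitated with a small class. This is a hedged sketch rather than an XSD implementation: `IntegerRestriction` is a hypothetical stand-in for a restriction with base xsd:integer, and it supports only the four range facets named above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntegerRestriction:
    """Hypothetical model of a restriction on an integer base type."""
    min_inclusive: Optional[int] = None
    max_inclusive: Optional[int] = None
    min_exclusive: Optional[int] = None
    max_exclusive: Optional[int] = None

    def accepts(self, text: str) -> bool:
        """Return True if text is an integer that satisfies every facet."""
        text = text.strip()
        if re.fullmatch(r"[+-]?[0-9]+", text) is None:
            return False
        value = int(text)
        if self.min_inclusive is not None and value < self.min_inclusive:
            return False
        if self.max_inclusive is not None and value > self.max_inclusive:
            return False
        if self.min_exclusive is not None and value <= self.min_exclusive:
            return False
        if self.max_exclusive is not None and value >= self.max_exclusive:
            return False
        return True


# Counterpart of TenToTwentyType
ten_to_twenty = IntegerRestriction(min_inclusive=10, max_inclusive=20)
print(ten_to_twenty.accepts("10"))  # True
print(ten_to_twenty.accepts("21"))  # False
```

Swapping min_inclusive=10 for min_exclusive=10 would reject "10" itself while still accepting "11".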