1

XML (3): Extensible Markup Language

Acknowledgements and copyrights: these slides are the result of combining notes and slides with contributions from: Michael Kiffer, Arthur Bernstein, Philip Lewis, Hanspeter Mössenböck, Wolfgang Beer, Dietrich Birngruber, Albrecht Wöss, Mark Sapossnek, Bill Andreopoulos, Divakaran Liginlal, Anestis Toptsis, Addison Wesley, Microsoft AA.

They serve teaching purposes only, and only for students registered in CSE4413. They should not be published as a book or in any form of commercial product unless written permission is obtained from each of the above listed names and/or organizations.

2

Document Type Definition (DTD)

• A DTD is a grammar specification for an XML document

• DTDs are optional – they don’t need to be specified

• If specified, a DTD can be part of the document (at the top), or it can be given as a URL

• A document that conforms (i.e., parses) w.r.t. its DTD is said to be valid

3

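XML Data + DTD

DTD:

    <!ELEMENT a (b+, c?) >
    <!ELEMENT b (#PCDATA) >
    <!ELEMENT c (#PCDATA) >

Not valid (c appears twice, but c? allows at most one):

    <!-- XML Data -->
    <a>
        <b> Some </b>
        <c> 100 </c>
        <c> 101 </c>
    </a>

Valid (b+ allows several b elements, and c may be omitted):

    <!-- XML Data -->
    <a>
        <b> Some </b>
        <b> Thing </b>
    </a>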
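The valid / not valid verdicts above can be reproduced with a small sketch. It is not a DTD validator: it assumes the content model (b+, c?) has been translated by hand into a regular expression over the child element names, and the names A_MODEL and children_match are made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <a> from the DTD, (b+, c?), written by hand as a regular
# expression over the child names, each name followed by one space.
A_MODEL = re.compile(r"(?:b )+(?:c )?")


def children_match(xml_text, model):
    """Return True if the root element's children satisfy the model."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return model.fullmatch(names) is not None


first = "<a><b> Some </b><c> 100 </c><c> 101 </c></a>"
second = "<a><b> Some </b><b> Thing </b></a>"

print(children_match(first, A_MODEL))   # False: two <c> children
print(children_match(second, A_MODEL))  # True: b+ satisfied, c omitted
```

Only the children of the root element are checked here; the (#PCDATA) declarations for b and c are not enforced.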

4

What is a DTD ?

• Document Type Definition (DTD)

• Defines the syntax and grammar (the permitted structure) of a type of document

• Defines the document structure
  – What elements, attributes, entities, etc. are permitted?
  – How are the document elements related and structured?

• Referenced by or defined in XML documents, but it’s not XML!

• Enables validation of XML documents using an XML parser

• Can be referenced by more than one XML document

• DTDs may reference other DTDs

5

Schemas: DTD Example

• DTD schema:

    <!DOCTYPE BOOK [
        <!ELEMENT BOOK (TITLE+, AUTHOR) >
        <!ELEMENT TITLE (#PCDATA) >
        <!ELEMENT AUTHOR (#PCDATA) >
    ]>

• XML document (that conforms to the DTD above):

    <BOOK>
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>

6

DTD By Diagram

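[Diagram: tree view of a CustomerOrder document. CustomerOrder contains one Customer and several Orders. Customer contains Person (with FName and LName) and several Address elements. Each Orders element contains an OrderNo and several ProductNo elements.]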

7

DTD By Example

http://www.myco.com/dtd/order.dtd

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE CustomerOrder [
        <!ELEMENT CustomerOrder (Customer, Orders*) >
        <!ELEMENT Customer (Person, Address+) >
        <!ELEMENT Person (FName, LName) >
        <!ELEMENT FName (#PCDATA) >
        <!ELEMENT LName (#PCDATA) >
        <!ELEMENT Address (#PCDATA) >
        <!ATTLIST Address
            AddrType ( billing | shipping | home ) "shipping" >
        <!ELEMENT Orders (OrderNo, ProductNo+) >
        <!ELEMENT OrderNo (#PCDATA) >
        <!ELEMENT ProductNo (#PCDATA) >
    ]>

Annotations:
• Orders* : 0 or more times
• Customer (no suffix) : exactly 1 time
• Address+ : 1 or more times
• #PCDATA : Parsed Character Data
• | : or (choice)
• "shipping" : default value

8

DTD syntax rules

• , (comma) indicates strict sequence. For example,
    <!ELEMENT CustomerOrder (Customer, Orders) >

• | (pipe) indicates a choice. For example,
    AddrType ( billing | shipping | home )

• Cardinality: how many instances are allowed
  – ? Optional (may or may not appear)
  – * Zero or more
  – + One or more
    <!ELEMENT Category (subcategory+ ) >

• #PCDATA stands for parsed character data

• Mixed content can be indicated using #PCDATA. For example,
    <!ELEMENT Idea (#PCDATA | product | service)* >
  E.g.
    <Idea>
        <product> … </product>
        Some descriptive text included as pcdata
        <service> … </service>
        <product> … </product>
    </Idea>
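To see what mixed content looks like to a program, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element contents P1, S1 and P2 are placeholders invented for this illustration (the slide elides them), and mixed_content is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of <!ELEMENT Idea (#PCDATA | product | service)* >
doc = (
    "<Idea>"
    "<product>P1</product>"
    "Some descriptive text included as pcdata"
    "<service>S1</service>"
    "<product>P2</product>"
    "</Idea>"
)


def mixed_content(elem):
    """List the text runs and child elements of elem in document order."""
    pieces = []
    # Text before the first child is stored on the parent element.
    if elem.text and elem.text.strip():
        pieces.append(("#PCDATA", elem.text.strip()))
    for child in elem:
        pieces.append((child.tag, child.text))
        # Text following a child is stored as that child's "tail".
        if child.tail and child.tail.strip():
            pieces.append(("#PCDATA", child.tail.strip()))
    return pieces


for kind, value in mixed_content(ET.fromstring(doc)):
    print(kind, "->", value)
```

The text run sits between the first product and the service element, which is exactly the interleaving that the (#PCDATA | product | service)* model permits.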

9

DTDs (cont’d)

• A DTD can be specified as part of an XML document:

    <?xml version="1.0" ?>
    <!DOCTYPE Report [
        … … …
    ]>
    <Report> … … … </Report>

• A DTD can be specified as a standalone file and referenced from an XML document with the SYSTEM keyword:

    <?xml version="1.0" ?>
    <!DOCTYPE Report SYSTEM "http://foo.org/Report.dtd">
    <Report> … … … </Report>

10

DTD Example

    <!DOCTYPE Report [
        <!ELEMENT Report (Students, Classes, Courses)>
        <!ELEMENT Students (Student*)>
        <!ELEMENT Classes (Class*)>
        <!ELEMENT Courses (Course*)>
        <!ELEMENT Student (Name, Status, CrsTaken*)>
        <!ELEMENT Name (First,Last)>
        <!ELEMENT First (#PCDATA)>
        … … …
        <!ELEMENT CrsTaken EMPTY>
        <!ELEMENT Class (CrsCode,Semester,ClassRoster)>
        <!ELEMENT Course (CrsName)>
        … … …
        <!ATTLIST Report Date CDATA #IMPLIED>
        <!ATTLIST Student StudId ID #REQUIRED>
        <!ATTLIST Course CrsCode ID #REQUIRED>
        <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
        <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
    ]>

Annotations:
• (Student*) : zero or more
• (#PCDATA) : has text content (PCDATA stands for Parsed Character Data)
• EMPTY : empty element, no content. Neither text nor child elements. Like <BR/> in HTML (but it may have attributes).
• CrsCode : same attribute in different elements (Course and CrsTaken)

Exercise: Use the above DTD to write a conforming XML document of your choice.

11

DTD- ELEMENT content specifications

<!ELEMENT Tagname content_specification>

• content_specification:
  – EMPTY: neither text nor child elements, like <BR/> or <Link url='http://abc.d'/> in HTML
  – ANY: any content that does not violate well-formed syntax
  – MIXED: a mix of child elements and text (#PCDATA)

12

DTD - Attribute Declarations

<!ATTLIST elementname attrname Type DEFAULT>

Default : Meaning
• #REQUIRED : the attribute is always required in an instance of this element.
• default value only : the default value of the attribute. If the attribute is absent, the default value is assumed by the parser.
• #FIXED default value : the attribute has a fixed value. If the attribute is absent, that value is assumed by the parser; if it is present, it must have exactly that value.
• #IMPLIED : the attribute is optional in an instance of this element.

Divakaran Liginlal

13

Examples: DTD - Attribute Declaration

<!ATTLIST product Name CDATA #REQUIRED>
• A product must have a Name, and it is a character data string (CDATA: the type of the attribute is character string).

<!ATTLIST product Life CDATA #IMPLIED>
• A product may have a specified Life.

<!ATTLIST product Life CDATA #FIXED "Not Known">
• By default the product’s Life = 'Not Known' and is constant. If the attribute Life is absent, it is still assumed to have this value.

<!ATTLIST product Life CDATA "Not Known">
• By default the product’s Life = 'Not Known'. If a value is specified, then the specified value is used.

Note that every attribute declaration needs a type (here CDATA) before the default.
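The effect of a default value can be observed even without a validating parser, because defaults declared in the internal DTD subset are filled in by the parser itself. The sketch below assumes Python's standard xml.etree.ElementTree (built on expat); the wrapper element catalog and the values Widget and 2 years are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "catalog", "Widget" and "2 years" are made up here.
doc = """<!DOCTYPE catalog [
  <!ELEMENT catalog (product*)>
  <!ELEMENT product EMPTY>
  <!ATTLIST product Name CDATA #REQUIRED>
  <!ATTLIST product Life CDATA "Not Known">
]>
<catalog>
  <product Name="Bug Killer"/>
  <product Name="Widget" Life="2 years"/>
</catalog>"""

root = ET.fromstring(doc)

# The first product does not specify Life, so the declared default applies;
# the second product overrides the default with its own value.
lives = [product.get("Life") for product in root.findall("product")]
print(lives)
```

ElementTree does not validate, so a missing #REQUIRED attribute would go unnoticed here; only the default-filling behaviour is being shown.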

14

Attribute Types – CDATA & ID

<!ATTLIST elementname attrname Type DEFAULT>

Type : Meaning

• CDATA : Character string. CDATA indicates that an attribute contains a simple character string of text.
    E.g.: <!ATTLIST Idea name CDATA #REQUIRED>
          <Idea name='Bug Killer'>
          <!ATTLIST education school CDATA #REQUIRED>
          <education school="York University">

• ID : Unique name (identifier) inside the document. Only one attribute of type ID can be assigned to a given element type. The value of the attribute (i.e., the ID) must be unique throughout the same XML document, i.e., it uniquely identifies an element in the XML document.
    Format of an ID value = (Letter | '_' | ':') (NameChar)*
    E.g.: <!ATTLIST Idea id ID #REQUIRED>
          <!ATTLIST school id ID #REQUIRED>
          <Idea id='L1013'>
          <!ATTLIST Course CrsCode ID #REQUIRED>
          <Course CrsCode="4413">
    Note: by the format above an ID value may not start with a digit, so a validating parser would reject "4413"; a value such as "CSE4413" would be accepted.
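The two rules for ID attributes (the value must be a legal name, and it must be unique in the document) can be checked with a short sketch. It assumes the caller already knows which attribute of which element is declared as ID, since Python's standard ElementTree does not expose DTD attribute types. The wrapper element Ideas, the helper check_ids and the ASCII-only name pattern are simplifications made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the XML name rule: (Letter | '_' | ':') (NameChar)*
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def check_ids(xml_text, id_attrs):
    """id_attrs maps an element name to the name of its ID-typed attribute."""
    seen = set()
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        attr = id_attrs.get(elem.tag)
        if attr is None or attr not in elem.attrib:
            continue
        value = elem.get(attr)
        if not NAME.fullmatch(value):
            problems.append(f"{value!r} is not a valid name")
        if value in seen:
            problems.append(f"{value!r} is used more than once")
        seen.add(value)
    return problems


doc = "<Ideas><Idea id='L1013'/><Idea id='L1013'/><Course CrsCode='4413'/></Ideas>"
for problem in check_ids(doc, {"Idea": "id", "Course": "CrsCode"}):
    print(problem)
```

IDs share one namespace across the whole document, which is why a single seen set is used for both Idea and Course.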

15

Attribute Types IDREF

<!ATTLIST elementname attrname Type DEFAULT>

Type : Meaning

• IDREF : Indirect reference to an ID type. Reference to an element with an ID attribute having the same value (but not necessarily the same name!) as this IDREF attribute (i.e., the value of the IDREF attribute must match the value of an ID attribute elsewhere in the same XML document).
    E.g.: <!ATTLIST CrsTaken CrsCode IDREF #REQUIRED>
          <CrsTaken CrsCode='4413'/>
    CrsCode refers to an element (Course) that has an attribute CrsCode with value '4413'.

16

Attribute Types IDREFS

<!ATTLIST elementname attrname Type DEFAULT>

Type : Meaning

• IDREFS : Series of IDREFs delimited by whitespace.
    E.g.: <!ATTLIST ClassRoster Members IDREFS #IMPLIED>
          <ClassRoster Members="cs123456 cs234567 cs345678">
    These are the ids of students who are registered in some class. Ideally, these students should be listed with elements of type
          <!ATTLIST Student StudId ID #REQUIRED>
    e.g.,
          <Student StudId="cs123456">
          <Student StudId="cs234567">
          <Student StudId="cs345678">
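The requirement that every IDREF (and every token of an IDREFS value) matches some ID in the same document can be sketched as follows. As before, the attribute types are passed in by hand because ElementTree does not read them from the DTD. The function name dangling_refs and the sample document, in which one student is deliberately missing, are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def dangling_refs(xml_text, id_attrs, ref_attrs):
    """Return the referenced values that match no ID in the document.

    id_attrs and ref_attrs map an element name to the name of its
    ID-typed or IDREF/IDREFS-typed attribute.
    """
    root = ET.fromstring(xml_text)
    ids = set()
    for elem in root.iter():
        attr = id_attrs.get(elem.tag)
        if attr is not None and attr in elem.attrib:
            ids.add(elem.get(attr))
    missing = []
    for elem in root.iter():
        attr = ref_attrs.get(elem.tag)
        if attr is not None and attr in elem.attrib:
            # An IDREFS value is a whitespace-separated list of IDREFs;
            # a single IDREF is simply a list of length one.
            for ref in elem.get(attr).split():
                if ref not in ids:
                    missing.append(ref)
    return missing


doc = """<Report>
  <Student StudId="cs123456"/>
  <Student StudId="cs234567"/>
  <ClassRoster Members="cs123456 cs234567 cs345678"/>
</Report>"""

print(dangling_refs(doc, {"Student": "StudId"}, {"ClassRoster": "Members"}))
```

A validating parser performs the same check after reading the whole document, because a reference may point forward to an ID that appears later.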

17

Limitations of DTDs

• DTDs do not support namespaces. All element names are global: you can’t have one Name type for people and another for companies:
    <!ELEMENT Name (Last, First)>
    <!ELEMENT Name (#PCDATA)>
  Both cannot be in the same DTD.

• Very limited assortment of data types (just strings)

• Cannot express unordered contents conveniently. For example,
    <!ELEMENT Report (Students, Classes, Courses)>
  determines that Students, Classes, Courses must appear in the order specified and not in any other order.
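To get a feeling for why unordered content is inconvenient, the sketch below writes out the content model that would be needed to allow Students, Classes and Courses in any order, by listing every ordering as an alternative. The helper any_order_model is invented for this illustration.

```python
from itertools import permutations


def any_order_model(names):
    """Build a DTD content model that lists every ordering of names."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(names)]
    return "(" + " | ".join(alternatives) + ")"


model = any_order_model(["Students", "Classes", "Courses"])
print(model)
print(model.count("|") + 1, "alternatives")
```

Three children already need 6 alternatives and n children need n! of them. In addition, XML asks content models to be deterministic, so a real DTD would have to factor out the common prefixes of these alternatives by hand.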

18

DTD validation

• Once you have a DTD, you can create an XML document from that DTD.

• Then you may want to validate the document against the DTD.

• To do so, you can write a program that parses the document and tries to match it against the DTD (difficult!), or

• you can use a DTD validation tool.
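To give an idea of what such a program involves, here is a deliberately small sketch for the CustomerOrder DTD shown earlier. It is not a full validator: the element declarations are copied by hand into a dictionary instead of being parsed from the DTD, attributes and entities are ignored, and stray text inside element-only content is not detected. The names MODELS, model_to_regex and validate are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied by hand from the CustomerOrder DTD.
# None stands for (#PCDATA): text only, no child elements.
MODELS = {
    "CustomerOrder": "(Customer, Orders*)",
    "Customer": "(Person, Address+)",
    "Person": "(FName, LName)",
    "Orders": "(OrderNo, ProductNo+)",
    "FName": None, "LName": None, "Address": None,
    "OrderNo": None, "ProductNo": None,
}


def model_to_regex(model):
    """Translate an element-content model into a regex over child names."""
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.\-]*|[()|?*+]", model):
        if tok in ("(", ")", "|", "?", "*", "+"):
            parts.append(tok)  # same meaning in DTDs and in regexes
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child name
    return re.compile("".join(parts))  # commas (sequence) are simply dropped


def validate(elem, errors=None):
    """Collect content-model violations for elem and all its descendants."""
    errors = [] if errors is None else errors
    if elem.tag not in MODELS:
        errors.append(f"undeclared element <{elem.tag}>")
        return errors
    model = MODELS[elem.tag]
    names = "".join(child.tag + " " for child in elem)
    if model is None:
        if len(elem) > 0:
            errors.append(f"<{elem.tag}> must contain only text")
    elif not model_to_regex(model).fullmatch(names):
        errors.append(f"<{elem.tag}> children do not match {model}")
    for child in elem:
        validate(child, errors)
    return errors


good = ("<CustomerOrder><Customer><Person><FName>A</FName><LName>B</LName>"
        "</Person><Address>X</Address></Customer>"
        "<Orders><OrderNo>1</OrderNo><ProductNo>7</ProductNo></Orders>"
        "</CustomerOrder>")
bad = good.replace("<Address>X</Address>", "")

print(validate(ET.fromstring(good)))
print(validate(ET.fromstring(bad)))
```

Even this restricted version has to deal with sequencing, choice and repetition, which is why a ready-made validation tool is usually the more practical choice.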

19

DTD validation - tools

• XSV validator (W3C). Note that XSV checks documents against an XML Schema rather than a DTD.
  – Free
  – http://www.w3.org/2001/03/webdata/xsv

• Brown University’s STG (Scholarly Technology Group) validator
  – Free
  – http://www.stg.brown.edu/service/xmlvalid/

• XMLStarlet
  – Free
  – http://xmlstar.sourceforge.net/

• Search the web for more tools (there are many)…

20

XML Schema (XSD)

• http://www.w3.org/2001/XMLSchema

• Introduced to rectify some of the problems with DTDs

• Advantages:
  – Integrated with namespaces
  – Many built-in types
  – User-defined types
  – Powerful key and referential constraints
  – The schema itself is an XML document

• Disadvantages:
  – Much more complex than DTDs

21

XML Documents + XML Schema

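    <!-- Some XML Schema -->
    <element name="a">
        <complexType>
            <sequence>
                <element name="b" type="string" minOccurs="1" maxOccurs="unbounded"/>
                <element name="c" type="integer" minOccurs="0" maxOccurs="1"/>
            </sequence>
        </complexType>
    </element>

Both minOccurs and maxOccurs default to 1, so b needs maxOccurs="unbounded" and c needs minOccurs="0" for this schema to express the same rule as the DTD content model (b+, c?).

Not valid (c appears twice):

    <!-- XML Data -->
    <a>
        <b> Some </b>
        <c> 100 </c>
        <c> 101 </c>
    </a>

Valid:

    <!-- XML Data -->
    <a>
        <b> Some </b>
        <b> Thing </b>
    </a>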
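The occurrence constraints of a sequence can be illustrated with a small sketch. It assumes that b may repeat without limit (maxOccurs unbounded) and that c is optional (minOccurs 0), i.e. the same rule as the DTD model (b+, c?). The schema is written by hand as a list of (name, minOccurs, maxOccurs) tuples rather than read from an XSD file, the type="integer" part is not checked, and the names A_SEQUENCE and matches_sequence are made up.

```python
import xml.etree.ElementTree as ET

# (element name, minOccurs, maxOccurs); None stands for "unbounded".
# Assumed constraints: b occurs 1..unbounded times, c occurs 0..1 times.
A_SEQUENCE = [("b", 1, None), ("c", 0, 1)]


def matches_sequence(elem, particles):
    """Check the children of elem against an xsd:sequence-like list.

    Greedy matching is sufficient here because neighbouring particles
    have different element names.
    """
    tags = [child.tag for child in elem]
    pos = 0
    for name, lo, hi in particles:
        count = 0
        while pos < len(tags) and tags[pos] == name and (hi is None or count < hi):
            pos += 1
            count += 1
        if count < lo:
            return False  # too few occurrences of this particle
    return pos == len(tags)  # leftover children mean too many occurrences


first = ET.fromstring("<a><b> Some </b><c> 100 </c><c> 101 </c></a>")
second = ET.fromstring("<a><b> Some </b><b> Thing </b></a>")

print(matches_sequence(first, A_SEQUENCE))   # False: second <c> is left over
print(matches_sequence(second, A_SEQUENCE))  # True
```

With the XSD defaults instead (exactly one b followed by exactly one c), the list would be [("b", 1, 1), ("c", 1, 1)] and both sample documents would be rejected.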

22

Example

• XML:
    <Age>20009</Age>

• XML Schema:
    <element name="Age" type="integer"/>

• DTD:
    <!ELEMENT Age (#PCDATA)>
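The difference between the two declarations shows up as soon as the content of Age is not a number. A rough sketch: the DTD declaration (#PCDATA) accepts any text, while the schema type integer accepts only an optional sign followed by digits. The function names below are invented, and the range and semantics of the value (is 20009 a plausible age?) are not checked by either declaration.

```python
import re

# Lexical form of an XSD integer: optional sign followed by decimal digits.
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")


def accepted_by_dtd(text):
    """(#PCDATA) places no restriction on the text, so everything passes."""
    return True


def accepted_by_schema(text):
    """Surrounding whitespace is ignored, then the lexical form is checked."""
    return XSD_INTEGER.fullmatch(text.strip(" \t\r\n")) is not None


for content in ["20009", "twenty"]:
    print(content, accepted_by_dtd(content), accepted_by_schema(content))
```

Both declarations accept 20009; only the schema rejects twenty.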

23

Motivation for XML Schemas

• Datatype capability
  – For example, you can define a <price> element to hold an integer with a range of 0 to 12,000

• Datatypes compatible with those in databases
  – XML Schema supports a relatively large number of datatypes

• Can create your own datatypes
  – Example: "This is a new type based on the string type and elements of this type must follow this pattern: ddd-dddd, where 'd' represents a digit".
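The ddd-dddd type described above amounts to a regular expression that the whole element content must match. A minimal sketch of that check, with the names DDD_DDDD and follows_pattern invented for the illustration:

```python
import re

# ddd-dddd, where d is a digit: three digits, a hyphen, four digits.
DDD_DDDD = re.compile(r"[0-9]{3}-[0-9]{4}")


def follows_pattern(value):
    """The whole value must match, not just a part of it."""
    return DDD_DDDD.fullmatch(value) is not None


for value in ["555-1234", "5551234", "555-12345"]:
    print(value, follows_pattern(value))
```

fullmatch is used because a schema pattern always applies to the complete value, so 555-12345 is rejected even though it starts with a matching prefix.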

24

Types

• Primitive types: decimal, integer, boolean, string, etc. (defined in the XMLSchema namespace, http://www.w3.org/TR/xmlschema-2/#built-in-datatypes)
  – string: string type
  – boolean: boolean type
  – integer, decimal, float, double: number types
  – time, date, dateTime, duration, gYear, gMonth, etc.: date and time types

All of the above are used as xsd:type, e.g., xsd:integer.

E.g.:
    <xsd:element name="name" type="xsd:string"/>
    <xsd:attribute name="retired" type="xsd:boolean"/>

25

E.g.
    <name>Clive Lloyd</name>
    <birthday>1950-03-29</birthday>

XML Schema definition
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="birthday" type="xsd:date"/>

• The name attribute gives the name of this element
• The type attribute gives the type of this element

26

Custom types (user defined)

• XSD allows you to create your own custom primitive (simple) datatypes.

• Example:

    <xsd:simpleType name="TenToTwentyType">
        <xsd:restriction base="xsd:integer">
            <xsd:minInclusive value="10"/>
            <xsd:maxInclusive value="20"/>
        </xsd:restriction>
    </xsd:simpleType>

• Besides minInclusive and maxInclusive, you can also use minExclusive and maxExclusive, and minLength and maxLength (for strings). These are usually called “facets” in XSD. There are also the facets totalDigits and fractionDigits (called precision and scale in early drafts), which control how many digits are allowed in decimal numbers.

• restriction is used to restrict the range of values.
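How range facets narrow a base type can be sketched with a few comparisons. The dictionary below mirrors the TenToTwentyType restriction, while the function check_facets is a hypothetical helper that covers only the four numeric range facets (an unknown facet name raises KeyError).

```python
def check_facets(value, facets):
    """Return True if value satisfies every range facet in facets."""
    checks = {
        "minInclusive": lambda limit: value >= limit,
        "maxInclusive": lambda limit: value <= limit,
        "minExclusive": lambda limit: value > limit,
        "maxExclusive": lambda limit: value < limit,
    }
    return all(checks[name](limit) for name, limit in facets.items())


# Mirrors <xsd:minInclusive value="10"/> and <xsd:maxInclusive value="20"/>
TEN_TO_TWENTY = {"minInclusive": 10, "maxInclusive": 20}

for candidate in [9, 10, 20, 21]:
    print(candidate, check_facets(candidate, TEN_TO_TWENTY))
```

The inclusive facets accept the boundary values 10 and 20; replacing them with minExclusive and maxExclusive would reject both.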
