xml advanced topics › ~offutt › › classes › 642 › slides › 642lec09a-xml… · • dtds...

29
1 XML Advanced Topics XML Advanced Topics Jeff Offutt http://www.cs.gmu.edu/~offutt/ SWE 642 Software Engineering for the World Wide Web sources: Professional Java Server Programming, Patzer, Wrox, 2 nd edition, Ch 5, 6 Programming the World Wide Web, Sebasta, Ch 8, Addison-Wesley Michelle Lee and Ye Wu 1. Document Type Definitions (DTD) 2 Schemas Topics Topics 2. Schemas 3. Parsing XML documents 4. Summary 24-Oct-13 © Offutt 2

Upload: others

Post on 25-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

1

XML Advanced TopicsXML Advanced Topics

Jeff Offutt

http://www.cs.gmu.edu/~offutt/

SWE 642Software Engineering for the World Wide Web

sources: Professional Java Server Programming, Patzer, Wrox, 2nd edition, Ch 5, 6Programming the World Wide Web, Sebasta, Ch 8, Addison-WesleyMichelle Lee and Ye Wu

1. Document Type Definitions (DTD)2 Schemas

TopicsTopics

2. Schemas3. Parsing XML documents4. Summary

24-Oct-13 © Offutt 2

Page 2: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

2

Document Type Definitions (DTD)Document Type Definitions (DTD)• A DTD (Document Type Definition) provides a grammar

that tells which data structures can occur, in what sequences

• The specification tells you how to write the high-level code that processes the data elements

• DTDs give a set of rules, called declarations, that defines the structure of an XML document

• DTDs allow XML documents to be validated

24-Oct-13 © Offutt 3

– Specific syntax

• DTDs were the original way to define XML grammar– Mostly replaced by XML Schemas

DTD SyntaxDTD Syntax• A DTD is a sequence of declarations enclosed in a

DOCTYPE declaration– <!keyword …>

• Building blocks :– Element type declarations (ELEMENT) : Defines XML tags– Attribute list declarations (ATTLIST) : Attributes for a tag– Entity declarations : &lt; &gt; &amp; &quot; &apos;– PCDATA : Parsed character data – will be examined by the

24-Oct-13 © Offutt 4

parser and entities will be expanded• Should not contain &, <, > characters

– CDATA : Character data – will not be parsed – Notation declarations : Data type notations (uncommon)

Page 3: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

3

DTD ExampleDTD Example<!DOCTYPE book-entry [<!ELEMENT book-type (mystery|scienceFiction|romance)><!ELEMENT author-name (#PCDATA)><!ELEMENT title (#PCDATA)><!ELEMENT title (#PCDATA)><!ELEMENT lostBook EMPTY> ]>

<book-entry><author-name>Arthur C. Clarke</author-name><title>Childhood’s End</title>

24-Oct-13 © Offutt 5

<book-type>scienceFiction</book-type><lostBook/>

</book-entry>

DTD ElementsDTD Elements• Borrows some syntax from BNF

– <!ELEMENT element-name category>– <!ELEMENT element-name (names of children) >– <!ELEMENT memo (from, to, data, re, body) >

• ModifiersModifiers– + : one or more occurrences– * : zero or more occurrences– ? : zero or one occurrence– <!ELEMENT person (parent+, age, spouse?, sibling* ) >

• Leaf nodes have no children– PCDATA : parsable character data– EMPTY : no content– ANY : no restriction, any kind of parsable data– <!ELEMENT from (#PCDATA) >

• Either / or content– <!ELEMENT note (from, to, data, re, (message | body)>

• Mixed content– <!ELEMENT note (#PCDATA|from|to|data|message>

24-Oct-13 © Offutt 6

Page 4: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

4

DTD DTD ExampleExample

library-catalog.xml

<!DOCTYPE library-catalog [<!ELEMENT library-catalog ANY><!ELEMENT book-entry (title, branch)><!ELEMENT title (#PCDATA)><!ELEMENT branch (#PCDATA)><!ENTITY hcm “Howard County Main Branch”>]>

<library-catalog><book entry>

24-Oct-13 © Offutt 7

<book-entry><title>The Trial</title><branch> &hcm; </branch>

</book-entry></library-catalog>

1. Document Type Definitions (DTD)2 Schemas

TopicsTopics

2. Schemas3. Parsing XML documents4. Summary

24-Oct-13 © Offutt 8

Page 5: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

5

Need for XML SchemaNeed for XML Schema• Problems with DTD

– Different DTD files CANNOT be mixed or reused– DTDs do not follow the XML style– DTDs do not support data types

• Solution– XML Schemas– Replacing DTDs

24-Oct-13 © Offutt 9

XML SchemaXML Schema• The purpose of an XML Schema is to define the legal

building blocks of an XML document, just like a DTD• An XML document that conforms to a schema is said to

be an instance of the schema• Schemas have two purposes

– Describe structure– Define data types of elements and attributes

• An XML Schema defines:

24-Oct-13 © Offutt 10

– elements that can appear in a document – attributes that can appear in elements– child elements

Page 6: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

6

Schema Can DefineSchema Can Define• The sequence in which the child elements can appear • The number of child elements• Whether an element is empty or can include textWhether an element is empty or can include text• Data types for elements and attributes• Default values for elements and attributes

24-Oct-13 © Offutt 11

Schemas Replace DTDsSchemas Replace DTDs• Easier to learn than DTDs• Easy to extend• Richer and more useful than DTDs• Supports data types

– Describes valid document content– Can validate the correctness of data– Works with data from a database– Defines data facets (restrictions on data values)– Defines data patterns (data formats)– Converts data between different data types

• Supports namespaces• Written in XML24-Oct-13 © Offutt 12

Page 7: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

7

Advantages of Using XML SyntaxAdvantages of Using XML Syntax• No need to learn another language

• XML editor can be used to edit Schema files

• XML parsers can parse Schema files

• Manipulate Schema with the XML DOM

• Transform Schema with XSLT

24-Oct-13 © Offutt 13

SchemasSchemas——XML for Book ExampleXML for Book Example<books>

<book><ISBN>0471043281</ISBN><title>The Art of Software Testing</title>

XML d fi d b

<author>Glen Myers</author><publisher>Wiley</publisher><price>50.00</price><year>1979</year>

</book></books>

• XML messages are defined by grammars– Schemas and DTDs

• Schemas can define many kinds of types• Schemas include “facets,” which refine the grammar 24-Oct-13 © Offutt 14

Page 8: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

8

XML Schema for Book ExampleXML Schema for Book Example<xs:element name = “books”>

<xs:complexType><xs:sequence>

<xs:element name = “book” maxOccurs = “unbounded”><xs:complexType>

facets

<xs:complexType><xs:sequence>

<xs:element name = “ISBN” type = “xs:string”/><xs:element name = “title” type = “xs:string”/><xs:element name = “author” type = “xs:string”/><xs:element name = “publisher” type = “xs:string”/><xs:element name = “price” type = “priceType”/><xs:element name = “year” type = “xs:int”/>

24-Oct-13 © Offutt 15

</xs:sequence></xs:complexType>

</xs:element></xs:sequence></xs:complexType>

</xs:element>

<xs:simpleType name = “priceType”><xs:restriction base = “xs:decimal”>

<xs:fractionDigits value = “2” /><xs:maxInclusive value = “1000.00” />

</xs:restriction></xs:simpleType>

XML XML <schema><schema> ElementElement• <schema> is the root element of every XML Schema• Attributes to <schema> can define “namespaces”

– A namespace in XML is a URI that defines elements and pattributes that can be used in an XML document

• Typical schema declaration:<xs:schemaxmlns:xs=“http://www.w3.org/2001/XMLSchema“targetNamespace=“http://www w3schools com“targetNamespace= http://www.w3schools.comxmlns=“http://www.w3schools.com“elementFormDefault=“qualified“>...</xs:schema>

24-Oct-13 © Offutt 16

Page 9: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

9

XML XML SSchemaschemas Must Be WellMust Be Well--formedformed• A well-formed XML document is a document that conforms to

the XML syntax rules:

– Must begin with the XML declarationMust have one unique root element– Must have one unique root element

– All start tags must match end-tags– XML tags are case sensitive– All elements must be closed– All elements must be properly nested– All attribute values must be quoted– XML entities must be used for special characters

• Just like XML, well-formed schemas may not be valid

24-Oct-13 © Offutt 17

Just like XML, well formed schemas may not be valid– may still contain errors

Schema Schema MotivationMotivationData CommunicationData Communication

• When data is sent from a sender to a receiver it is essential that both parts have the same “expectations” p pabout the content

• An XML element with a data type like this :<date type=“date”>2003-01-09</date>

ensures a mutual understanding of the content because the XML

24-Oct-13 © Offutt 18

ensures a mutual understanding of the content because the XML data type date requires the format CCYY-MM-DD

Page 10: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

10

Schema Data TypesSchema Data Types• Simple and complex types

– Simple : content is restricted to strings, no attributes or nesting– Complex : attributes and nesting allowed

• More than 40 simple types– Primitive : string, Boolean, float, time, anyURI– Derived builtin : byte, long, decimal, unsignedInt, positiveInteger,

NMTOKEN

24-Oct-13 © Offutt 19

– …

Schema Data Types (2)Schema Data Types (2)• User-defined : Specify restrictions on an existing type

– Called derived types– Specify restrictions in terms of facets

• Facets are ways to restrict values– Integer facets : totalDigits, maxInclusive, maxExclusive,

minInclusive, minExclusive, pattern, enumeration, whitespace

• Facets are based on regular expressions, and the possibilities are endless

24-Oct-13 © Offutt 20

• Tree hierarchy of predefined data types :http://www.w3.org/TR/xmlschema-2/#built-in-datatypes

Page 11: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

11

Common BuiltCommon Built--in in XML Schema XML Schema Data TypesData Types

• xs:string – strings surrounded by quotesxs string strings surrounded by quotes• xs:decimal – numbers with decimal points• xs:integer – integer values• xs:boolean – true and false• xs:date – YYYY/MM/DD

24-Oct-13 © Offutt 21

• xs:time – HH:MM:SS

XML Restrictions (Facets)XML Restrictions (Facets)• Restrictions on XML elements are called facets• Restrictions can be on :

– ValuesI l I l E l E l• minInclusive, maxInclusive, minExclusive, maxExclusive

– A set of values• enumeration

– A series of values• pattern

– Whitespace characters

24-Oct-13 © Offutt 22

• whitespace

– Length• length , maxLength, minLength, totalDigits, fractionDigits

Page 12: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

12

Example XML Integer FacetsExample XML Integer Facets

<xs:element name=“age”><xs:simpleType>p yp

<xs:restriction base=“xs:integer”><xs:minInclusive value=“16”/><xs:maxInclusive value=“34”/><xs:maxLength value=“2”/>

</xs:restriction>

24-Oct-13 © Offutt 23

</xs:restriction></xs:simpleType>

</xs:element>

XML Facets (2)XML Facets (2)

<xs:simpleType name=“isbnType”><xs:union>

<xs:restricti n b se “xs:strin ”>

Composed of numeric digits

between 0 and 9

Restrictions on Strings

<xs:restriction base= xs:string”><xs:pattern value=“[0-9]{10}”/>

</xs:restriction><xs:simpleType>

<xs:restriction base=“xs:NMTOKEN”><xs:enumeration value=“TBD”/>

Exactly 10 digits

Union of 2 separately

24-Oct-13 © Offutt 24

<xs:enumeration value= TBD /><xs:enumeration value=“NA”/>

</xs:restriction></xs:simpleType>

</xs:union> </xs:simpleType>

separately defined rules

Additional values for isbnType

Name characters (letters, ., -, _, :)

Page 13: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

13

XML Facets (3)XML Facets (3)Restrictions on Enumerated Values

<xs:element name=“car”><xs:simpleType>xs simpleType

<xs:restriction base=“xs:string”><xs:enumeration value=“Hyundai”/><xs:enumeration value=“Toyota”/><xs:enumeration value=“Volvo”/>

</xs:restriction>

24-Oct-13 © Offutt 25

</xs:simpleType></xs:element>

These and only these 3 values

Final TypesFinal Types• We might want to control derivation performed on a data

type • Schema supports this through the final attribute in

– xs:complexType– xs:simpleType– xs:element

• Values :– restriction – no derivation by restriction– extension – no derivation by extensionextension no derivation by extension– #all – no derivation at all

24-Oct-13 © Offutt 26

Page 14: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

14

Final TypesFinal TypesThe following forbids any derivation of characterType

<<xs:complexType name=“characterType” final=“#all”><xs:sequence>xs sequence

<xs:element name=“name” type=“nameType”/><xs:element name=“since” type=“sinceType”/><xs:element name=“qualification” type=“descType”/>

</xs:sequence></xs:complexType>

24-Oct-13 © Offutt 27

Schema Data Types ExamplesSchema Data Types Examples

<xs:simpleType name=“pin”>i i b “ i ”<xs:restriction base=“xs:string”>

<xs:length value=“5”></xs:restriction>

</xs:simpleType>

E tl 5 di it

24-Oct-13 © Offutt 28

Exactly 5 digits long

Page 15: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

15

Complex ElementsComplex Elements• Complex elements can contain other elements and

attributes• The sequence is a “compositor” that defines an ordered

f b lsequence of sub-elements

<xs:element name=“book”><xs:complexType>

<xs:sequence>.../...

24-Oct-13 © Offutt 29

</xs:sequence>.../...

</xs:complexType></xs:element>

Complex Complex ElementsElements• Element-only : Other nested elements, but no text

• Text-only : Text, no nested elementsy ,

• Mixed content : Nested elements and text

• Empty : No content

24-Oct-13 © Offutt 30

Page 16: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

16

Schema Complex Schema Complex ElementElement

<xs:complexType name=“Address”><xs:sequence>

<xs:element name=“street1” type=“xs:string” />

Elements must appear in this order

<xs:element name=“street2” type=“xs:string”minOccurs=“0” maxOccurs=“unbounded”/>

<xs:element name=“city” type=“xs:string” /><xs:element name=“state” type=“xs:string” /><xs:element name=“zip” type=“xs:decimal” />

</xs:sequence>/ l T Could be further

24-Oct-13 © Offutt 31

</xs:complexType>

<xs:element name = “mailingAddress” type=“Address”><xs:element name = “billingAddress” type=“Address”>

Could be further constrained as a simple type

XML ExampleXML Example

This XML is valid by the previous schema definition

<? xml version = “1.0”?><Address>

<street1>4400 University Blvd</street1><street2>MS 4A5</street2><city>Fairfax</city>

24-Oct-13 © Offutt 32

city Fairfax /city<state>Virginia</state><zip>22030</zip>

</Address>

Page 17: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

17

ValidationValidation• Schemas can be used to validate XML documents

– By validating XML parsers– By standalone tools such as xsv

http://www.ltg.ed.ac.uk/~ht/xsv-status.html

• An XML document must be correct according to the XML syntax rules

24-Oct-13 © Offutt 33

• An XML document may be valid according to a particular schema or DTD

XML Example : EMailXML Example : EMail

<?xml version=“1.0”?> ?xm ers on . ? <note>

<to>George Burdell</to> <from>Buzz</from> <heading>Reminder</heading> b d H k d /b d

24-Oct-13 © Offutt 34

<body>How was your weekend?</body></note>

Page 18: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

18

DTD Example for EMailDTD Example for EMail<?xml version=“1.0”?> <note>

<to>George Burdell</to> <from>Buzz</from> <heading>Reminder</heading>

<!ELEMENT note (to, from, heading, body)>

<!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)>

heading Reminder /heading <body>How are your?</body>

</note>

24-Oct-13 © Offutt 35

<!ELEMENT from (#PCDATA)>

<!ELEMENT heading (#PCDATA)>

<!ELEMENT body (#PCDATA)>

XML Schema for EMailXML Schema for EMail<!ELEMENT note (to, from, heading, body)>

<!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)>

<!ELEMENT heading (#PCDATA)>

<!ELEMENT body (#PCDATA)>

<?xml version=“1.0”?> <note>

<to>George Burdell</to> <from>Buzz</from> <heading>Reminder</heading> <body>How are your?</body><body>How are your?</body>

</note><?xml version=“1.0”?><xs:schema xmlns:xs=http://www.w3.org/2001/XMLSchema>

<xs:element name=“note”><xs:complexType>

<xs:sequence><element name=“to” type=“xs:string”/>

l t “f ” t “ t i ”/

24-Oct-13 © Offutt 36

<element name=“from” type=“xs:string”/><element name=“heading” type=“xs:string”/><element name=“body” type=“xs:string”/>

</xs:sequence> </xs:complexType>

</xs:element></xs:schema>

Page 19: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

19

ReferencingReferencing XML XML Schema for Schema for EMailEMail<?xml version=“1.0”?><xs:schema xmlns:xs=http://www.w3.org/2001/XMLSchema>

<xs:element name=“note”><xs:complexType>

<xs:sequence><element name=“to” type=“xs:string”/><element name= to type= xs:string /><element name=“from” type=“xs:string”/><element name=“heading” type=“xs:string”/><element name=“body” type=“xs:string”/>

</xs:sequence> </xs:complexType>

</xs:element></xs:schema> <?xml version=“1.0”?>

<note xmlns:xsi= “http://www.w3.org/2001/XMLSchema-instance”N h L

24-Oct-13 © Offutt 37

xsi:noNamespaceSchemaLocation=“http://www.w3schools.com/schema/note.xsd”>

<to>George Burdell</to><from>Buzz</from><heading>Reminder</heading><body>How are you?</body>

</note>

XML

XML in a Large Web SiteXML in a Large Web Site

XML

XML

XMLProcess

XMLRender

24-Oct-13 © Offutt 38

XMLDocuments

XMLDocuments

Page 20: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

20

1. Document Type Definitions (DTD)2 Schemas

TopicsTopics

2. Schemas3. Parsing XML documents4. Summary

24-Oct-13 © Offutt 39

Parsing XML with JavaParsing XML with Java

S l XML • Several XML parsers exist

• They can be downloaded for free

• They read an XML file and put the contents in a tree

24-Oct-13 © Offutt 40

whose elements can be accessed through APIs

Page 21: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

21

Dynamic XML ProcessingDynamic XML Processing• Using XML to represent data increases the portability of

data– Reduces errors– Enhances reusability

• SAX and DOM parsers help to parse and organize XML documents– DOM

• Easier to use, ideal for interactive appsB ld N d

24-Oct-13 © Offutt 41

• Builds an in-memory tree : Needs more memory

– SAX• Event-driven, serial-access• Used for high performance apps

XML Parser XML Parser –– DOM ParserDOM Parser

24-Oct-13 © Offutt 42

Page 22: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

22

DOM ObjectsDOM Objects• DOMImplementation

Allows the code to find out about available support for various features

• NodeB bj f ll d i h DOMBase object for all nodes in the DOM

• NodeListOrdered collection of Node objects

• NameNodeMapCollection of Node objects that may be accessed by name

• DOMException

24-Oct-13 © Offutt 43

Principal error-handling object for the DOM

Node ObjectNode Object• Attributes

nodeName, nodeValue, nodeType, parentNode, childNodes, firstChild, lastChild, previousSibling, p gnextSibling, attributes, ownerDocument

• MethodsinsertBefore(), replaceChild(), removeChild(), appendChild(), hasChildNodes()

24-Oct-13 © Offutt 44

Page 23: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

23

XML XML ParserParser——SAXSAXSAX is a Java program that allows applications to integrate with any XML parser to receive notification of parsing events– Obtain a parser object– Obtain XML data– Ask the parser to parse the XML

24-Oct-13 © Offutt 45

XML XML ParserParser——SAX SAX OverviewOverview• SAX walks through an XML document• When important items are found, the

DocumentHandler object is notifiedj– Document start– Element start– Element ends– Character string found– …

• SAX provides standard names for callback functions

24-Oct-13 © Offutt 46

Page 24: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

24

XML XML ParserParser——SAX SAX APIsAPIs

DocumentHandler

SAXSAXParserParserFactoryFactory

ErrorHandler

DTD / SchemaHandler

SAXSAXPParserarser

24-Oct-13 © Offutt 47

EntityResolver

XML Parser XML Parser –– SAX APIs (2)SAX APIs (2)• SAXParserFactory

– A SAXParserFactory object creates an instance of the parser determined by the system property :

l SAXP Fjavax.xml.parsers.SAXParserFactory

• Parser– The org.xml.sax.Parser interface defines methods like

setDocumentHandler() to set up event handlers and parse(URL) to do the parsing

24-Oct-13 © Offutt 48

do the parsing– This interface is implemented by the Parser and ValidatingParser

classes in the com.sun.xml.parser package

Page 25: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

25

XML Parser XML Parser –– SAX APIs (3)SAX APIs (3)• DocumentHandler

– The methods:• startDocument ()• endDocument ()• endDocument ()• startElement ()• endElement ()

are called when an XML tag is recognized– The methods:

• Characters ()• processingInstruction ()

24-Oct-13 © Offutt 49

are called when the parser encounters the text in an XML element or an inline processing instruction

XML XML ParserParser——Partial Partial ExampleExample

import org.xml.sax.*import javax.xml.parsers.SAXParserFactory;import javax.xml.parsers.SAXParserFactory;p j p y;

public class CountSax extends HandlerBase…SAXParserFactory factory = SAXParserFactory.newInstance();SAXParser saxParser = factory.newSAXParser();saxParser.parse (“input.xml”, new CountSax());…

p bli id st tD m nt ()

24-Oct-13 © Offutt 50

public void startDocument () …

public void startElement (String name, AttributeList attrs) …

public void endDocument () …

Page 26: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

26

XML XML ParserParser——SAX SAX APIs (4)APIs (4)• ErrorHandler

– Methods error(), fatalError(), and warning() are called in response to parsing errors

– The default error handler throws exceptions for fatal errors and ignores other errors (including validation errors)

• DTDHandler– Handle events that occur while reading and parsing an XML

document’s DTD or schemaDo NOT define events associated with the validation itself

24-Oct-13 © Offutt 51

– Do NOT define events associated with the validation itself

XML XML ParserParser——SAX SAX APIs (5)APIs (5)EntityResolver

– The resolveEntity() method is called when the parser must identify data identified by a URI

– In most cases, a URI is simply a URL, which specifies the location of a document, but in some cases the document may be identified by a URN – a public identifier, or name, that is unique in the web space

– The public identifier may be specified in addition to the URL– The EntityResolver can then use the public identifier instead of

24-Oct-13 © Offutt 52

y pthe URL to find the document, for example to access a local copy of the document if one exists

Page 27: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

27

DOM vs. SAXDOM vs. SAX• DOM parsers load the entire document into memory and

store it in a tree structure

SAX XML d ll d • SAX parsers process XML documents sequentially and notify programs of any interesting events

• SAX parsers have difficulties with random access and modifying the XML documents

24-Oct-13 © Offutt 53

• Programming in DOM and SAX is different

1. Document Type Definitions (DTD)2 Schemas

TopicsTopics

2. Schemas3. Parsing XML documents4. Summary

24-Oct-13 © Offutt 54

Page 28: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

28

When to Use XMLWhen to Use XML• Traditional data processing

– When XML encodes the data for a program to process

• Document-driven programsp g– When XML documents are containers that build interfaces and

applications from existing components

• Archiving– When the customized version of a component is saved

(archived) so it can be used later

24-Oct-13 © Offutt 55

• Binding– When the DTD or schema that defines an XML data structure is

used to automatically generate a significant portion of the application that will eventually process the data

Tips for Using XMLTips for Using XML1. Save work

– When possible, use an existing DTD or Schema– It’s usually easier to ignore what you don’t need– Grammars are notoriously hard to get correctGrammars are notoriously hard to get correct

2. Normalizing data– The process of eliminating redundancies is known as normalizing– Defining entities is helps to normalize your data– The only way to achieve that kind of modularity in HTML is to

link to other documents—which fragments the document

24-Oct-13 © Offutt 56

g– XML entities do not require fragmentation … entity reference is

a macro whose contents are expanded in place– When the entity is defined in an external file, multiple

documents can reference it (introducing fragmentation …)

Page 29: XML Advanced Topics › ~offutt › › classes › 642 › slides › 642Lec09a-XML… · • DTDs were the original way to define XML grammar – Mostly replaced by XML Schemas

29

Tips Tips for Using for Using XML (2)XML (2)3. When to consider defining an entity reference

– Entities are written once and referenced from multiple places– Whenever the same thing is written more than once– If the information is likely to change, especially if it is used in

more than one place– If the entity will never be referenced anywhere outside of the

current file, define it in the local DTD or schema• Much as you would define a method or inner class in a program

– If the entity will be referenced from multiple documents, define

24-Oct-13 © Offutt 57

If the entity will be referenced from multiple documents, define it as an external entity• Much as you would define any generally usable class as an external class

4. Downside of using entity references– They add complexity and reduce readability

Installation on appsInstallation on apps--swe642swe642

Add /data/swe642fall2013/swe642/WEB-INF/lib/xercesImpl-2.8.1.jarinto your CLASSPATH:y

• If your shell is csh: setenv CLASSPATH /data/swe642fall2013/swe642/WEB-INF/lib/xercesImpl-

2.8.1.jar

• If your shell is ksh or bash :

24-Oct-13 © Offutt 58

If your shell is ksh or bash : export CLASSPATH=$CLASSPATH: /data/swe642fall2013/swe642/WEB-

INF/lib/xercesImpl-2.8.1.jar