XML – Extensible Markup Language

Upload: oberon

Post on 19-Mar-2016

Page 1: XML – Extensible Markup Language

XML – Extensible Markup Language

Page 2: XML – Extensible Markup Language

Objectives

• To understand the various ways in which XML can be used
    • History of XML
    • Syntax of XML
    • Difference between HTML, XML and XHTML
    • XML Document Type Definitions (DTDs)
    • XML Schemas
• To understand the types of XML parsers
    • Validating vs. non-validating parsers
• To understand the different XML parser interfaces
    • Tree-based interface standard: DOM
    • Event-based interface standard: SAX
• Evaluating parsers
    • Which parser to use?

Page 3: XML – Extensible Markup Language

History of XML

• The World Wide Web Consortium (W3C) is an international consortium in which member organizations, a full-time staff, and the public work together to develop Web standards.

• Tim Berners-Lee, who invented the World Wide Web in 1989, founded the W3C with others in 1994.

• Around 1970 IBM introduced GML (Generalized Markup Language), the forerunner of SGML.

• SGML (Standard Generalized Markup Language) became an ISO standard (ISO 8879) in 1986.

• SGML is a semantic and structural language for text documents.

• SGML is complicated.

• The XML Working Group was formed under the W3C in 1996.

• In 1998 the W3C published XML 1.0.

• Extensible Markup Language (XML) is a simplified subset of SGML.

Page 4: XML – Extensible Markup Language

What is XML?

• XML stands for eXtensible Markup Language
• XML is a universal method of representing data
    • Used in applications, on the web, and for data exchange
• XML is a markup language much like HTML, but used for different purposes
• XML is not a replacement for HTML

Page 5: XML – Extensible Markup Language

What is XML?

• XML was designed to describe data
• XML is a cross-platform, software- and hardware-independent tool for transmitting or exchanging information
• XML is an open-standards-based technology
    • Extensible
    • Both human- and machine-readable
• XML standards
    • XML 1.0 (1998)
    • XML 1.1 (February 2004)

Page 6: XML – Extensible Markup Language

What Exactly is XML used for?

• Storing data in a structured manner (tree structure)
• Storing configuration information, typically application data that is not kept in a database
    • Much server software keeps its configuration files in XML formats

Page 7: XML – Extensible Markup Language

Contd…

• Transmitting data between applications
    • Overcomes problems in client-server applications that are cross-platform in nature
    • Example: a Windows program talking to a mainframe
• XML is a universal, standardized language for representing data so that it can be processed independently and exchanged between programs and applications, and between clients and servers
• Disparate systems can exchange information in a common format

Page 8: XML – Extensible Markup Language

XML Syntax

• The syntax rules of XML are very simple and very strict.
• XML tags are not predefined. You must define your own tags:
    <college>GCET</college>
• All XML elements must have a closing tag (or use the empty-element form, such as <email/>):
    <para>This is a paragraph</para>

Page 9: XML – Extensible Markup Language

Contd…

• XML tags are case sensitive
    <Msg>This is incorrect</msg>    (incorrect)
    <msg>This is correct</msg>      (correct)
• All XML elements must be properly nested
    <name>Jill<lname>Jack</name></lname>    (incorrect)
    <name>Jill<lname>Jack</lname></name>    (correct)
• Attribute values must always be quoted
    <pen color=red>reynolds</pen>      (incorrect)
    <pen color="red">reynolds</pen>    (correct)

Page 10: XML – Extensible Markup Language

XML Syntax

• All XML documents must have exactly one root element

<parent>
    <child>
        <subchild>.....</subchild>
    </child>
</parent>

Page 11: XML – Extensible Markup Language

XML Comments

• Comments in XML use the same syntax as in HTML:
    <!-- This is a comment -->

<?xml version="1.0"?>
<!-- Customer details -->
<customer>
    <name>John</name>
    <email>[email protected]</email>
</customer>

Page 12: XML – Extensible Markup Language

XML Code (cust.xml)

<?xml version="1.0"?>
<customers>
    <customer>
        <name>John</name>
        <email>[email protected]</email>
    </customer>
    <customer>
        <name>Tom</name>
        <email/>
    </customer>
</customers>

Page 13: XML – Extensible Markup Language

Extensibility in XML

• A typical XML document is made up of tags enclosing the data; tag names describe the data
• Because the language is extensible, you can create tags that are specific to your needs

Page 14: XML – Extensible Markup Language

Contd…

• For example, your document may contain tags to structure information about employees
    • The tags may include <Name>, <Designation>, and <Address>
• Data stored in XML is self-descriptive
    • One can understand the data just by looking at the tag names

Page 15: XML – Extensible Markup Language

XML – Exchanging Info Between Apps

• Convert information stored in the database (or in any other format) to an XML format
• Once it is in XML format, other applications and programs can parse (read) the XML document, which carries the original data
• XML parsers are freely available and are included with many programming languages

Page 16: XML – Extensible Markup Language

Contd…

[Diagram: an application, a spreadsheet package, a CAD package, statistical processing and a database exchanging data through XML]

Page 17: XML – Extensible Markup Language

[Diagram: Content, Structure and Presentation, provided by the XML document, the DTD/XSD and XSL respectively]

• XSD: XML Schema Definition
• DTD: Document Type Definition
• XSL: Extensible Stylesheet Language

Page 18: XML – Extensible Markup Language

Document Type Declaration (DTD)

• A DTD (Document Type Definition) is used to enforce structure requirements for an XML document
• The document type declaration contains, or refers to, the Document Type Definition and tells the parser which DTD to use for validation

Example file: xmldtd.xml

Page 19: XML – Extensible Markup Language

Contd…

<?xml version="1.0"?>
<!DOCTYPE customers [
    <!ELEMENT customers (customer)>
    <!ELEMENT customer (name,email)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT email (#PCDATA)>
]>
<customers>
    <customer>
        <name>John Conlon</name>
        <email>[email protected]</email>
    </customer>
</customers>
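A validating parser can be pointed at a document like the one above to see the DTD being enforced. The following is a minimal sketch using the JDK's JAXP classes; the class name DtdValidation, the helper validityErrors and the address john@example.org are illustrative assumptions, not part of the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates a document body against the internal DTD from the slide.
final class DtdValidation {
    // The content model (customer) allows exactly one <customer> inside <customers>.
    static final String DOCTYPE =
        "<!DOCTYPE customers [\n"
        + "  <!ELEMENT customers (customer)>\n"
        + "  <!ELEMENT customer (name,email)>\n"
        + "  <!ELEMENT name (#PCDATA)>\n"
        + "  <!ELEMENT email (#PCDATA)>\n"
        + "]>\n";

    /** Returns the validity errors reported for DOCTYPE + body (empty list if valid). */
    static List<String> validityErrors(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors are recoverable: record them and keep parsing.
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (SAXException e) {
            errors.add("not well-formed: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String one = "<customer><name>John Conlon</name>"
                   + "<email>john@example.org</email></customer>";
        // One customer matches the DTD; two customers violate (customer).
        System.out.println(validityErrors("<customers>" + one + "</customers>"));
        System.out.println(validityErrors("<customers>" + one + one + "</customers>"));
    }
}
```

Changing the declaration to (customer+) would make the second document valid as well.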

Page 20: XML – Extensible Markup Language

XML Schema

• An XML-based alternative to DTDs
• Richer and more useful than DTDs
• Written in XML itself, so no separate syntax has to be learned
• Supports data type validation (DTDs do not)

Example file: add.xml

Page 21: XML – Extensible Markup Language

<?xml version="1.0"?>
<addressBook>
    <person>
        <cname>Harrison Ford</cname>
        <email>[email protected]</email>
    </person>
    <person>
        <cname>Julie</cname>
        <email>[email protected]</email>
    </person>
</addressBook>

Page 22: XML – Extensible Markup Language

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:complexType name="record">
        <xs:sequence>
            <xs:element name="cname" type="xs:string"/>
            <xs:element name="email" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
    <xs:element name="addressBook">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="person" type="record" minOccurs="0" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
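To see such a schema applied, the JDK's javax.xml.validation package can check a document against it. This is a sketch under assumptions: the class name AddressBookSchemaCheck, the helper isValid and the example.org address are made up for illustration, and the schema text is embedded as a string (with the xs prefix bound correctly) rather than read from a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: validates address-book documents against the schema above.
final class AddressBookSchemaCheck {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='record'><xs:sequence>"
        + "<xs:element name='cname' type='xs:string'/>"
        + "<xs:element name='email' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='addressBook'><xs:complexType><xs:sequence>"
        + "<xs:element name='person' type='record' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Compiles the schema; a failure here means the schema itself is broken. */
    static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** Returns true if the document conforms to the schema. */
    static boolean isValid(String xml) {
        try {
            // With no error handler set, the validator throws on the first validity error.
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<addressBook><person><cname>Julie</cname>"
            + "<email>julie@example.org</email></person></addressBook>")); // true
        System.out.println(isValid("<addressBook><person><cname>Julie</cname>"
            + "</person></addressBook>"));                                  // false: email missing
    }
}
```

An empty <addressBook/> is also accepted, because person is declared with minOccurs="0".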

Page 23: XML – Extensible Markup Language

XSD Syntax

Simple XML Elements with Pre-defined Data Types

Simple XML element: an XML element that has no child elements and no attributes. Simple XML elements can be defined in XSD with the following statement:

    <xsd:element name="element_name" type="xsd:type_name"/>

Page 24: XML – Extensible Markup Language

Contd…

where "element_name" is the name of the XML element, and "type_name" is one of the data type names pre-defined in XSD.

XSD pre-defined data types fall into several groups, including:
• Numeric data types
• Date and time data types
• String data types
• Binary data types
• Boolean data type

Page 25: XML – Extensible Markup Language

XSD Syntax

Simple XML Elements with Extended Data Types

Simple XML element: an XML element that has no child elements and no attributes. Simple XML elements can be defined by using the pre-defined XSD data types.

Page 26: XML – Extensible Markup Language

They can also be defined by using extended data types, which are defined by "simpleType" statements:

    <xsd:simpleType name="my_type_name">
        <xsd:restriction base="xsd:type_name">
            XSD facet statements
        </xsd:restriction>
    </xsd:simpleType>
    <xsd:element name="element_name" type="my_type_name"/>

where "element_name" is the name of the XML element, "xsd:type_name" is a pre-defined data type serving as the base data type, and "my_type_name" is the new data type derived from the base data type.

Page 27: XML – Extensible Markup Language

XSD Syntax

Complex XML Elements

Complex XML element: an XML element that has at least one child element or at least one attribute. Complex XML elements must be defined with complex data types, which are defined by "complexType" statements:

Page 28: XML – Extensible Markup Language

    <xsd:element name="element_name" type="my_data_type"/>
    <xsd:complexType name="my_data_type">
        <xsd:sequence>
            <xsd:element name="child_element_1" type="data_type_1"/>
            <xsd:element name="child_element_2" type="data_type_2"/>
            ...
        </xsd:sequence>
        <xsd:attribute name="attribute_a" type="data_type_a"/>
        <xsd:attribute name="attribute_b" type="data_type_b"/>
        ...
    </xsd:complexType>

where the "attribute" statement defines an attribute, and the "sequence" statement defines the group of child elements and the order in which they must appear in the XML structure.

Note that "attribute" statements must appear after the child element definition statements.

Page 29: XML – Extensible Markup Language

XSD Syntax

Empty XML Elements

Empty XML element: a special complex XML element that has one or more attributes but no child elements or text nodes. Empty XML elements must be defined with complex data types in the following format:

    <xsd:complexType name="my_data_type">
        <xsd:attribute name="attribute_a" type="data_type_a"/>
        <xsd:attribute name="attribute_b" type="data_type_b"/>
        ...
    </xsd:complexType>

Page 30: XML – Extensible Markup Language

XSD Syntax

Anonymous Data Types

If a data type is specific to a child element in a parent data type, and there is no need to share it with data types outside the parent data type, you can define it as an anonymous data type: a non-named data type defined inline. For example, the following code:

Page 31: XML – Extensible Markup Language

    <xsd:complexType name="my_data_type">
        <xsd:sequence>
            <xsd:element name="setting">
                <xsd:complexType>
                    <xsd:sequence>
                        <xsd:element name="property" type="xsd:string"/>
                        <xsd:element name="value" type="xsd:integer"/>
                    </xsd:sequence>
                </xsd:complexType>
            </xsd:element>
        </xsd:sequence>
    </xsd:complexType>

defines "my_data_type", which has a "setting" element whose data type is anonymous and defined inline.

Page 32: XML – Extensible Markup Language

Well-formed XML Documents

• A document is made of elements; there is exactly one element, called the root or document element
• All other elements, delimited by start- and end-tags, nest properly within each other
• Attribute values, if any, are enclosed in quotes
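Any XML parser enforces these rules, so a quick way to test well-formedness is to parse the text and see whether a fatal error is raised. The sketch below uses the JDK's DOM parser without any DTD or schema; the class name WellFormedCheck and the method isWellFormed are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks well-formedness only (no DTD or schema validation).
final class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness violation is always a fatal error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<name>Jill<lname>Jack</lname></name>")); // true
        System.out.println(isWellFormed("<name>Jill<lname>Jack</name></lname>")); // false: bad nesting
        System.out.println(isWellFormed("<pen color=red>reynolds</pen>"));        // false: unquoted value
        System.out.println(isWellFormed("<Msg>text</msg>"));                      // false: case mismatch
        System.out.println(isWellFormed("<a/><b/>"));                             // false: two roots
    }
}
```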

Page 33: XML – Extensible Markup Language

Valid XML Documents

• An XML document is valid if it has an associated DTD or Schema and the document complies with the constraints expressed in it
• If an XML document is valid, it is also well-formed

Page 34: XML – Extensible Markup Language

Document Type Definitions (DTDs)

• A DTD describes the syntax that specifies
    • which elements may appear in the XML document
    • what the contents and attributes of each element are
• Need for a DTD
    • A validating parser (a program) can check whether XML data adheres to the rules in the DTD
    • The parser can do appropriate error handling if there are any violations
    • A validity error is not necessarily a fatal error, but some applications may treat it as one

Page 35: XML – Extensible Markup Language

Document Type Declarations

• A valid XML document must include a reference to the DTD against which it is validated
• Types of DTD
    • Internal DTD: the DTD is embedded in the XML document
    • External DTD: the DTD is in a separate file

Page 36: XML – Extensible Markup Language

Internal DTD

• The DTD is embedded in the XML document
• The declarations appear between [ and ]
• E.g. AddressBook.xml

Page 37: XML – Extensible Markup Language

<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for AddressBook.xml -->
<!DOCTYPE AddressBook [
    <!ELEMENT AddressBook (Address+)>
    <!ELEMENT Address (Name, Street, City)>
    <!ELEMENT Name (#PCDATA)>
    <!ATTLIST Name salutation CDATA #REQUIRED>
    <!ELEMENT Street (#PCDATA)>
    <!ELEMENT City (#PCDATA)>
]>
<AddressBook>
    <Address>
        <Name salutation="Mr.">Ram</Name>
        <Street>M G Road</Street>
        <City>Bangalore</City>
    </Address>
</AddressBook>

Page 38: XML – Extensible Markup Language

External DTD

• The DTD is present in a separate file
• Example
    • The DTD for AddressBook.xml is contained in the file AddressBook.dtd
    • AddressBook.xml contains only XML data, with a reference to the DTD file

Page 39: XML – Extensible Markup Language

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AddressBook SYSTEM "file:///c:/XML/AddressBook.dtd">
<AddressBook>
    <Address>
        <Name salutation="Mr.">Ram</Name>
        <Street>M G Road</Street>
        <City>Bangalore</City>
    </Address>
</AddressBook>

Page 40: XML – Extensible Markup Language

Anatomy of DTD – Defining new XML tags (Elements)

<!ELEMENT element_name content_specification>

• element_name: specifies the name of the XML tag
• content_specification: specifies the contents of the element
    • #PCDATA: parsed character data (text in which markup and entity references are still recognized)
    • Nested (child) elements
    • EMPTY: the element has no content
    • ANY: any declared element or text may appear (generally avoided)
• CDATA (plain character data) is not an element content specification; in a DTD it is used as an attribute type in <!ATTLIST> declarations

Page 41: XML – Extensible Markup Language

Example:

<!ELEMENT Street (#PCDATA)>
    • Element Street contains parsed character data

<!ELEMENT Address (Name, Street, City)>
    • Element Address contains three nested elements, Name, Street and City, in that order

<!ELEMENT AddressBook (Address+)>
    • Element AddressBook contains one or more occurrences of element Address

Page 42: XML – Extensible Markup Language

Anatomy of DTD – Dealing with multiple children

The children of an element are declared with a syntax similar to regular expressions (as in Perl). In the following, a and b are child elements of the element being declared:

Page 43: XML – Extensible Markup Language

• a+ : one or more occurrences of a
• a* : zero or more occurrences of a
• a? : a or nothing
• a, b : a followed by b
• a | b : a or b, but not both
• (expression) : surrounding an expression with parentheses means that it is treated as a unit and may take the suffix operator ?, * or +

Page 44: XML – Extensible Markup Language

Some examples

<!ELEMENT ITEM (PRODUCT,NUMBER,(PRICE|CHARGEACCT|SAMPLE))>
<!ELEMENT ITEM (PRODUCT,NUMBER,(PRICE|CHARGEACCT*|SAMPLE)+)>
<!ELEMENT ITEM (#PCDATA|PRODUCTID)*>
<!ELEMENT BOOK (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION|PART)+)>

Page 45: XML – Extensible Markup Language

Anatomy of DTD – Attribute Declarations

• Specifies the allowable attributes of each element

<!ATTLIST Tag-name Attr-Name Attr-Type Restriction>

• Tag-name: the element name
• Attr-Name: the name of the attribute; the attribute is defined for element Tag-name

Page 46: XML – Extensible Markup Language

• Restriction:
    • Value: a default value, given as simple text enclosed in quotes
    • #IMPLIED: there is no default value for this attribute, and the attribute need not be used
    • #REQUIRED: there is no default value for this attribute, but a value must be assigned to it
    • #FIXED Value: Value is the attribute's value, and the attribute must always have this value

Page 47: XML – Extensible Markup Language

Anatomy of DTD – Attribute Declarations Example

<!ATTLIST Name salutation CDATA #REQUIRED>

The element Name has attribute salutation which is of type CDATA

The attribute salutation must be specified in the Name tag

Page 48: XML – Extensible Markup Language

Anatomy of DTD – Entity Declarations (1 of 2)

• Entities provide a way to escape special characters
• Some special characters, such as < and &, cannot appear literally in #PCDATA (> is usually escaped as well)
• A character escaped in this way is written as an "entity reference"

Page 49: XML – Extensible Markup Language

• The following kinds of references are used in XML documents
    • Built-in entities: &amp;, &lt;, &gt;, &apos;, &quot;
    • Character references: &#243; representing ó
• Example
    <State>Jammu &amp; Kashmir</State>

Page 50: XML – Extensible Markup Language

Anatomy of DTD – Entity Declarations (2 of 2)

• Data that is used frequently can be declared as a general entity

<!ENTITY entity_name entity_contents>

• entity_name: name of the new entity
• entity_contents: contents of the new entity

Page 51: XML – Extensible Markup Language

Example

<!ENTITY MyCountry "India">
    • Defines the entity called MyCountry
    • "India" is the contents of entity MyCountry

Usage in the XML document

<Country>&MyCountry;</Country>
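What the application sees after parsing is the expanded text, not the reference. The sketch below shows this with the JDK's DOM parser, assuming its default settings (entity references are expanded); the class name EntityDemo and the helper rootText are illustrative, not from the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a small document and returns the root element's text.
final class EntityDemo {
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Built-in entity: &amp; is delivered to the application as a plain ampersand.
        System.out.println(rootText("<State>Jammu &amp; Kashmir</State>")); // Jammu & Kashmir

        // General entity declared in the internal DTD subset, as on the slide.
        String withEntity =
            "<!DOCTYPE Country [ <!ENTITY MyCountry \"India\"> ]>"
            + "<Country>&MyCountry;</Country>";
        System.out.println(rootText(withEntity)); // India
    }
}
```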

Page 52: XML – Extensible Markup Language

XML Schema

What is XML Schema?

• An XML vocabulary for expressing your data's structure and business rules
• Validating parsers can use a Schema to check whether XML data adheres to the rules in the schema
• More robust and extensive than a DTD; it can even validate data types

Page 53: XML – Extensible Markup Language

E.g.: Consider the following XML document

<Result>
    <EmpNo>45609</EmpNo>
    <Name>Kiran</Name>
    <Subject>
        <Name>IWT</Name>
        <Marks>80</Marks>
        <Grade>A</Grade>
    </Subject>
</Result>

Page 54: XML – Extensible Markup Language

Is this data valid?

To be valid, it must meet the following business rules (constraints):

• The Result must comprise a Subject, Marks and Grade, in the order shown
• The Subject must be a valid subject from the list (DC, IWT, Cryptography)
• The Marks must be between 0 and 100, and the Grade must be A, B or C

Page 55: XML – Extensible Markup Language

How can XML schema help to accomplish this?

Answer:

• It creates an XML vocabulary: it defines the following set of elements
    • <Result>, <Subject>, <Marks>, <Grade>
• It specifies the contents of each element and the restrictions on each element
    • The <Result> element must contain <Subject>, <Marks>, <Grade> in that order
    • <Subject> must be one of the valid subjects (IWT, Cryptography, DC)
    • The Marks must be between 0 and 100
    • The Grade must be A, B or C

Page 56: XML – Extensible Markup Language

• An XML Schema specifies the namespace to which the created vocabulary belongs
• The namespace name need not be an actual URL, but it uses URL syntax and should be a unique string
    • Example: http://www.Results.com
• The namespace defines the following vocabulary

Page 57: XML – Extensible Markup Language

Example of referring to a Schema (Result.xml)

<?xml version="1.0" encoding="UTF-8"?>
<res:Result xmlns:res="http://www.Results.com"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.Results.com Result.xsd">
    <res:Name>Kiran</res:Name>
    <res:EmpNo>45609</res:EmpNo>
    <res:Subject>
        <res:Name>IWT</res:Name>
        <res:Marks>80.70</res:Marks>
        <res:Grade>A</res:Grade>
    </res:Subject>
    <res:Subject>
        <res:Name>PF</res:Name>
        <res:Marks>78.30</res:Marks>
        <res:Grade>B+</res:Grade>
    </res:Subject>
</res:Result>

Page 58: XML – Extensible Markup Language

Schema example : Result.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.Results.com"
            xmlns="http://www.Results.com"
            elementFormDefault="qualified">

    <!-- Root Element Declaration -->
    <xsd:element name="Result">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Name" type="xsd:string"/>
                <xsd:element name="EmpNo" type="xsd:int"/>
                <xsd:element name="Subject" type="SubjectType" maxOccurs="5"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

    <xsd:simpleType name="NameType">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="CHSSC|PF|RDBMS|IWT|AOA"/>
        </xsd:restriction>
    </xsd:simpleType>

Page 59: XML – Extensible Markup Language

Schema example : Result.xsd (continued)

    <xsd:complexType name="SubjectType">
        <xsd:sequence>
            <xsd:element name="Name" type="NameType"/>
            <!-- Reference to the element Marks -->
            <xsd:element ref="Marks"/>
            <xsd:element name="Grade">
                <xsd:simpleType>
                    <xsd:restriction base="xsd:string">
                        <!-- The + must be escaped; an unescaped B+ means "one or more B" and would reject the grade B+ -->
                        <xsd:pattern value="A|B\+|B|C|D"/>
                    </xsd:restriction>
                </xsd:simpleType>
            </xsd:element>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:element name="Marks">
        <xsd:simpleType>
            <xsd:restriction base="xsd:float">
                <xsd:minInclusive value="0.0"/>
                <xsd:maxInclusive value="100.0"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:element>
</xsd:schema>

Page 60: XML – Extensible Markup Language

DTD vs Schema

• An XML document and its DTD use different syntaxes (an inconsistency)
    • A Schema uses XML syntax
• Limited data type capability
    • DTDs support only a very limited capability for specifying data types; they do not support field-level validation or complex types
    • E.g. you cannot express "I want the <Marks> element to hold an integer in the range 0 to 100" in a DTD
• Schema describes a set of data types compatible with those found in databases
    • E.g. a database supports integer, string and similar data types; Schema supports them too, while a DTD does not

Page 61: XML – Extensible Markup Language

Element Declarations: Simple Element

Syntax: <xsd:element name="Element_name" type="Element_type" Occurrence/>

• Element_name: any valid XML name
• Element_type: a built-in simple type
• Occurrence: the number of occurrences of that element (optional)

Page 62: XML – Extensible Markup Language

Example:

<xsd:element name="Name" type="xsd:string"/>
    • Defines the element Name of type string

<xsd:element name="Marks" type="xsd:float" maxOccurs="5"/>
    • Defines the element Marks of simple type float
    • Marks may appear at most 5 times
    • By default it must appear at least once (minOccurs defaults to 1)

Page 63: XML – Extensible Markup Language

Element Declarations

Syntax:

<xsd:element name="Element_name">
    <xsd:complexType>
        <!-- Element Specification -->
    </xsd:complexType>
</xsd:element>

Page 64: XML – Extensible Markup Language

Example

<xsd:element name="Subject">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element name="Name" type="xsd:string"/>
            <xsd:element name="Marks" type="xsd:float"/>
            <xsd:element name="Grade" type="xsd:string"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:element>

• Defines a non-reusable complex element called 'Subject'
• The child elements must appear in the order given, because the <xsd:sequence> tag is used

Page 65: XML – Extensible Markup Language

Element Declarations: Reusable Simple Type

<xsd:simpleType name="Element_type_name">
    <xsd:restriction base="Base_Data_type">
        <!-- Restriction specification -->
    </xsd:restriction>
</xsd:simpleType>

• Element_type_name: name of the data type
• Base_Data_type: any of the built-in simple data types (integer, float, etc.)
• Restriction specification: specifies the restrictions on the element, if any

Page 66: XML – Extensible Markup Language

Example:

<xsd:simpleType name="MarksType">
    <xsd:restriction base="xsd:float">
        <xsd:minInclusive value="0.0"/>
        <xsd:maxInclusive value="100.0"/>
    </xsd:restriction>
</xsd:simpleType>

• Defines the reusable element type MarksType
• An element defined as MarksType may take a minimum value of 0.0 and a maximum value of 100.0

<xsd:element name="Marks" type="MarksType"/>
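The effect of the minInclusive and maxInclusive facets can be checked by validating a few <Marks> values against this type. The following sketch uses javax.xml.validation from the JDK; the class name MarksFacetCheck and the helper accepts are assumptions made for illustration, and the schema is given without a target namespace to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: tests values against the MarksType restriction (0.0 to 100.0).
final class MarksFacetCheck {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='MarksType'>"
        + "<xsd:restriction base='xsd:float'>"
        + "<xsd:minInclusive value='0.0'/>"
        + "<xsd:maxInclusive value='100.0'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name='Marks' type='MarksType'/>"
        + "</xsd:schema>";

    /** Returns true if <Marks>marksText</Marks> is valid; marksText must not contain markup. */
    static boolean accepts(String marksText) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
        try {
            String xml = "<Marks>" + marksText + "</Marks>";
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the value violates the type or one of its facets
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("80.5"));   // true: inside the range
        System.out.println(accepts("100.0"));  // true: the upper bound is inclusive
        System.out.println(accepts("100.5"));  // false: above maxInclusive
        System.out.println(accepts("eighty")); // false: not a float at all
    }
}
```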

Page 67: XML – Extensible Markup Language

Element Declarations: Reusable Complex Type

Syntax

<xsd:complexType name="Type_name">
    • Defines the reusable type Type_name

Example

<xsd:complexType name="SubjectType">
    <xsd:sequence>
        <xsd:element name="Name" type="xsd:string"/>
        <xsd:element name="Marks" type="xsd:int"/>
        <xsd:element name="Grade" type="xsd:string"/>
    </xsd:sequence>
</xsd:complexType>

Page 68: XML – Extensible Markup Language

• Defines the reusable complex element type SubjectType
• Comprises the following elements, in the sequence specified (<xsd:sequence> tag)
    • Name
    • Marks
    • Grade
• This type can be used to define elements in your XML:
    <xsd:element name="Subject" type="SubjectType"/>

Page 69: XML – Extensible Markup Language

Defining the Attributes

• Syntax: <xsd:attribute name="Attr_Name" type="Attr_Type"/>
• Example: <xsd:attribute name="Project" type="xsd:string"/>
• All attributes are declared as simple types
• Only complex elements can have attributes

Page 70: XML – Extensible Markup Language

Anatomy of XML Schema : Constraints specification

• Constraints control the occurrence of an individual element or a group of elements
• Types of constraints
    • <choice>: allows only one of the listed elements to appear
    • <sequence>: elements must appear in the same order as they are declared
    • <all>: elements can appear in any order, each at most once

Page 71: XML – Extensible Markup Language

<choice> constraint

E.g.:

<xsd:choice>
    <xsd:element name="first"/>
    <xsd:element name="last"/>
</xsd:choice>

• Allows either the first or the last name to be used in the instance XML document

Page 72: XML – Extensible Markup Language

<sequence> constraint

E.g.:

<xsd:sequence>
    <xsd:element name="Name" type="xsd:string"/>
    <xsd:element name="EmpNo" type="xsd:int"/>
    <xsd:element name="Subject" type="SubjectType" maxOccurs="5"/>
</xsd:sequence>

• All elements must appear in the defined order only

Page 73: XML – Extensible Markup Language

Anatomy of XML Schema : Constraints specification

<all> constraint

E.g.:

<xsd:all>
    <xsd:element name="invoice"/>
    <xsd:element name="purchaseOrder"/>
    <xsd:element name="mailingLabel"/>
</xsd:all>

• Elements may appear in any order
• Each element may appear at most once; an element is optional only if it is declared with minOccurs="0"

Page 74: XML – Extensible Markup Language

XML Parsers

Page 75: XML – Extensible Markup Language

XML Parser : The Big Picture

Usage of the XML Parser

[Diagram: the XML parser reads the XML document together with its XML DTD / Schema and makes the parsed data available to the client application through APIs]

Page 76: XML – Extensible Markup Language

Why use a parser?

• Typically you use a pre-built XML parser (e.g. JAXP, Apache Xerces)
• This enables you to build your application much more quickly

Page 77: XML – Extensible Markup Language

Need for Parser

• Defining the parser's responsibilities
    • Ensure that the document adheres to specific standards
        • Does the document match the DTD or Schema?
        • Is the document well-formed?
    • Make the document contents available to your application
        • The parser parses the XML document and makes its data available to your application
        • An application using a parser can access the data in the XML by walking the hierarchy or by using tag names

Page 78: XML – Extensible Markup Language

Types of XML Parsers

• Validating parser: a parser that verifies that the XML document adheres to the DTD or Schema
• Non-validating parser: a parser that does not verify the XML document against the DTD or Schema
• Most parsers provide an option to turn validation on or off
• All parsers check the well-formedness of the XML document at all times

Page 79: XML – Extensible Markup Language

XML Parser Interfaces

• Two types of interfaces are provided by XML parsers
    • SAX: an event-based interface
    • DOM: a tree-based interface
• JAXP: "Java API for XML Processing"
    • JAXP is part of the JDK
    • Provides parsers that can be used in any Java application
    • It supports both
        • Tree-based parser: DOM
        • Event-based parser: SAX

Page 80: XML – Extensible Markup Language

DOM Parser

• Tree-based parser
• Definition: the parser reads the XML document and creates an in-memory "tree" representation of it
• For example, given the sample XML document below, what kind of tree would be produced?

Page 81: XML – Extensible Markup Language

<Result>
    <Name>Kiran</Name>
    <EmpNo>45609</EmpNo>
    <Subject>
        <Name>CHSSC</Name>
        <Marks>80</Marks>
        <Grade>A</Grade>
    </Subject>
</Result>

Page 82: XML – Extensible Markup Language

• In-memory tree created by the tree-based parser
• The tree represents the hierarchy of the XML document

Page 83: XML – Extensible Markup Language

DOM Parser

[Diagram: DOM tree for the sample document, with element nodes Result, Name and EmpNo, and text nodes Kiran and 45609]

Page 84: XML – Extensible Markup Language

DOM Parser

• Tree-based APIs present an in-memory model of the entire document to the application once parsing has concluded
• There is no need to use extra data structures to maintain information during parsing
• An application can navigate through the tree to find the desired pieces of the document
• The Document Object Model (DOM) is the standard for tree-based parsing of XML documents

Page 85: XML – Extensible Markup Language

Document Object Model (DOM)

• The DOM is a set of interfaces defined by the W3C DOM Working Group
• DOM is the tree-based interface used by programmers to manipulate the XML document
• A DOM parser can be validating or non-validating
• A DOM parser represents the logical model of the XML document in memory
• All entity references are expanded before the DOM tree is constructed

Page 86: XML – Extensible Markup Language

DOM Structure representing XML

[Diagram 1, "XML Document Structure": a Document node whose tree contains Element, Attribute, Text and Comment nodes]

[Diagram 2, "Document Structure representing Result.xml": document root Result with element nodes Name, EmpNo and Subject; Subject has element nodes Name, Marks and Grade; the text nodes are Kiran, 45609, IWT, 80.0 and A]

Page 87: XML – Extensible Markup Language

Document Object Model (DOM) : Overview

• The root of the DOM hierarchy is called the Document node
    • Its element child is the document (root) element. Example: Result
• The child nodes of a node can be element nodes, comment nodes, etc.
    • Example: Name, Subject, EmpNo, etc. are all child nodes of Result
• All the nodes in the XML document are derived from the interface org.w3c.dom.Node

Page 88: XML – Extensible Markup Language

The Big Picture : Parsing the XML Document

• The document builder factory creates an instance of the parser with the required characteristics
    • Whether the parser should be a validating parser or not
    • Whether namespace support is required or not
    • Whether to ignore the white space between elements or not
• The factory hides the implementation details of the parser and gives a standard DOM interface for parsing XML (analogous to a JDBC driver)

Page 89: XML – Extensible Markup Language

[Diagram: DocumentBuilderFactory creates the DocumentBuilder (parser), which reads the XML data and produces the Document object (DOM), a tree of objects]

Page 90: XML – Extensible Markup Language

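DomApp.java : Parsing an XML Document using the DOM Parser

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// MyErrorHandler is the application's own class implementing
// org.xml.sax.ErrorHandler; it is not shown on the slides.
public class DomApp {
    public static void main(String[] argv) {
        MyErrorHandler hErr;
        Document hDocument;
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);      // validate while parsing
        factory.setNamespaceAware(true);  // enable namespace support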

Page 91: XML – Extensible Markup Language

        try {
            hErr = new MyErrorHandler();
            DocumentBuilder hBuilder = factory.newDocumentBuilder();
            // Set the error handler
            hBuilder.setErrorHandler(hErr);
            hDocument = hBuilder.parse(new File("Result.xml"));
        } catch (Exception e) {
            // Handle exception if generated during parsing
        }
    } // End of function main
}

Page 92: XML – Extensible Markup Language

Parsing the XML Document using DOM Parser

Step 1: Get an instance of the document-builder factory. This will be used to produce the DOM parser (called DocumentBuilder).

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

Step 2: Set the properties of the DOM parser to be produced:
    a. It should validate the XML document against the Schema / DTD
    b. It should be namespace-aware

    factory.setValidating(true);
    factory.setNamespaceAware(true);

Step 3: Obtain an instance of the MyErrorHandler class. This instance handles errors generated during parsing in an application-specific way.

    hErr = new MyErrorHandler();

Page 93: XML – Extensible Markup Language

Step 4: Obtain an instance of the DOM parser and register the error handler. The parser will be used to parse the XML document and create the in-memory tree representation of it.

    DocumentBuilder hBuilder = factory.newDocumentBuilder();
    hBuilder.setErrorHandler(hErr);

Step 5: Parse the XML document (Result.xml) using the parser created above.

    hDocument = hBuilder.parse(new File("Result.xml"));
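The five steps can be put together in one self-contained sketch. Several things here are assumptions rather than the slides' code: CollectingErrorHandler stands in for the MyErrorHandler class (which the slides do not show), the XML is parsed from a string instead of Result.xml, and validation is switched off because the sample has no DTD. In JAXP, setValidating(true) requests DTD validation; checking against a W3C XML Schema is normally configured separately, for example with DocumentBuilderFactory.setSchema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class DomParseSketch {
    /** Hypothetical stand-in for MyErrorHandler: records problems instead of printing them. */
    static final class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();
        public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
        public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    /** Parses the XML text and returns the name of its root element. */
    static String rootElementName(String xml) {
        try {
            // Step 1: get the factory.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Step 2: configure the parser to be produced (no DTD here, so no validation).
            factory.setValidating(false);
            factory.setNamespaceAware(true);
            // Step 3: create the error handler.
            CollectingErrorHandler hErr = new CollectingErrorHandler();
            // Step 4: obtain the parser and register the handler.
            DocumentBuilder hBuilder = factory.newDocumentBuilder();
            hBuilder.setErrorHandler(hErr);
            // Step 5: parse; the result is the in-memory DOM tree.
            Document hDocument = hBuilder.parse(new InputSource(new StringReader(xml)));
            return hDocument.getDocumentElement().getNodeName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";
        System.out.println(rootElementName(xml)); // Result
    }
}
```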

Page 94: XML – Extensible Markup Language

The Class Hierarchy rooted at org.w3c.dom.Node

• The Node interface is the root of the DOM Core class hierarchy
• This interface can be used to extract information from any DOM object without knowing the actual type of the underlying node (e.g. Element node, Text node, Attr node)
• That is, it is possible to access a document's complete structure and content using only the methods and properties exposed by the Node interface

Page 95: XML – Extensible Markup Language

DOM : Exploring the org.w3c.dom.Node Interface

[Figure: interface hierarchy rooted at Node; Element, Document, Attr, Text, Comment and Entity all derive from Node]

Page 96: XML – Extensible Markup Language

DOM : Important Methods of Node interface

Methods to retrieve information from the XML DOM tree:
• Node getFirstChild(): returns the first child of the current node
• Node getLastChild(): returns the last child of the current node
• String getNodeName(): the name of this node
• String getNodeValue(): the value of this node, depending on its type
• short getNodeType(): a code representing the type of the underlying object
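As an illustration of these retrieval methods, the sketch below walks a whole tree through the Node interface alone. The class name NodeWalkSketch and the sample input are assumptions made for the example. It also uses getNextSibling(), a Node method not listed above, to move from one child to the next.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeWalkSketch {
    // Recursively describe a subtree using only methods of the Node interface.
    static void describe(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(typeName(node.getNodeType())).append(' ').append(node.getNodeName());
        if (node.getNodeValue() != null) {
            out.append(" = ").append(node.getNodeValue());
        }
        out.append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            describe(child, depth + 1, out);
        }
    }

    // Translate the numeric node-type code into a readable label.
    static String typeName(short type) {
        switch (type) {
            case Node.ELEMENT_NODE: return "ELEMENT";
            case Node.TEXT_NODE: return "TEXT";
            case Node.COMMENT_NODE: return "COMMENT";
            default: return "OTHER(" + type + ")";
        }
    }

    static String walk(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            describe(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(walk("<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>"));
    }
}
```

For element nodes getNodeValue() returns null, which is why the value is printed only for the text nodes.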

Page 97: XML – Extensible Markup Language

Methods to alter the elements of the XML DOM tree:
• Node insertBefore(Node newChild, Node refChild)
• Node appendChild(Node newChild)
• Node removeChild(Node oldChild)
• Node replaceChild(Node newChild, Node oldChild)
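A minimal sketch of the four modification methods follows. It builds a small tree from scratch instead of parsing a file. The class name NodeEditSketch and the order of the operations are assumptions chosen for illustration, and the element names are borrowed from the Result example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeEditSketch {
    // Comma-separated names of the direct children of a node.
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(c.getNodeName());
        }
        return names.toString();
    }

    static String editDemo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element result = doc.createElement("Result");
        doc.appendChild(result);

        Element empNo = doc.createElement("EmpNo");
        result.appendChild(empNo);               // children: EmpNo

        Element name = doc.createElement("Name");
        result.insertBefore(name, empNo);        // children: Name, EmpNo

        Element grade = doc.createElement("Grade");
        result.appendChild(grade);               // children: Name, EmpNo, Grade

        Element subject = doc.createElement("Subject");
        result.replaceChild(subject, grade);     // children: Name, EmpNo, Subject

        result.removeChild(empNo);               // children: Name, Subject
        return childNames(result);
    }

    public static void main(String[] args) {
        System.out.println(editDemo()); // prints "Name,Subject"
    }
}
```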

Page 98: XML – Extensible Markup Language

Using Node Interface

[Figure: DOM tree of the Result document. Result has the children Name (Kiran), EmpNo (45609) and Subject; Subject has a child Name]

    Node hNode = hDocument.getDocumentElement();
    Node hFirstChild = hNode.getFirstChild();
    Node hLastChild = hNode.getLastChild();
    hFirstChild = hFirstChild.getFirstChild();
    String sName = hFirstChild.getNodeName();
    String sVal = hFirstChild.getNodeValue();
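The calls above can be tried out with the following sketch. It assumes a shortened Result document written without any whitespace between the tags, and the class and method names are made up for the example. The second helper shows why that assumption matters: with the default (non-validating) parser configuration, indentation between tags becomes a text node of its own, so getFirstChild() on the root returns that whitespace node rather than the Name element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstChildSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    // Follows the calls from the slide; assumes no whitespace between the tags.
    static String navigate(String xml) {
        Document hDocument = parse(xml);
        Node hNode = hDocument.getDocumentElement();   // the root element
        Node hFirstChild = hNode.getFirstChild();      // first child element
        Node hLastChild = hNode.getLastChild();        // last child element
        String firstName = hFirstChild.getNodeName();
        String lastName = hLastChild.getNodeName();
        hFirstChild = hFirstChild.getFirstChild();     // text node inside the first child
        String sName = hFirstChild.getNodeName();
        String sVal = hFirstChild.getNodeValue();
        return firstName + "|" + lastName + "|" + sName + "|" + sVal;
    }

    // Name of whatever node comes first under the root element.
    static String firstChildName(String xml) {
        return parse(xml).getDocumentElement().getFirstChild().getNodeName();
    }

    public static void main(String[] args) {
        // prints "Name|EmpNo|#text|Kiran"
        System.out.println(navigate("<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>"));
        // prints "#text": the line break and indentation form a text node
        System.out.println(firstChildName("<Result>\n  <Name>Kiran</Name>\n</Result>"));
    }
}
```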

Page 99: XML – Extensible Markup Language

XML Parser Interfaces : Event Based Interface

Definition: the parser reads the XML document and generates an event for each parsing step.

Some common parsing events:
• Element start-tag read
• Element content read
• Element end-tag read

Page 100: XML – Extensible Markup Language

Example

    <Result>
        <Name>Kiran</Name>
        <EmpNo>45609</EmpNo>
        <Subject>
            <Name>CHSSC</Name>
            <Marks>80</Marks>
            <Grade>A</Grade>
        </Subject>
    </Result>

Page 101: XML – Extensible Markup Language

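XML Parser Interfaces : Events Generated

• startElement : Result
• startElement : Name
• contents : Kiran
• endElement : Name
• startElement : EmpNo
• contents : 45609
• endElement : EmpNo
• endElement : Result

(The events for Subject and its children follow the same pattern and are not listed here.)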
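An event trace of this kind can be produced with a few lines of SAX code. The sketch below is an assumption-laden illustration rather than the slides' own program: the class name SaxTraceSketch is made up, the input comes from a string, and whitespace-only text is skipped so that the trace matches the listing style above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTraceSketch {
    // Parse the given XML and return one line per SAX event received.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement : " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text such as indentation between tags.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("contents : " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement : " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo></Result>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

A parser is allowed to deliver the text of one element in several characters() calls, so for long or entity-containing text the trace may show more than one "contents" line per element.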

Page 102: XML – Extensible Markup Language

XML Parser Interfaces : Event Based Interface

• For each of these events, your application implements "event handlers"
• Each time an event occurs, the corresponding event handler is called
• Your application intercepts these events and handles them in any way you want
• The application does not wait until the entire document has been parsed
• The application has to keep the information it needs from the XML document in local data structures until processing is complete
• Simple API for XML (SAX) is the standard for event-based parsing of XML documents

Page 103: XML – Extensible Markup Language

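SAXApp.java : Parsing XML Document using SAX Parser

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    public class SAXApp {
        public static void main(String[] argv) {
            // Get the instance of the parser event handling class
            DefaultHandler handler = new Handler();
            // Get the instance of SAXParserFactory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            try {
                // Set the properties of the parser to be obtained
                factory.setValidating(true);
                factory.setNamespaceAware(true);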

Page 104: XML – Extensible Markup Language

                // Get the new SAX parser
                SAXParser saxParser = factory.newSAXParser();
                // Parse the file
                // handler : processes events generated during parsing
                saxParser.parse(new File("Result.xml"), handler);
            }
            // Handle any exceptions generated during parsing
            catch (Throwable t) {
                t.printStackTrace();
            }
        } // End of function main
    }

Page 105: XML – Extensible Markup Language

SAXApp.java : Parsing XML Document using SAX Parser

    class Handler extends DefaultHandler {
        // Process any recoverable errors in the XML document
        public void error(SAXParseException e) throws SAXException {
            System.out.println("Error At Line:" + e.getLineNumber());
            System.out.print("Column: " + e.getColumnNumber());
            // Print the error message
            System.out.print(e.getMessage());
        }

        // Process any fatal errors in the XML document
        public void fatalError(SAXParseException e) throws SAXException {
            System.out.println("Fatal Error At Line:" + e.getLineNumber());
            System.out.print("Column: " + e.getColumnNumber());
            // Print the error message
            System.out.print(e.getMessage());
        }
    } // End of class Handler

Page 106: XML – Extensible Markup Language

Understanding The Simple API for XML (SAX)

Step 1: Get an instance of SAXParserFactory. This instance is used to obtain the SAX parser.

    SAXParserFactory factory = SAXParserFactory.newInstance();

Step 2: Get an instance of the event handler class. This class handles all the events generated by the parser.

    DefaultHandler handler = new Handler();

Step 3: Set the properties of the parser to be obtained:
  a. It should validate the XML document against the Schema / DTD (as with DOM, setValidating(true) by itself covers DTD validation)
  b. It should be namespace aware

    factory.setValidating(true);
    factory.setNamespaceAware(true);

Step 4: Obtain an instance of the SAX parser from the factory.

    SAXParser saxParser = factory.newSAXParser();

Step 5: Parse the Result.xml file using the SAX parser obtained above. Events generated during parsing are handled by the handler object.

    saxParser.parse(new File("Result.xml"), handler);

Page 107: XML – Extensible Markup Language

The Big Picture : Parsing the XML Document using SAX

[Figure: the SAX Parser Factory creates the SAX Parser, which reads the XML Document and sends parser events to DefaultHandler / MyHandler. In the org.xml.sax class hierarchy, DefaultHandler implements org.xml.sax.ContentHandler, org.xml.sax.ErrorHandler and org.xml.sax.EntityResolver.]

Page 108: XML – Extensible Markup Language

org.xml.sax Interfaces

org.xml.sax.helpers.DefaultHandler Class
• Provides a default implementation of all the event callbacks
• DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (mostly with empty, do-nothing methods)
• Only the methods that the application needs are overridden

Page 109: XML – Extensible Markup Language

org.xml.sax.ContentHandler Interface
• Receives notification of the logical content of a document
• Defines methods like startDocument(), endDocument(), startElement(), and endElement(); these are invoked when the start or end of the document or of an XML element is recognized
• Also defines the method characters(), which is invoked when the parser encounters text in an XML element
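The sketch below shows one way these callbacks are typically combined, under assumptions made only for illustration (the class name SaxCollectSketch, the helper textOf, and the sample input). It collects the text of every element with a given name. Because the parser may hand over the text of a single element in several characters() calls, the text is accumulated in a StringBuilder between startElement() and endElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCollectSketch {
    // Return the text content of every element whose name equals 'wanted'.
    static List<String> textOf(String xml, String wanted) {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a wanted element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(wanted)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length); // may be called more than once per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current != null && qName.equals(wanted)) {
                    values.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<Result><Name>Kiran</Name><EmpNo>45609</EmpNo>"
            + "<Subject><Name>CHSSC</Name><Marks>80</Marks></Subject></Result>";
        System.out.println(textOf(xml, "Name")); // prints "[Kiran, CHSSC]"
    }
}
```

The values list and the StringBuilder are the kind of local data structures a SAX application has to maintain for itself, since the parser keeps no tree.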

Page 110: XML – Extensible Markup Language

org.xml.sax Interfaces

org.xml.sax.ErrorHandler Interface
• Allows a SAX application to do customized error handling
• Once an error handler is registered, the parser reports all errors and warnings through this interface

Page 111: XML – Extensible Markup Language

Important Methods
• void error(SAXParseException e): receives notification of a recoverable error
• void fatalError(SAXParseException e): receives notification of a non-recoverable error
• void warning(SAXParseException e): receives notification of a warning
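To see which of the three methods is called in which situation, the following sketch records the callbacks instead of printing them. Everything about it is an assumption for illustration: the class name SaxErrorSketch, the tiny inline DTD, and the choice to rethrow fatal errors. A document that is well-formed but violates its DTD is expected to be reported through error(), while a document that is not well-formed ends in fatalError().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxErrorSketch {
    // Hypothetical minimal grammar: Result contains exactly one Name.
    static final String DTD =
        "<!DOCTYPE Result [<!ELEMENT Result (Name)><!ELEMENT Name (#PCDATA)>]>";

    // Parse with validation on and return the names of the callbacks that fired.
    static List<String> problems(String xml) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void warning(SAXParseException e) { seen.add("warning"); }
            @Override public void error(SAXParseException e) { seen.add("error"); }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                seen.add("fatalError");
                throw e; // stop: the document is not well-formed
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Thrown after a fatal error; the callback has already been recorded.
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Valid against the DTD
        System.out.println(problems(DTD + "<Result><Name>Kiran</Name></Result>"));
        // Well-formed but invalid: EmpNo is not allowed by the DTD
        System.out.println(problems(DTD + "<Result><EmpNo>45609</EmpNo></Result>"));
        // Not well-formed: Name is never closed
        System.out.println(problems(DTD + "<Result><Name>Kiran</Result>"));
    }
}
```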

Page 112: XML – Extensible Markup Language

Evaluating Parsers : SAX vs. DOM

SAX

Advantage
• Well suited to serial processing of very large documents, e.g. XML documents whose size is measured in GBs, because the whole document is never held in memory

Disadvantage
• The application must keep the parts of the XML document it still needs in its own data structures until processing is complete, so SAX is a poor fit for small XML documents, where that extra bookkeeping brings no benefit

Page 113: XML – Extensible Markup Language

DOM

Advantage
• Supports DOM tree traversal methods
• Allows modification of the XML document
• Good when random access to the document is required

Disadvantage
• For large XML documents (size in GBs), it requires more memory than parsing the same document with a SAX parser