Applied XML Programming for Microsoft .NET PART 3

Upload: raghu-nath

Post on 15-Jul-2015


XML Data Validation

The correctness of XML documents can be measured using two distinct and complementary metrics: the well-formedness of the document and its validity.

Well-formedness refers to the overall syntax of the document. Validation applies at a deeper level and involves the semantics of the document, which must comply with a user-defined layout.

The XmlTextReader class ensures only that the document being processed is syntactically correct. By design, the XmlTextReader class deliberately avoids making a more advanced analysis of the nodes in the document and checking their internal dependencies. A more specialized class is available in the Microsoft .NET Framework for accomplishing this more complex task: the XmlValidatingReader class. This chapter will focus on techniques and classes available in the .NET Framework to perform validation on XML data.
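
As a minimal sketch of what "syntactically correct" means in practice, the following C# code walks a document with XmlTextReader and reports whether parsing reached the end without a syntax error. The class name, the method name, and the sample strings are hypothetical and serve only as an illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class WellFormednessCheck
{
    // Returns true when the XML text can be read from start to finish.
    public static bool IsWellFormed(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // Read() throws XmlException at the first syntax error it meets.
            while (reader.Read()) { }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main()
    {
        // Properly nested tags: prints True.
        Console.WriteLine(IsWellFormed("<invoice><number>42</number></invoice>"));
        // The <number> element is never closed: prints False.
        Console.WriteLine(IsWellFormed("<invoice><number>42</invoice>"));
    }
}
```

Note that the check says nothing about whether an invoice is allowed to contain a number element; that question belongs to validation.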

Although validation is a key aspect in projects that involve critical document exchange across heterogeneous platforms, it does come at a price. Validating a document takes time, because the parser must analyze the constituent nodes; the number, type, and values of their attributes; and the node-to-node dependencies. When applications handle a fully validated document, they can be certain not only about the overall syntax but also about the contents. In a normal XML document, a node simply represents itself: a rather generic repository of hierarchical information. In a validated XML document, on the other hand, the same node represents, to the application's eye, a strongly typed and strongly defined piece of information. Basically, in a validated document, a node <invoice_number> ceases to be a node and becomes what it was intended to be: the number of the invoice.

Clearly, a nonvalidating reader (and, more generally, a nonvalidating XML parser) will run faster than a validating reader, and that's why XML parsers usually provide XML validation as an option that can be programmatically toggled on and off. In .NET applications, you use XmlTextReader if you simply need well-formedness; you resort to XmlValidatingReader if you need to validate the document against a DTD or a schema.

The XmlValidatingReader Class

The XmlValidatingReader class is an implementation of the XmlReader class that provides support for several types of XML validation: document type definitions (DTDs), XML-Data Reduced (XDR) schemas, and XML Schemas. The XML Schema language is also referred to as XML Schema Definition (XSD). DTD and XSD are official recommendations issued by the W3C, whereas XDR is simply the Microsoft implementation of an early working draft of XML Schemas that will be superseded by XSD as time goes by.

You can use the XmlValidatingReader class to validate entire XML documents as well as XML fragments. An XML fragment is a string of XML code that does not have a root node. For example, the following XML string is a valid XML fragment but not a valid XML document, because XML documents must have a root node:

<firstname>Dino</firstname>
<lastname>Esposito</lastname>

The XmlValidatingReader class works on top of an XML reader, typically an instance of the XmlTextReader class. The text reader is used to walk through the nodes of the document, and then the validating reader gets into the game, validating each piece of XML based on the requested validation type.
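
The layering described above can be sketched as follows: a text reader supplies the nodes, a validating reader wraps it, and a ValidationEventHandler collects whatever the validation reports. This is only an illustration under stated assumptions; the class name, the Validate helper, and the tiny sample document with its internal DTD are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// XmlValidatingReader is flagged as obsolete in later framework versions.
#pragma warning disable 618

static class LayeredValidation
{
    // Hypothetical sample: the internal DTD requires an id attribute
    // on <day>, but the element below does not carry one.
    public const string Sample =
        "<!DOCTYPE day [" +
        "<!ELEMENT day (#PCDATA)>" +
        "<!ATTLIST day id CDATA #REQUIRED>" +
        "]>" +
        "<day>XML Core Classes</day>";

    // Reads the whole document and returns the validation messages.
    public static List<string> Validate(string xml, ValidationType type)
    {
        List<string> messages = new List<string>();

        // The text reader walks the nodes; the validating reader sits on top.
        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader validator = new XmlValidatingReader(textReader);

        // The validation type must be chosen before the first Read().
        validator.ValidationType = type;
        validator.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Message);

        try
        {
            while (validator.Read()) { }
        }
        finally
        {
            validator.Close();
        }
        return messages;
    }

    static void Main()
    {
        foreach (string message in Validate(Sample, ValidationType.DTD))
            Console.WriteLine(message);
    }
}
```

With a handler attached, validation problems are reported through the event rather than thrown as exceptions, so the loop can visit every node of the document.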

Supported Validation Types

What are the key differences between the validation mechanisms (DTD, XDR, and XSD) supported by the XmlValidatingReader class?

DTD: A DTD is a text file whose syntax stems directly from the Standard Generalized Markup Language (SGML), the ancestor of XML as we know it today.

XDR: XDR is a schema language based on a proposal submitted by Microsoft to the W3C back in 1998. (For more information, see http://www.w3.org/TR/1998/NOTE-XML-data-0105.) XDRs are flexible and overcome some of the limitations of DTDs.

XSD: XSD defines the elements and attributes that form an XML document. Each element is strongly typed. Based on a W3C recommendation, XSD describes the structure of XML documents using another XML document.

DTD was considered the cross-platform standard until a couple of years ago. Then the W3C made a newer standard official: XSD, which is, technically speaking, far superior to DTD.

The XmlValidatingReader Programming Interface

The XmlValidatingReader class inherits from the base class XmlReader but implements internally only a small set of all the functionalities that an XML reader exposes. The class always works on top of an existing XML reader, and many methods and properties are simply mirrored.

The dependency of validating readers on an existing text reader is particularly evident if you look at the class constructors. An XML validating reader, in fact, can't be directly initialized from a file or a URL. The list of available constructors comprises the following overloads:

public XmlValidatingReader(XmlReader);
public XmlValidatingReader(Stream, XmlNodeType, XmlParserContext);
public XmlValidatingReader(string, XmlNodeType, XmlParserContext);
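
The last overload takes the XML text directly, together with the type of the fragment and a parser context. The sketch below uses it to read the rootless firstname/lastname fragment shown earlier. The class and method names are hypothetical, and the empty parser context is an assumption that suits a fragment with no namespaces.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// XmlValidatingReader is flagged as obsolete in later framework versions.
#pragma warning disable 618

static class FragmentReading
{
    // Returns the names of the elements found in a rootless fragment.
    public static List<string> ReadElementNames(string fragment)
    {
        List<string> names = new List<string>();

        // No name table, namespace manager, or xml:lang scope is supplied;
        // this fragment does not need any of them.
        XmlParserContext context =
            new XmlParserContext(null, null, null, XmlSpace.None);

        // XmlNodeType.Element tells the reader to expect element content,
        // so more than one top-level element is acceptable.
        XmlValidatingReader reader =
            new XmlValidatingReader(fragment, XmlNodeType.Element, context);
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        finally
        {
            reader.Close();
        }
        return names;
    }

    static void Main()
    {
        string fragment =
            "<firstname>Dino</firstname><lastname>Esposito</lastname>";
        Console.WriteLine(
            string.Join(", ", ReadElementNames(fragment).ToArray()));
    }
}
```

The fragment carries no DTD or schema, so the reader is left at its default validation type here and simply parses the text.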

Different Treatments for XSD and XDR

Although you can store both XSD and XDR schemas in the schema collection, there are some differences in the way in which the XmlSchemaCollection object handles them internally. For example, the Add method returns an XmlSchema object if you add an XSD schema but returns null if the added schema is an XDR. In general, any method or property that manipulates the input or output of an XmlSchema object supports XSD schemas only.

Another difference concerns the behavior of the Item property in the XmlSchemaCollection class. The Item property takes a string representing the schema's namespace URI and returns the corresponding XmlSchema object. This happens only for XSDs, however. If you call the Item property on a namespace URI that corresponds to an XDR schema, null is returned.

The reason behind the different treatments for XDR and XSD schemas is that XDR schemas have no object model available in the .NET Framework, so when you need to handle them through objects, the system gracefully ignores the requests.

XDR schemas are there only to preserve backward compatibility; you will not find them supported outside the Microsoft Win32 platform. It is important to pay attention to the methods and the properties you use to manage XDR in your code. The overall programming interface makes the effort to unify the methods and the properties to work on both XDRs and XSDs. But in some circumstances, those same methods and properties might lead to unpleasant surprises.

In a nutshell, you can cache an XDR schema for further and repeated use by the XmlValidatingReader class, but that's all that you can do. You can't check for the existence of XDR schemas, nor can a reference to an XDR schema be returned. But you can do this, and more, for XSDs.
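
The XSD side of this behavior can be sketched as follows: a schema is added to an XmlSchemaCollection, and the collection is then asked whether it contains the schema and asked to return it through the indexer (the Item property in C#). The namespace URI urn:example:address, the one-element schema, and the class name are hypothetical. The XDR case is not exercised here.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// XmlSchemaCollection is flagged as obsolete in later framework versions.
#pragma warning disable 618

static class SchemaCacheDemo
{
    // Hypothetical namespace URI and schema, used only for illustration.
    public const string Ns = "urn:example:address";
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
        "targetNamespace='urn:example:address'>" +
        "<xs:element name='zip' type='xs:string' />" +
        "</xs:schema>";

    // Builds a cache holding one XSD and hands back what Add returned.
    public static XmlSchemaCollection BuildCache(out XmlSchema added)
    {
        XmlSchemaCollection cache = new XmlSchemaCollection();
        added = cache.Add(Ns, new XmlTextReader(new StringReader(Xsd)));
        return cache;
    }

    static void Main()
    {
        XmlSchema added;
        XmlSchemaCollection cache = BuildCache(out added);

        // For an XSD, Add hands back the schema object.
        Console.WriteLine(added != null);
        // The cache can be queried by namespace URI.
        Console.WriteLine(cache.Contains(Ns));
        // The indexer returns the cached XmlSchema for an XSD.
        Console.WriteLine(cache[Ns] != null);
    }
}
```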

Validating XML Fragments

The XmlValidatingReader class has the ability to parse and validate entire documents as well as XML fragments.

Using DTDs

The DTD validation guarantees that the source document complies with the validity constraints defined in a separate file: the DTD. A DTD file uses a formal grammar to describe both the structure and the syntax of XML documents. XML authors use DTDs to narrow the set of tags and attributes allowed in their documents. Validating against a DTD ensures that processed documents conform to the specified structure. From a language perspective, a DTD defines a newer and stricter XML-based syntax and a new tagged language tailor-made for a related group of documents.

Developing a DTD Grammar

Let's look more closely at a DTD file. To build a DTD, you normally start writing the file according to its syntax. In this case, however, we'll start from an XML file named data_dtd.xml that will actually be validated through the DTD, as shown here:

<?xml version="1.0" ?>
<!DOCTYPE class SYSTEM "class.dtd">
<!-- Sample XML document (data_dtd.xml) using a DTD -->
<class title="Applied XML Programming for .NET"
       company="DinoEsposito's Own Company"
       author="Dino Esposito">
  <days total="5" expandable="true">
    <day id="1">XML Core Classes</day>
    <day id="2">Related Technologies</day>
    <day id="3">XML and ADO.NET</day>
    <day id="4" optional="true">XML and Applications</day>
    <day id="5" optional="true">XML Interoperability</day>
  </days>
</class>

General information about the class (title, author, training company) is written using attributes. Each module spans a full day, and its description is implemented using plain text.

Any XML document that must be validated against a given DTD file includes a DOCTYPE tag through which it simply links to the DTD of choice, as shown here:

<!DOCTYPE class SYSTEM "class.dtd">

The following listing demonstrates a DTD that is tailor-made for the preceding XML document:

<!ELEMENT class (days)>
<!ATTLIST class title   CDATA #REQUIRED
                author  CDATA #IMPLIED
                company CDATA #IMPLIED>
<!ENTITY % Boolean "true | false">
<!ELEMENT days (day*)>
<!ATTLIST days total      CDATA      #REQUIRED
               expandable (%Boolean;) #REQUIRED>
<!ELEMENT day (#PCDATA)>
<!ATTLIST day id       CDATA      #REQUIRED
              optional (%Boolean;) #IMPLIED>

Certainly XSDs provide you with more functions than DTDs can. For one thing, schemas are all written in XML and don't require you to learn a new language. If you look at our basic DTD example in this context, you might not be scared by its unusual format. As you move from textbook examples and enter the tough real world, the complexity of an inflexible language like DTD becomes more apparent.

XSDs provide you with a finer level of control over the cardinality of the tags and the attribute types. In addition, XSDs can be used to set up a system of schema inheritance in which more complex types are built atop existing ones.

Using XDR Schemas

As mentioned, XML-Data Reduced (XDR) schema validation is the result of a Microsoft implementation of an early draft of what today is XSDs. XDR was implemented for the first time in the version of MSXML that shipped with Microsoft Internet Explorer 5.0, back in the spring of 1999.

In the XDR schema specification, you'll find almost all of the ideas that characterize XSDs today. The main reason for XDR support in the .NET Framework is backward compatibility with existing MSXML-based applications. To enable these applications to upgrade properly to the .NET Framework, XDR support has been retained intact. You will not find XDR support anywhere else outside the Microsoft Windows platform, however.

If you have used Microsoft ActiveX Data Objects (ADO), and in particular the library's ability to persist the contents of a Recordset object to XML, you are probably a veteran of XDR. In fact, the XML schema used to persist ADO 2.x Recordset objects to XML is simply XDR.

What Is a Schema

A schema is an XML file (with typical extension .xsd) that describes the syntax and semantics of XML documents using a standard XML syntax. An XML schema specifies the content constraints and the vocabulary that compliant documents must accommodate. For example, compliant documents must fulfill any dependencies between nodes, assign attributes the correct type, and give child nodes the exact cardinality.

The XML Schema specification is articulated into two distinct parts. Part I contains the definition of a grammar for complex types, that is, composite XML elements. Part II describes a set of primitive types (the XML type system) plus a grammar for creating new primitive types, said to be simple types. New types are defined in terms of existing types.

An XML schema also supports rather advanced and object-oriented concepts such as type inheritance. In the .NET Framework, the Schema Object Model (SOM) provides a suite of classes held in the System.Xml.Schema namespace to read a schema from an XSD file. These classes also enable you to programmatically create a schema that can be either compiled in memory or written to a disk file.
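
As a rough illustration of the programmatic route, the sketch below builds a one-element schema with the System.Xml.Schema classes, compiles it in memory, and serializes it to text. The class name and the zip element are hypothetical. Compilation goes through XmlSchemaSet, which is an assumption tied to .NET Framework 2.0 and later; writing to a disk file would only require passing a file-based writer to Write.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class SchemaBuilder
{
    const string XsdNs = "http://www.w3.org/2001/XMLSchema";

    // Creates a schema equivalent to:
    //   <xs:element name="zip" type="xs:string" />
    public static XmlSchema BuildZipSchema()
    {
        XmlSchema schema = new XmlSchema();

        XmlSchemaElement zip = new XmlSchemaElement();
        zip.Name = "zip";
        zip.SchemaTypeName = new XmlQualifiedName("string", XsdNs);
        schema.Items.Add(zip);

        // Compile in memory; an invalid schema would raise an
        // XmlSchemaException here because no handler is attached.
        XmlSchemaSet set = new XmlSchemaSet();
        set.Add(schema);
        set.Compile();
        return schema;
    }

    // Serializes the schema to XSD text.
    public static string ToXsdText(XmlSchema schema)
    {
        StringWriter writer = new StringWriter();
        schema.Write(writer);
        return writer.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ToXsdText(BuildZipSchema()));
    }
}
```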

Simple and Complex Types

XML simple types consist of plain text and don't contain any other elements. Examples of simple types are string, date, and various flavors of numbers (long, double, and integer). XML complex types can include child elements and attributes. In practice, a complex type is always rendered as an XML subtree. A complex type can be associated only with an XML element node, whereas a simple type applies to both elements and attributes.

[Figure: structure of the XSD type system]

Defining an XSD Schema

You have three options when creating an XSD schema. You can write it manually by combining the various tags defined by the XML Schema specification. A more effective option is represented by Visual Studio .NET, which provides a visual editor for XSD files with full IntelliSense support. The third option is based on the XML Schema definition tool (xsd.exe) mentioned in the previous section, which can infer the underlying schema from any well-formed XML document.

Setting Up a Sample Schema

Let's start by creating a simple schema to describe an address. Like many real-world objects, an address too is rendered using a complex type, a kind of XML data structure. The following code shows the schema for an address. It's a fairly simple schema consisting of a sequence of five elements (street, number, city, state, and zip) plus an attribute named country.

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address" type="AddressType" />
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" />
      <xs:element name="number" type="xs:string" />
      <xs:element name="city" type="xs:string" />
      <xs:element name="state" type="xs:string" />
      <xs:element name="zip" type="xs:string" />
    </xs:sequence>
    <xs:attribute name="country" type="xs:string" />
  </xs:complexType>
</xs:schema>
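
To see the address schema at work, the following sketch preloads it into the validating reader's schema cache and validates an address document held in a string. It is only an illustration: the class name is hypothetical, the address values are made up, and the schema is kept in memory rather than in an .xsd file so that the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// XmlValidatingReader is flagged as obsolete in later framework versions.
#pragma warning disable 618

static class AddressValidation
{
    // The address schema from the text, kept in memory for the example.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='address' type='AddressType' />
  <xs:complexType name='AddressType'>
    <xs:sequence>
      <xs:element name='street' type='xs:string' />
      <xs:element name='number' type='xs:string' />
      <xs:element name='city' type='xs:string' />
      <xs:element name='state' type='xs:string' />
      <xs:element name='zip' type='xs:string' />
    </xs:sequence>
    <xs:attribute name='country' type='xs:string' />
  </xs:complexType>
</xs:schema>";

    // Validates an address document and returns the messages raised.
    public static List<string> Validate(string xml)
    {
        List<string> messages = new List<string>();
        XmlValidatingReader validator = new XmlValidatingReader(
            new XmlTextReader(new StringReader(xml)));
        validator.ValidationType = ValidationType.Schema;

        // A null namespace lets the schema's own target namespace
        // (none, in this case) decide how the schema is cached.
        validator.Schemas.Add(null, new XmlTextReader(new StringReader(Xsd)));
        validator.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Message);
        try
        {
            while (validator.Read()) { }
        }
        finally
        {
            validator.Close();
        }
        return messages;
    }

    static void Main()
    {
        // Made-up address; the <zip> element is deliberately left out.
        string incomplete =
            "<address country='Italy'><street>Via Roma</street>" +
            "<number>1</number><city>Rome</city><state>RM</state></address>";
        foreach (string message in Validate(incomplete))
            Console.WriteLine(message);
    }
}
```

Because the schema declares a sequence, a document that omits an element or lists the elements in a different order is reported through the validation event.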

Linking Documents and Schemas

You might want to know how an XML document can link to the schema. An XML schema can be associated with document files in two ways: as in-line code or through external references. The second option decouples the document instance and the schema. The first option, on the other hand, simplifies deployment and data transportation because all information resides in a single place.

XML validation is the parser's ability to verify that a given XML source document is conformant to a specified layout. The intrinsic importance of validation, and related technologies, can't be denied, but a few considerations must be kept in mind.

For one thing, XML documents and schema information must be distinct elements. This improves performance when the document is transferred over the wire and keeps the memory footprint as lean as possible. In addition, validating a document to make sure it has the requested layout is not always necessary if the correctness of the data two applications exchange can be ensured by design. If the documents sent and received are generated programmatically and there is no (reasonable) way to hack them, validation can be an unneeded burden. In this case, you can rate the schema information as similar to debug information in Win32 executables: useful to speed up the development cycle, but useless in a production environment.

The real big thing behind XML validation is XSD, a W3C specification to define the structure, contents, and semantics of XML documents. XSD is another key element that enriches the collection of official and de facto current standards for interoperable software. It joins the group formed by HTTP for network transportation, XML for data description, SOAP for method invocation, XSL for data transformation, and XPath for queries.

With XSD, we have a standard but extremely rigorous way to describe the layout of the document that leaves nothing to the user's imagination. XSD is the constituent grammar for the XML type system, and thanks to the broad acceptance gained by XML, it is a candidate to become a universal and cross-platform type system.

Further Reading

XML sprang to life in the late 1990s as a metalanguage scientifically designed to definitively push aside SGML. If you want to learn more about this ancestor of XML, still in use in some legacy e-commerce applications, have a look at the tutorial available at http://www.w3.org/TR/WD-html40-970708/intro/sgmltut.html.

In this chapter and in this book, you won't find detailed references to the syntax and structure of XML technologies. If you need to know all about DTD attributes and XSD components, you'll need to look elsewhere. One resource that I've found extremely valuable is Essential XML Quick Reference, written by Aaron Skonnard and Martin Gudgin (Addison Wesley, 2001). This book is an annotated review of all the markup code around XML, including XSD, XSL, XPath, and SOAP, which are, not coincidentally, the same XML standards fully supported by the .NET Framework. Another resource I would recommend is XML Pocket Consultant, written by William R. Stanek (Microsoft Press, 2002). For online resources, check out in particular http://www.xml.com.