1

Validating XML Data with an XML Schema

Date: May 2007
Version: DRAFT 0.2

2

Contents

1. XML Validation Concepts
   a. Concepts
   b. Errors
   c. Resources

2. Example: Validation with XMLSpy
   a. Downloading XMLSpy
   b. Creating a new XMLSpy Project
   c. Associate the homestead XML Schema with a folder
   d. Open the file in XMLSpy
   e. Add the active file to the folder
   f. Click the "Validate" button

3. Example: Manipulating Large XML Data Sets with Ant & Eclipse
   a. Tools for Records and Metadata vs. Tools for Data
   b. Apache Ant – DOS command line
   c. Eclipse – GUI interface
   d. V – The File Viewer – Viewing large files
   e. XML databases

3

Disclaimer

• The information and examples in this document are for demonstration purposes only.

• They are presented to help counties work with and validate XML datasets against Minnesota Revenue XML schemas.

• The Minnesota Department of Revenue does not endorse or support any products mentioned in this presentation. Supporting tools within each county is beyond the scope of the mission of the Property Tax Division.

• Your staff is responsible for ensuring that your tools match your business requirements.

4

XML Validation Concepts

[Diagram: an XML file and an XML Schema go into an XML validator, which reports validation errors.]

If you have 1) a well-formed XML file and 2) a well-defined XML Schema, you can 3) use any standard XML validation program to check that the file is XML and has all the required tags defined by the schema.

This is called validation.

5

XML Validation Concepts

• An XML file is a text file in which well-defined tags surround each data value.

• An XML Schema describes which tags are needed and where they must appear in a particular file.

Tag example: <Zip_Code>55101</Zip_Code>

<xs:element name="Zip_Code">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

This fragment from an XML Schema defines a tag for Zip_Code whose value must be exactly five digits.
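
The pattern in the Zip_Code fragment can be tried out on its own. The following is a minimal Python sketch, not a schema validator: it only checks the regular expression, and the helper name is_valid_zip and the test values are made up for illustration. An XML Schema pattern has to match the whole value, which is why the sketch uses fullmatch rather than search. XML Schema and Python regular expressions differ in some details, but they agree for a simple pattern like this one.

```python
import re

# Same pattern as the xs:pattern facet in the Zip_Code fragment.
ZIP_PATTERN = re.compile(r"[0-9]{5}")


def is_valid_zip(value):
    """Return True if value consists of exactly five digits."""
    # XML Schema patterns are implicitly anchored at both ends,
    # so the whole value must match, not just part of it.
    return ZIP_PATTERN.fullmatch(value) is not None


# Hypothetical test values, chosen for illustration only.
results = {v: is_valid_zip(v) for v in ["55101", "5510", "55101-1234", "ABCDE"]}
for value, ok in results.items():
    print(value, "valid" if ok else "invalid")
```

Only the first value passes. A nine-digit ZIP+4 value would be rejected by this schema fragment as written.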

6

XML Validation Errors

If you have:

1) A malformed XML file: you get an invalid XML, malformed XML or content error. Examples are missing tag brackets or other syntax errors.

2) A well-formed XML file with tag errors: you get a list of the XML tag errors found, that is, the tags that are inconsistent with the specific XML Schema being validated against.

[Diagram: an XML file and an XML Schema go into an XML validator, which reports validation errors.]

Tag example: <Zip_Code>55101</Zip_Code>
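
The first kind of error (malformed XML) can be detected without any schema at all. The sketch below is a rough illustration using Python's standard library parser. It only checks well-formedness and does not validate against an XML Schema. The function name well_formedness_error is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return None if text is well-formed XML, otherwise the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# The tag example from the slide, and the same tag with a missing bracket.
good = well_formedness_error("<Zip_Code>55101</Zip_Code>")
bad = well_formedness_error("<Zip_Code>55101</Zip_Code")

print("good:", good)
print("bad:", bad)
```

A file has to pass this kind of check before the second kind of error (tags inconsistent with the schema) can be reported.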

7

XML Validation Errors for XML Escape Characters

Five characters have special meaning in XML syntax. In a data value they should be "escaped" by using the ampersand representation shown here.

Name                         Character   Escape
less than                    <           &lt;
greater than                 >           &gt;
ampersand                    &           &amp;
single quote or apostrophe   '           &apos;
double quote                 "           &quot;
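
Most XML libraries do this escaping for you. As a small sketch, Python's standard xml.sax.saxutils.escape function replaces &, < and > by default and accepts a dictionary for the two quote characters. The owner name used here is invented for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; the two quote characters
# are passed in as extra entities.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = "O'Brien & \"Sons\" <LLC>"  # hypothetical data value
escaped = escape(raw, QUOTES)
restored = unescape(escaped, UNQUOTES)

print(escaped)
print(restored)
```

The ampersand is replaced first, so the ampersands introduced by the other escapes are not escaped a second time.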

8

10 Common XML Transmission Errors

1. Malformed XML
2. Missing namespace declarations
3. Invalid document structure
4. Missing required element
5. Missing data in element
6. Invalid document type code values
7. Invalid property type code value
8. Invalid character values
9. Incorrect number of repeating fields
10. Incorrect tax year

For more information about XML errors, please also refer to the document: XML and XML Errors

9

XML & Validation Resources

• W3C XML Standards Page – http://www.w3.org/XML/

• OASIS XML Cover Pages – http://xml.coverpages.org/xml.html#xmlValResources (lots of references)

• XML.com – http://www.xml.com (up-to-date XML information)

• XML.com Schema Tools – http://www.xml.com/pub/a/2000/12/13/schematools.html (older list of schema tools)

• XMLSpy – http://www.Altova.com (free 30 day eval XML tools and validation)

• XMLStar – http://xmlstar.sourceforge.net (free tools and validation)

10

Example:

Validating a Homestead File with XMLSpy

11

Validating with XMLSpy Steps

1. Download XMLSpy (30 day free eval) and the homestead zip file
2. Create a new XMLSpy Project
3. Associate the homestead XML Schema with a folder
4. Open the file in XMLSpy
5. Add the active file to the folder
6. Click the "Validate" button

12

Download XML Spy

• http://www.altova.com/products/xmlspy/xml_editor.html

Altova will e-mail you a 30 day license key

13

Download Homestead Files

http://taxes.state.mn.us/taxes/property_tax_administrators/other_supporting_content/homestead-ptr-2009a.zip

14

Start XML Spy

• Double click the XML Spy icon

• Create a New Project

15

New Project Window

• Note: if the window is not visible, use the Window/Project menu to show the project window

16

Set the Properties of the XML Folder

• Right click over the XML files folder in the project view

• NOTE: RIGHT CLICK, not left click

17

Folder Properties

• Click the "Validate with:" check box

18

Browse… to homestead schema

• Click OK and then double click on your XML data file to be validated

19

Add this file to your project

• RIGHT click and select "Add Active File"

20

• Click the green check

21

View Results in Validation View

• If your file is valid a green check will appear in the validation view

• Error messages will appear in this same window

22

File Size Limitations

• XMLSpy tends to have problems validating files over about 25MB on a system with 1GB of RAM

• Use Apache Ant and/or Eclipse if you want to validate larger files

23

Example:

Manipulating Large XML Data Sets with Ant & Eclipse

Tips for XML Files Above 25MB

24

Agenda

• Tools for Records and Metadata vs. Tools for Data
• Apache Ant
  – DOS command line
• Eclipse
  – GUI interface
• V – The File Viewer
  – Viewing large files
• XML databases

25

Records vs. Databases

• XML file viewers (like XML Spy) are ideal for viewing single records and metadata (XML Schemas)

• Visual editing tools tend to stop working when file sizes exceed about 25MB (given 2GB of RAM). For comparison, we don't use MS-Word to edit 100,000 records in a database.

• Other tools are more appropriate for debugging large data sets

26

In Memory vs. Streaming

• There are several different approaches to checking large files
  – Load the entire file into memory (DOM)
  – Stream the file through memory (SAX)
  – Page only relevant sections into memory (Chunking – used in V-The-File-Viewer)
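
The difference between the first two approaches can be sketched in a few lines of Python using only the standard library. The element name Record and the generated test data are assumptions made for this illustration. The DOM version builds the whole document in memory before counting, while the SAX version receives one event per tag and keeps nothing but a counter.

```python
import io
import xml.dom.minidom
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """SAX handler that counts start tags with a given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


# Hypothetical data: 1000 small records inside one root element.
data = (b"<Records>"
        + b"<Record><Zip_Code>55101</Zip_Code></Record>" * 1000
        + b"</Records>")

# DOM: the entire document tree is held in memory at once.
dom_count = len(xml.dom.minidom.parseString(data).getElementsByTagName("Record"))

# SAX: the parser streams through the input and calls the handler.
handler = RecordCounter("Record")
xml.sax.parse(io.BytesIO(data), handler)
sax_count = handler.count

print(dom_count, sax_count)
```

Both approaches give the same count. With a file of several hundred megabytes, the memory used by the DOM version grows with the file, while the SAX version stays roughly constant.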

27

Apache Ant

• Open source build manager
• The user gives Ant a high-level description of a task
• Ant executes the task using dependency analysis (for example, only validate after extract)
• Called from a shell (DOS or UNIX)
• Called from an Integrated Development Environment (IDE)

See Wikipedia "Apache Ant"

Download link: http://www.uniontransit.com/apache/ant/binaries/apache-ant-1.7.0-bin.zip

28

29

Download .zip file

30

Adding tools.jar

• Apache Ant needs one missing jar file called "tools.jar" that is free with Sun's Software Development Tools

• It is freely available from the Java download as part of the Java SDK 1.4+ (but not the JRE)

• A temporary copy of the file is on the Java Open Source User Group (JOSUG) web site (www.josug.org/tools.jar)

• File is about 6MB!

• This must be in your build "Classpath"

31

Apache Ant 1.7

• Many new features
• Simple <schemavalidate> task
• Faster execution

<schemavalidate
    noNamespaceFile="homestead-data_v0.28.xsd"
    file="my-homestead-data.xml">
</schemavalidate>

noNamespaceFile: path to your XML schema

file: path to your XML data

32

Ant From DOS Command Line

1. Download Apache Ant version 1.7.0
2. Copy the build.xml into a directory
3. Change the file locations in the properties of the build file to match your local files
4. Run ant.bat (using the full path name) in the folder where the build file is located

build.xml

<?xml version="1.0" encoding="UTF-8"?>
<project default="validate-homestead">

  <!-- Change these to match your local system -->
  <property name="SrcDir" value="C:/homestead/stress-test"/>
  <property name="SchemaDir" value="C:/homestead/schemas"/>

  <target name="validate-homestead">
    <schemavalidate
        noNamespaceFile="${SchemaDir}/homestead-data_v0.28.xsd"
        file="${SrcDir}/100MB-test.xml">
    </schemavalidate>
  </target>

</project>

33

Apache Ant Tasks

• schemavalidate
  – New Ant 1.7 optional task just for XML Schema
• xmlvalidate
  – very general Ant 1.6 task for validation of XML files
  – check for well-formed files
  – check for validation against an XML Schema
• xslt
  – transforms XML files
• replace
  – replace specific text in large files

34

schemavalidate options

http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html

http://ant.apache.org/manual/OptionalTasks/schemavalidate.html

35

Example <schemavalidate> task

• 100MB file validates in 10 seconds

36

Sample Ant 1.6 Validate Script

• This will validate only the 100MB-test.xml file

• Replace this with *.xml and all XML files in the source directory will be validated
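
The idea of switching from one file name to a *.xml pattern can be illustrated outside Ant as well. The following Python sketch is not the Ant script from this slide: it walks a directory with a glob pattern and reports, for each file, whether it is well-formed. It does not check the files against the XML Schema. The function name check_directory and the two sample files are assumptions for the example.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def check_directory(src_dir, pattern="*.xml"):
    """Map each matching file name to None (well-formed) or an error message."""
    results = {}
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        name = os.path.basename(path)
        try:
            # Parse incrementally and drop each element once it is complete,
            # so large files do not have to fit in memory.
            for _event, elem in ET.iterparse(path):
                elem.clear()
            results[name] = None
        except ET.ParseError as err:
            results[name] = str(err)
    return results


# Demonstration with two hypothetical files in a temporary directory.
with tempfile.TemporaryDirectory() as src_dir:
    with open(os.path.join(src_dir, "good.xml"), "w") as f:
        f.write("<Zip_Code>55101</Zip_Code>")
    with open(os.path.join(src_dir, "bad.xml"), "w") as f:
        f.write("<Zip_Code>55101</Zip_Code")  # missing closing bracket
    results = check_directory(src_dir)

for name, error in results.items():
    print(name, "OK" if error is None else error)
```

Passing a single file name as the pattern checks only that file, which mirrors the choice described in the two bullets.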

37

Eclipse

• Open source integrated development environment (IDE) originally sponsored by IBM

• "GUI" front end to Apache Ant

See http://www.eclipse.org/

38

Sample Ant Classpath

39

Complete Ant 1.7 Build File

<?xml version="1.0" encoding="UTF-8"?>
<project default="validate-homestead">

  <property name="DataDir" value="C:/homestead/data-files"/>
  <property name="SchemaDir" value="C:/homestead/schemas"/>

  <target name="validate-homestead">
    <schemavalidate
        noNamespaceFile="${SchemaDir}/homestead-data_v0.28.xsd"
        file="${DataDir}/my-data-file.xml">
    </schemavalidate>
  </target>

</project>

Properties can be set once in the file and referenced many times. This makes your build files easier to maintain.

40

GUI "Point and Click" UI

• Sample "point and click" GUI interface

• Alt+Shift+X, Q to run a task

41

XML Transform

• View a homestead record of a specific parcel ID

[Diagram: a big file (gigabytes) goes through an XML transform with matching rules. Records that match are written to a very small file. Records with no match are not output.]

42

Sample XML Transform

<?xml version="1.0" encoding="UTF-8"?>
<!-- exclude-result-prefixes belongs on xsl:stylesheet, not on xsl:output -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mn="http://data.state.mn.us"
    xmlns:c="http://niem.gov/niem/common/1.0"
    xmlns:u="http://niem.gov/niem/universal/1.0"
    xmlns:mnr="http://revenue.state.mn.us"
    xmlns:mnr-ptx="http://propertytax.state.mn.us"
    exclude-result-prefixes="mn mnr c u mnr-ptx">

  <xsl:output indent="yes"/>

  <!-- only display the homestead record for this parcel ID -->
  <xsl:template
      match="/HomesteadRecordsDocument/CountyHomesteadRecord/HomesteadParcels/HomesteadParcel/CountyPropertyTaxStatement[mn:ParcelID='1234567']">
    <!-- copy the CountyHomesteadRecord that matched this parcel ID to the output -->
    <xsl:copy-of select="../../.."/>
  </xsl:template>

  <!-- do not output anything else -->
  <xsl:template match="@*|node()">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>

</xsl:stylesheet>
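
The same kind of filter can be sketched without XSLT. The Python example below uses a different technique from the stylesheet: it streams through the file with the standard library's iterparse and returns the first CountyHomesteadRecord that contains the requested parcel ID, discarding the others as it goes. The element names and the mn namespace are taken from the stylesheet. The sample document, the function name find_record and the parcel IDs in the sample are invented for illustration, and real homestead files may qualify their element names differently.

```python
import io
import xml.etree.ElementTree as ET

MN = "http://data.state.mn.us"  # namespace bound to the mn: prefix


def find_record(source, parcel_id):
    """Return the first CountyHomesteadRecord containing parcel_id, or None."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "CountyHomesteadRecord":
            ids = [e.text for e in elem.iter("{%s}ParcelID" % MN)]
            if parcel_id in ids:
                return elem
            elem.clear()  # drop the contents of non-matching records
    return None


# Hypothetical two-record sample document.
SAMPLE = b"""<HomesteadRecordsDocument xmlns:mn="http://data.state.mn.us">
  <CountyHomesteadRecord>
    <HomesteadParcels><HomesteadParcel><CountyPropertyTaxStatement>
      <mn:ParcelID>0000001</mn:ParcelID>
    </CountyPropertyTaxStatement></HomesteadParcel></HomesteadParcels>
  </CountyHomesteadRecord>
  <CountyHomesteadRecord>
    <HomesteadParcels><HomesteadParcel><CountyPropertyTaxStatement>
      <mn:ParcelID>1234567</mn:ParcelID>
    </CountyPropertyTaxStatement></HomesteadParcel></HomesteadParcels>
  </CountyHomesteadRecord>
</HomesteadRecordsDocument>"""

record = find_record(io.BytesIO(SAMPLE), "1234567")
missing = find_record(io.BytesIO(SAMPLE), "9999999")

if record is not None:
    print(ET.tostring(record, encoding="unicode"))
```

Unlike the stylesheet, which copies every matching record, this sketch stops at the first match.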

43

V-The File Viewer

• $20 application (less in quantity)
• Easily allows viewing of files greater than 1GB (uses file "chunking" technology)
• Opens multi-gigabyte files in a few seconds
• Note: read-only tool

See http://www.fileviewer.com/

44

Use Goto Function

• Goto is (Ctrl-G)

or

45

XML Databases

• XML databases store XML in its native format

• You can associate a column in your database or a "collection" with the homestead XML Schema

• This allows you to have the database itself validate data before transmission to the state

46

Examples of XML Databases

• IBM DB2 version 9 "PureXML"
  – free and low-cost "express" versions for development and testing

• eXist (open source)
  – native XML database with XML Schema validation

• Over 50 other free and low-cost solutions with 30, 60 or 90 day evaluation periods

http://www.rpbourret.com/xml/XMLDatabaseProds.htm

47

DB2

• IBM DB2 version 9 supports fast searches on complex XML data sets
• Load records into XML datatype
• Records are quickly validated using an XML Schema
• Searching is very fast

48

eXist

• Open source
• Built-in web administration
• Easy to set up and configure
• Allows data to be validated on insert
• Fast searches
• Every XQuery IS a REST web service

49

Microsoft SQL Server 2005

• Supports native XML datatype
• Supports fast indexing
• Add SOAP services to XML documents
• Support for XQuery and XQuery updates

50

Ant Book

• Covers Ant 1.7

51

Questions?