XML Data Validation
An Open QA Framework
February 28, 2005
The Exchange Network Node Mentoring Workshop
Topics
• XML Schema Validation
• Limitations of Schema Validation
• Schematron and Extensible Stylesheet Language Transformations (XSLT)
• Data Validation Process
• Implementation and Tools
• Conclusion
XML Schema Validation
• Checks that an instance is a well-formed XML document (the parser does this before schema validation)
• Validates data types
• Validates data structures (parent-child and sibling relationships)
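As a minimal sketch of the first bullet only, the well-formedness check can be illustrated with Python's standard library. The function name `is_well_formed` and the sample documents are illustrative assumptions; type and structure validation against a schema needs a schema-aware validator, which the standard library does not provide and which is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def is_well_formed(xml_text: str) -> Tuple[bool, str]:
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, ""
    except ET.ParseError as err:
        return False, str(err)


good = "<house><wall/><wall/></house>"
bad = "<house><wall></house>"  # <wall> is never closed

print(is_well_formed(good))    # well-formed document
print(is_well_formed(bad)[0])  # malformed document
```

A document that fails this check never reaches schema validation, so the parser message is the only error worth reporting at that point.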
Limitations of Schema Validation
• Schema validation cannot:
– Constrain attributes: if attribute X has a value, attribute Y is required
– Validate logical relations: if the parent of element A is element B, A must have attribute Y; otherwise it must have attribute Z
– Validate dependencies: if element X has the value M, element Y must exist
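The conditional checks listed above are easy to state in ordinary code, which is one way to see what they ask for. The sketch below is an illustration in Python, not part of the framework described in this deck; the attribute names `x` and `y`, the element names `X` and `Y`, and the value `M` are placeholders taken from the bullets.

```python
import xml.etree.ElementTree as ET


def check_cooccurrence(elem):
    """If attribute 'x' has a value, attribute 'y' is required."""
    errors = []
    if elem.get("x") and elem.get("y") is None:
        errors.append(f"<{elem.tag}>: attribute 'y' is required when 'x' is set")
    return errors


def check_dependency(parent):
    """If child element X has the value 'M', a child element Y must exist."""
    errors = []
    x = parent.find("X")
    if x is not None and (x.text or "").strip() == "M" and parent.find("Y") is None:
        errors.append(f"<{parent.tag}>: <Y> must exist when <X> is 'M'")
    return errors


# Placeholder document that violates both constraints.
doc = ET.fromstring('<record x="1"><X>M</X></record>')
for message in check_cooccurrence(doc) + check_dependency(doc):
    print(message)
```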
Limitations of Schema Validation
• Formatted strings: a date must have the format mm-dd-yyyy
• Length constraints: a value's length must be between 9 and 10 characters
• Multiple ranges: a value must fall in the range 45-50 or 100-200
• Custom simple types: e.g., FacilityID
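The first three constraints on this slide can be written as small predicates. This is a sketch in Python under stated assumptions: the function names are invented for illustration, range and length bounds are treated as inclusive, and the date check also rejects impossible calendar dates, which goes slightly beyond a pure format check.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{2}-\d{2}-\d{4}")


def is_mm_dd_yyyy(value):
    """True if value looks like mm-dd-yyyy and is a real calendar date."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%m-%d-%Y")
        return True
    except ValueError:
        return False


def has_length_between(value, low=9, high=10):
    """True if the string length is within [low, high] (inclusive, assumed)."""
    return low <= len(value) <= high


def in_allowed_ranges(number, ranges=((45, 50), (100, 200))):
    """True if number falls in any of the ranges (inclusive, assumed)."""
    return any(low <= number <= high for low, high in ranges)


print(is_mm_dd_yyyy("02-28-2005"), is_mm_dd_yyyy("2005-02-28"))
print(has_length_between("123456789"), in_allowed_ranges(75))
```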
NEI Data Example
This XML segment is valid according to the NEI schema, yet almost all of the values in the record are fake and invalid.
• Schema validation alone cannot assure data quality
<TransmittalSubmissionGroup schemaVersion="3.0">
  <TransmittalRecordTypeCode>OO</TransmittalRecordTypeCode>
  <CountyStateFIPSCode>String</CountyStateFIPSCode>
  <OrganizationFormalName>String</OrganizationFormalName>
  <TransactionTypeCode>St</TransactionTypeCode>
  <InventoryYear>1000</InventoryYear>
  <InventoryTypeCode>String</InventoryTypeCode>
  <TransactionCreationDate>10000000</TransactionCreationDate>
  <SubmissionNumber>0</SubmissionNumber>
  <ReliabilityIndicator>0</ReliabilityIndicator>
  <TransactionComment>String</TransactionComment>
  <IndividualFullName>String</IndividualFullName>
  <TelephoneNumber>String</TelephoneNumber>
  <TelephoneNumberTypeName>String</TelephoneNumberTypeName>
  <ElectronicAddressText>String</ElectronicAddressText>
  <ElectronicAddressTypeName>String</ElectronicAddressTypeName>
  <SourceTypeCode>String</SourceTypeCode>
  <AffiliationTypeText>String</AffiliationTypeText>
  <FormatVersionNumber>0</FormatVersionNumber>
  <TribalCode>Str</TribalCode>
</TransmittalSubmissionGroup>
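To make the point concrete, the sketch below runs a few plausibility checks over three of the elements from the record above. The checks are assumptions chosen for illustration, not actual NEI business rules: the 1900-2100 year window, the yyyymmdd layout of `TransactionCreationDate`, and treating the literal value `String` as a placeholder are all guesses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Three elements copied from the NEI example; the rest are omitted for brevity.
SEGMENT = """<TransmittalSubmissionGroup schemaVersion="3.0">
  <OrganizationFormalName>String</OrganizationFormalName>
  <InventoryYear>1000</InventoryYear>
  <TransactionCreationDate>10000000</TransactionCreationDate>
</TransmittalSubmissionGroup>"""


def plausibility_errors(group):
    """Return messages for values that are schema-valid but implausible."""
    errors = []
    year = int(group.findtext("InventoryYear", default="0"))
    if not 1900 <= year <= 2100:  # assumed plausible window
        errors.append(f"InventoryYear {year} is outside 1900-2100")
    date_text = group.findtext("TransactionCreationDate", default="")
    try:
        datetime.strptime(date_text, "%Y%m%d")  # assumed yyyymmdd layout
    except ValueError:
        errors.append(f"TransactionCreationDate {date_text!r} is not a real date")
    if group.findtext("OrganizationFormalName") == "String":
        errors.append("OrganizationFormalName looks like a placeholder value")
    return errors


for message in plausibility_errors(ET.fromstring(SEGMENT)):
    print(message)
```

All three checks fail for this record even though it passes schema validation, which is the gap that rule-based validation is meant to close.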
Schematron
• An XML schema language
• Combines powerful validation capability with simple syntax
• Based on XSLT and XPath
• Open Source Implementation (OSI)
• Currently undergoing International Organization for Standardization (ISO) standardization (ISO/IEC 19757, Document Schema Definition Languages, DSDL)
Schematron Rules
• A Schematron rule has three major parts:
– The context: The element to which a rule applies
– An assertion: A statement about an element, usually an XPath expression
– A result: A statement to be reported if an assertion fails or succeeds
Schematron Rule Example
<sch:pattern name="Final Checks" id="completed">
  <sch:rule context="house">
    <sch:assert test="count(wall) = 4">A house should have 4 walls.</sch:assert>
  </sch:rule>
</sch:pattern>
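The three parts of a rule (context, assertion, result) can be mirrored in a few lines of Python. This is a stand-in for illustration, not how Schematron itself runs: a real Schematron processor evaluates the XPath test through XSLT, whereas here the assertion is a plain Python function. The `Rule` class, the `validate` function, and the `<street>` wrapper document are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    context: str                        # element name the rule applies to
    test: Callable[[ET.Element], bool]  # the assertion, true when the element is valid
    message: str                        # result reported when the assertion fails


def validate(root: ET.Element, rules: List[Rule]) -> List[str]:
    """Apply every rule to every matching context element; collect failures."""
    messages = []
    for rule in rules:
        for elem in root.iter(rule.context):
            if not rule.test(elem):
                messages.append(rule.message)
    return messages


# Python counterpart of: <sch:assert test="count(wall) = 4">
four_walls = Rule(
    context="house",
    test=lambda house: len(house.findall("wall")) == 4,
    message="A house should have 4 walls.",
)

doc = ET.fromstring(
    "<street>"
    "<house><wall/><wall/><wall/></house>"
    "<house><wall/><wall/><wall/><wall/></house>"
    "</street>"
)
print(validate(doc, [four_walls]))  # prints ['A house should have 4 walls.']
```

Only the first house fails the assertion, so the message is reported once.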
Flow Data Validation Process
[Diagram: XML Doc → XML Parser (well-formedness check) → Schema Validator (schema validation against Schemas) → XSLT Processor (rule validation against Schematron Rules) → Error Report]
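The three stages in the diagram can be sketched as a small pipeline. This is an illustration under assumptions, not the workshop's implementation: `run_pipeline`, `schema_stub`, and `four_walls` are hypothetical names, the schema stage is a stub because Python's standard library has no schema validator, and stopping at the first failing stage is a design guess the diagram does not confirm.

```python
import xml.etree.ElementTree as ET


def run_pipeline(xml_text, schema_check, rule_checks):
    """Return an error report as a list of strings; an empty list means success."""
    # Stage 1: well-formedness check, done by the XML parser.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    # Stage 2: schema validation (stubbed); rules are skipped if it fails.
    schema_errors = schema_check(root)
    if schema_errors:
        return [f"schema: {error}" for error in schema_errors]

    # Stage 3: rule validation; every rule runs and all messages are kept.
    report = []
    for check in rule_checks:
        report.extend(f"rule: {message}" for message in check(root))
    return report


def schema_stub(root):
    """Stand-in for a schema validator: only checks the root element name."""
    return [] if root.tag == "house" else ["root element must be <house>"]


def four_walls(root):
    """Stand-in for a Schematron rule."""
    return [] if len(root.findall("wall")) == 4 else ["A house should have 4 walls."]


print(run_pipeline("<house><wall/></house>", schema_stub, [four_walls]))
```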
Pros and Cons
• Pros:
– Simple rule-based XML validation framework
– Promotes natural-language descriptions of errors
– Based on open standards (XSLT and XPath)
– Open source Schematron implementation
• Cons:
– No regular expression support
– No custom validation against existing registries or dictionaries
Schematron with Extensions
[Diagram: XML Doc → XSLT Processor with Schematron Processor → Error Report. Inputs: Meta Schemas, Schematron Rules, XSLT. An XPath Extension adds Regular Expression support and Registry Info lookups (FRS, SRS).]
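The two extensions in the diagram, regular expressions and registry lookups, can be pictured as extra functions that a rule is allowed to call. The Python sketch below only illustrates that idea; it is not the XPath extension itself. The function names, the 12-digit ID format, and the in-memory set standing in for a registry such as FRS are all assumptions, and a real lookup would query the registry.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for registry contents.
KNOWN_FACILITY_IDS = {"110000000001", "110000000002"}


def matches(value, pattern):
    """Regular-expression extension: true if the whole value matches."""
    return re.fullmatch(pattern, value) is not None


def in_registry(value, registry=KNOWN_FACILITY_IDS):
    """Registry extension: true if the value is a known registry entry."""
    return value in registry


def check_facility(elem):
    """Rule using both extensions on a <FacilityID> child element."""
    facility_id = (elem.findtext("FacilityID") or "").strip()
    if not matches(facility_id, r"\d{12}"):  # assumed ID format
        return [f"FacilityID {facility_id!r} is not 12 digits"]
    if not in_registry(facility_id):
        return [f"FacilityID {facility_id!r} is not in the registry"]
    return []


record = ET.fromstring("<site><FacilityID>110000000099</FacilityID></site>")
print(check_facility(record))
```

The format check runs first so that a registry query is only made for IDs that could exist.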
Current Implementation
• A set of Web methods
• Provides both schema validation and schematron validation
• Has synchronous and asynchronous modes
• Supports lookups against any database table
• Can process compressed or uncompressed XML documents
• Accessible to any node, application, or user
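Accepting both compressed and uncompressed documents can be sketched as a check on the first bytes of the payload. The slides do not say which compression format the service uses, so gzip is an assumption here, and `parse_document` is a hypothetical name.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parse_document(payload: bytes) -> ET.Element:
    """Parse an XML payload, decompressing it first if it is gzip data."""
    if payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)
    return ET.fromstring(payload)


raw = b"<house><wall/></house>"
print(parse_document(raw).tag)
print(parse_document(gzip.compress(raw)).tag)
```

Detecting the format from the payload means callers do not need a separate flag or method for compressed submissions.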
Conclusion
• Streamlined data validation is crucial to successful data exchange
• Data validation should happen as early as possible
• Technologies and tools are available for boosting data quality
• Schematron is a recommended direction