Formal Machines for Streaming XML Querying

Streaming XML

Kevin Tankersley

Machines and Algorithms for Real-Time XML Processing

Overview

• XML Filtering Networks

– Overview of XML Processing Tasks

– Streaming XML and XML Data Networks

– XPath Expressions and Regular Expressions

– Node-based NFA Machines for XML Filtering

• Other Formal Models for XML Processing

– Specialized pushdown automata

– Specialized context-free grammars

XML Data

• W3C Standard inspired by HTML
  – http://www.w3.org/XML/
• Currently used for:
  – Defining Data
    • http://www.w3.org/XML/Schema
  – Integrating Systems
    • http://www.w3.org/TR/soap/
    • http://www.w3.org/TR/wsdl
  – Formatting Data
    • http://www.w3.org/Style/XSL/
    • http://www.w3.org/TR/xsl/
  – Querying Data
    • http://www.w3.org/TR/xpath
    • http://www.w3.org/XML/Query/

DOM Processing

Streaming XML Processing

• Reduce memory requirements by performing XML processing tasks as the XML data passes through the application
• Example Tasks:
  – Validate XML
    • Ensure the XML data is well-formed and that it is valid against its DTD/XSD
  – Query XML
    • Extract/filter subsets of the XML data for further processing as it passes through the application
• Frameworks:
  – JSR 173: Streaming API for XML (StAX)
    • javax.xml.stream
  – .NET XML Streams
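The two example tasks can be sketched with an incremental parser. The slide names StAX and .NET; the sketch below swaps in Python's standard `xml.etree.ElementTree.iterparse` purely for illustration. The function names and the sample document are hypothetical, and only well-formedness is checked here, not DTD/XSD validity.

```python
import io
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return True if the stream parses to the end without a syntax or nesting error."""
    try:
        for _event, elem in ET.iterparse(source):
            elem.clear()  # free the element's text and children once handled
        return True
    except ET.ParseError:
        return False


def stream_text(source, wanted_tag):
    """Yield the text of each <wanted_tag> element as soon as its end tag is read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == wanted_tag:
            yield elem.text
        elem.clear()


# Hypothetical sample data, not taken from the slides.
DOC = (b"<courses><course><title>Automata</title></course>"
       b"<course><title>Compilers</title></course></courses>")

titles = list(stream_text(io.BytesIO(DOC), "title"))
print(titles)
print(is_well_formed(io.BytesIO(b"<a><b></a></b>")))
```

Both functions consume the input once, front to back, and never hold the full document content in memory, which is the property the slide contrasts with DOM processing.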

Application: XML Data Network

XML Path Language

• XPath Query:
  – Location Steps
    • Axis
    • Node test
    • Predicate
• Axes
  – Child (default)
  – Descendant (//)
  – Attribute (@)

XPath and Regular Expressions

• Consider XPath queries using the child and descendant axes, name and * node tests, and no predicates.
• Such queries can be converted to regular expressions:
  – [university] N* [department]
  – N* [departments] N [courses]
• The input alphabet consists of nodes N
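A minimal sketch of this conversion, under stated assumptions: a node's root-to-node path is encoded as a string such as `/university/engineering/department`, one node N is written `/[^/]+`, and the two queries used below (`/university//department` and `//departments/*/courses`) are guesses at XPath forms matching the two regular expressions above, since the slide's own queries did not survive. The function name is hypothetical.

```python
import re


def xpath_to_regex(query):
    """Translate a predicate-free XPath (child and descendant axes, name and * tests)
    into a regular expression over '/'-separated element names."""
    parts = []
    for axis, test in re.findall(r"(//|/)([^/]+)", query):
        if axis == "//":
            parts.append(r"(?:/[^/]+)*")  # N*: any number of intermediate nodes
        if test == "*":
            parts.append(r"/[^/]+")       # N: exactly one node with any name
        else:
            parts.append("/" + re.escape(test))
    return re.compile("".join(parts))


q1 = xpath_to_regex("/university//department")   # [university] N* [department]
q2 = xpath_to_regex("//departments/*/courses")   # N* [departments] N [courses]

print(bool(q1.fullmatch("/university/engineering/department")))
print(bool(q2.fullmatch("/university/departments/courses")))
```

Matching uses `fullmatch` because the expression must describe the whole path from the root to the candidate node.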

Designing a Filtering Machine

1. Convert each XPath Query to an NFA
2. Combine into a single NFA
   – Take advantage of path sharing [Diao et al., 2003]
3. Convert NFA to a DFA
   – Constrain to avoid state explosion
   – Lazy construction [Onizuka, 2003]
4. Add indexes
   – Stream index [Green et al., 2004]

Example

1. /a/b

2. /a/c

3. /a/b/c

4. /a//b/c

5. /a/*/c

6. /a//c

7. /a/*/*/c
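The seven queries can be run through a small filtering machine to see the design steps in action. This is a rough sketch and not the construction from the cited papers: queries are merged into one NFA whose common prefixes share states, `//` is modelled as an epsilon move into a looping state, a stack of active state sets follows start and end tags, and state-set transitions are memoised on first use as a simple stand-in for lazy DFA construction. The class, its method names, and the sample document are all hypothetical; the indexing step is not covered.

```python
import io
import re
import xml.etree.ElementTree as ET


class SharedNFA:
    """One NFA for many path queries; queries with a common prefix share states."""

    def __init__(self):
        self.child = [{}]   # child[s][label] -> next state; label is a tag name or "*"
        self.desc = {}      # desc[s] -> looping state entered by an epsilon move (//)
        self.loops = set()  # states that stay active under any element
        self.accept = {}    # accepting state -> ids of the queries ending there
        self.dfa = {}       # lazily filled table: (state set, tag) -> state set

    def _new_state(self):
        self.child.append({})
        return len(self.child) - 1

    def add_query(self, qid, query):
        state = 0
        for axis, test in re.findall(r"(//|/)([^/]+)", query):
            if axis == "//":
                if state not in self.desc:
                    self.desc[state] = self._new_state()
                    self.loops.add(self.desc[state])
                state = self.desc[state]
            if test not in self.child[state]:
                self.child[state][test] = self._new_state()
            state = self.child[state][test]
        self.accept.setdefault(state, set()).add(qid)

    def _closure(self, states):
        todo, seen = list(states), set(states)
        while todo:
            nxt = self.desc.get(todo.pop())
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
        return frozenset(seen)

    def start(self):
        return self._closure({0})

    def step(self, states, tag):
        key = (states, tag)
        if key not in self.dfa:  # compute this DFA transition only when first needed
            nxt = set()
            for s in states:
                if s in self.loops:
                    nxt.add(s)
                for label in (tag, "*"):
                    if label in self.child[s]:
                        nxt.add(self.child[s][label])
            self.dfa[key] = self._closure(nxt)
        return self.dfa[key]

    def matches(self, states):
        ids = set()
        for s in states:
            ids |= self.accept.get(s, set())
        return sorted(ids)


def filter_stream(nfa, source):
    """Return (element path, matching query ids) pairs in document order."""
    stack, path, out = [nfa.start()], [], []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(nfa.step(stack[-1], elem.tag))
            path.append(elem.tag)
            hits = nfa.matches(stack[-1])
            if hits:
                out.append(("/" + "/".join(path), hits))
        else:
            stack.pop()
            path.pop()
            elem.clear()
    return out


QUERIES = ["/a/b", "/a/c", "/a/b/c", "/a//b/c", "/a/*/c", "/a//c", "/a/*/*/c"]
nfa = SharedNFA()
for qid, q in enumerate(QUERIES, start=1):
    nfa.add_query(qid, q)

result = filter_stream(nfa, io.BytesIO(b"<a><b><c/></b><c/></a>"))
print(result)
```

Keeping one state set per open element means an end tag only needs a pop, so the machine never re-reads earlier input.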

System Architecture

XML as a Context-Free Language

• XML (unlike HTML) must be properly nested
  – <a><b></b></a> : Valid
  – <a><b></a></b> : Invalid
• This structure affords the possibility of refining grammars and pushdown automata
• Visibly Pushdown Automata
  – Refinement of PDAs to enforce proper nesting of begin and end tags. Originally constructed to analyze call and return sequences in programming languages
• Specialized Document Type Definition
  – Refinement of context-free grammars to enforce proper nesting of begin and end tags
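The nesting requirement can be illustrated with a recognizer in the spirit of a visibly pushdown automaton: the kind of input symbol alone decides the stack action, so a begin tag always pushes, an end tag always pops, and anything else leaves the stack untouched. This is a simplified sketch with a single control state; using the tag name itself as the stack symbol, the regex tokenizer, and the function names are assumptions made for illustration.

```python
import re

# Recognises simple begin/end tags without attributes; other text is skipped.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def tokenize(text):
    """Split markup into ("open", name) and ("close", name) symbols."""
    return [("close" if slash else "open", name) for slash, name in TAG.findall(text)]


def accepts(tokens):
    """Accept exactly the token sequences whose begin and end tags nest properly."""
    stack = []
    for kind, name in tokens:
        if kind == "open":     # call symbol: always push
            stack.append(name)
        elif kind == "close":  # return symbol: always pop, and the names must agree
            if not stack or stack.pop() != name:
                return False
        # internal symbols would leave the stack unchanged
    return not stack           # every begin tag must have been closed


print(accepts(tokenize("<a><b></b></a>")))
print(accepts(tokenize("<a><b></a></b>")))
```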

Visibly Pushdown Automata

VPDA Example

Specialized DTDs

• Note that tags must properly wrap all expressions yielded by a production

• Note that an SDTD could be converted to a context-free grammar by replacing specializations with nonterminals and nesting production rules
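The conversion mentioned in the second note can be sketched as follows. The slide's own SDTD did not survive, so everything concrete here is an assumption: an SDTD is represented as a mapping from each specialized type to a tag name plus a content model, the content model is limited to a space-separated sequence of types with an optional trailing `*`, and the car-dealer example is hypothetical. Each specialization becomes a nonterminal whose production wraps its content in the begin and end tags.

```python
def sdtd_to_cfg(sdtd):
    """Turn {type: (tag, content model)} into a list of (nonterminal, right-hand side)
    productions. Tags are terminals; specialized types become nonterminals."""
    rules = []
    for typ, (tag, content) in sdtd.items():
        body = []
        for item in content.split():
            if item.endswith("*"):
                base = item[:-1]
                star = base + "_star"  # helper nonterminal for repetition
                body.append(star)
                for rule in ((star, []), (star, [base, star])):
                    if rule not in rules:
                        rules.append(rule)
            else:
                body.append(item)
        # The tags wrap everything the production yields.
        rules.append((typ, ["<" + tag + ">"] + body + ["</" + tag + ">"]))
    return rules


# Hypothetical SDTD: UsedCar and NewCar are two specializations of the tag "car".
SDTD = {
    "Dealer": ("dealer", "UsedCar* NewCar*"),
    "UsedCar": ("car", "Model Year"),
    "NewCar": ("car", "Model"),
    "Model": ("model", ""),
    "Year": ("year", ""),
}

cfg = sdtd_to_cfg(SDTD)
for head, rhs in cfg:
    print(head, "->", " ".join(rhs) if rhs else "ε")
```

Both `UsedCar` and `NewCar` produce `<car> ... </car>`, but with different content, which is the distinction a plain DTD keyed on tag names cannot express.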

SDTDs and VPDAs

• Every VPDA can be converted to an equivalent PDA
• Every SDTD can be converted into an equivalent context-free grammar
• VPDAs and SDTDs are equivalent in the same way that CFGs and PDAs are
• XML Applications:
  – Automated machine rewriting for Data Integration [Thomo et al., 2008]
  – Streaming type checking [Kumar et al., 2007]
  – Streaming querying [Kumar et al., 2007]

References
