Formal Machines for Streaming XML Querying

Post on 19-Jul-2015


Streaming XML

Kevin Tankersley

Machines and Algorithms for Real-Time XML Processing

Overview

• XML Filtering Networks

– Overview of XML Processing Tasks

– Streaming XML and XML Data Networks

– XPath Expressions and Regular Expressions

– Node-based NFA Machines for XML Filtering

• Other Formal Models for XML Processing

– Specialized pushdown automata

– Specialized context-free grammars

XML Data

• W3C Standard inspired by HTML
  – http://www.w3.org/XML/
• Currently used for:
  – Defining Data
    • http://www.w3.org/XML/Schema
  – Integrating Systems
    • http://www.w3.org/TR/soap/
    • http://www.w3.org/TR/wsdl
  – Formatting Data
    • http://www.w3.org/Style/XSL/
    • http://www.w3.org/TR/xsl/
  – Querying Data
    • http://www.w3.org/TR/xpath
    • http://www.w3.org/XML/Query/

DOM Processing

Streaming XML Processing

• Reduce memory requirements by performing XML processing tasks as the XML data passes through the application
• Example Tasks:
  – Validate XML
    • Ensure the XML data is well-formed and that it complies with its DTD/XSD
  – Query XML
    • Extract/filter subsets of the XML data for further processing as it passes through the application
• Frameworks:
  – JSR 173: Streaming API for XML (StAX)
    • javax.xml.stream
  – .NET XML Streams
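A minimal sketch of the two example tasks, written in Python rather than with StAX or .NET: the standard library's `xml.etree.ElementTree.iterparse` plays the role of the streaming reader. The sample document and the helper names `is_well_formed` and `stream_text` are invented for illustration, and the check covers well-formedness only, since ElementTree does not validate against a DTD or XSD.

```python
import io
import xml.etree.ElementTree as ET


def is_well_formed(xml_source):
    """Return True if the stream parses to the end without a syntax error."""
    try:
        for _event, elem in ET.iterparse(xml_source):
            elem.clear()  # drop content we no longer need
        return True
    except ET.ParseError:
        return False


def stream_text(xml_source, wanted_tag):
    """Yield the text of each <wanted_tag> element as soon as its end tag is read."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == wanted_tag:
            yield elem.text
        elem.clear()  # release the element's text and children


# Hypothetical sample document, invented for this sketch.
SAMPLE = (
    b"<university><department>"
    b"<course>Automata</course><course>Databases</course>"
    b"</department></university>"
)

if __name__ == "__main__":
    print(list(stream_text(io.BytesIO(SAMPLE), "course")))
    print(is_well_formed(io.BytesIO(b"<a><b></a></b>")))
```

Both helpers discard each element once its end tag has been seen, which is what keeps memory use low compared with building a full DOM tree.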

Application: XML Data Network

XML Path Language

• XPath Query:
  – Location Steps
    • Axis
    • Node test
    • Predicate
• Axes
  – Child (default)
  – Descendant (//)
  – Attribute (@)
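To make the axis / node test / predicate decomposition concrete, here is a small sketch that splits an abbreviated XPath string into location steps. It handles only the three axes listed above and treats a predicate as opaque text; the `Step` type, the `parse_steps` function, and the sample query are assumptions made for illustration, not part of any XPath library.

```python
import re
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    axis: str                 # "child", "descendant" or "attribute"
    node_test: str            # element/attribute name, or "*"
    predicate: Optional[str]  # raw text between [ and ], if any


# One abbreviated location step: separator, optional '@', node test, optional predicate.
_STEP = re.compile(r"(//|/)(@?)([\w.-]+|\*)(?:\[([^\]]*)\])?")


def parse_steps(xpath: str) -> List[Step]:
    steps = []
    pos = 0
    while pos < len(xpath):
        m = _STEP.match(xpath, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at offset {pos} in {xpath!r}")
        sep, at, test, predicate = m.groups()
        if at and sep == "//":
            raise ValueError("'//@name' is outside this sketch's subset")
        axis = "attribute" if at else ("descendant" if sep == "//" else "child")
        steps.append(Step(axis, test, predicate))
        pos = m.end()
    return steps


if __name__ == "__main__":
    for step in parse_steps("/university//department/course[@level='400']/@id"):
        print(step)
```

The sketch labels `//` as a plain descendant axis, as the slide does. In full XPath the abbreviation expands to a descendant-or-self step followed by a child step, which this simplification ignores.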

XPath and Regular Expressions

• Consider XPath queries that use only the child and descendant axes, name and * node tests, and no predicates
• Such queries can be converted to regular expressions:
  – [university] N* [department]
  – N* [departments] N [courses]
• The input alphabet consists of nodes; N matches any single node
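The conversion can be sketched with Python's `re` module by encoding a root-to-node path as a string such as `/university/science/department`, so that "any node" N becomes `/[^/]+`. The two XPath strings used below are my reading of the queries behind the two regular expressions on the slide; they, the sample paths, and the function names are assumptions.

```python
import re

ANY_NODE = r"/[^/]+"  # the slide's N: exactly one node, whatever its name


def xpath_to_regex(xpath):
    """Translate a predicate-free XPath (child, descendant, name, *) to a regex."""
    parts = []
    for sep, test in re.findall(r"(//|/)([^/]+)", xpath):
        if sep == "//":
            parts.append(f"(?:{ANY_NODE})*")  # N*: skip any number of nodes
        parts.append(ANY_NODE if test == "*" else "/" + re.escape(test))
    return re.compile("".join(parts))


def path_matches(xpath, names):
    """Check whether the node reached via the element names is selected."""
    word = "".join("/" + name for name in names)
    return xpath_to_regex(xpath).fullmatch(word) is not None


if __name__ == "__main__":
    print(xpath_to_regex("/university//department").pattern)
    print(path_matches("/university//department",
                       ["university", "science", "department"]))
    print(path_matches("//departments/*/courses",
                       ["university", "departments", "cs", "courses"]))
```

The regular expression is matched against the whole path from the root to the current node, which is why `fullmatch` is used rather than `search`.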

Designing a Filtering Machine

1. Convert each XPath Query to an NFA
2. Combine into a single NFA
   – Take advantage of path sharing [Diao et al., 2003]
3. Convert NFA to a DFA
   – Constrain to avoid state explosion
   – Lazy construction [Onizuka, 2003]
4. Add indexes
   – Stream index [Green et al., 2004]
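The first two design steps can be sketched as a single NFA in which queries with a common prefix share states, run against the stream with a stack of active state sets (push on a start tag, pop on an end tag). This is a simplified illustration in the spirit of the path-sharing idea, not a reproduction of any published system; the class, the four queries, and the sample document are all invented for the sketch.

```python
import io
import re
import xml.etree.ElementTree as ET


class SharedNFA:
    def __init__(self):
        self.child = [{}]     # child[s][node test] -> next state
        self.desc = [None]    # desc[s] -> state reached by an epsilon move for '//'
        self.loops = [False]  # loops[s]: state stays active under any element
        self.accepts = [[]]   # accepts[s]: ids of queries that end in state s

    def _new_state(self, loops=False):
        self.child.append({})
        self.desc.append(None)
        self.loops.append(loops)
        self.accepts.append([])
        return len(self.child) - 1

    def add_query(self, qid, xpath):
        s = 0
        for sep, test in re.findall(r"(//|/)([^/]+)", xpath):
            if sep == "//":
                if self.desc[s] is None:
                    self.desc[s] = self._new_state(loops=True)
                s = self.desc[s]
            if test not in self.child[s]:  # reuse the state if a query already made it
                self.child[s][test] = self._new_state()
            s = self.child[s][test]
        self.accepts[s].append(qid)

    def closure(self, states):
        out, todo = set(), list(states)
        while todo:
            s = todo.pop()
            if s not in out:
                out.add(s)
                if self.desc[s] is not None:
                    todo.append(self.desc[s])
        return out

    def step(self, states, name):
        nxt = set()
        for s in states:
            for label in (name, "*"):
                if label in self.child[s]:
                    nxt.add(self.child[s][label])
            if self.loops[s]:
                nxt.add(s)
        return self.closure(nxt)


def filter_stream(nfa, xml_source):
    """Return the ids of all queries matched by at least one element."""
    matched = set()
    stack = [nfa.closure({0})]
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            stack.append(nfa.step(stack[-1], elem.tag))
            for s in stack[-1]:
                matched.update(nfa.accepts[s])
        else:
            stack.pop()
            elem.clear()
    return matched


# Hypothetical queries and document, invented for this sketch.
QUERIES = {
    "q1": "/university/department",
    "q2": "/university/department/course",
    "q3": "/university//course",
    "q4": "//department/*",
}
DOC = b"<university><department><course/></department></university>"

if __name__ == "__main__":
    nfa = SharedNFA()
    for qid, xpath in QUERIES.items():
        nfa.add_query(qid, xpath)
    print(len(nfa.child), "states;", sorted(filter_stream(nfa, io.BytesIO(DOC))))
```

Because the first three queries all begin with `/university`, they share the state for that step, so adding more queries with common prefixes grows the machine slowly.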

Example

1. /a/b

2. /a/c

3. /a/b/c

4. /a//b/c

5. /a/*/c

6. /a//c

7. /a/*/*/c
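As an illustration of lazy DFA construction on these seven queries, the sketch below treats each DFA state as a set of (query, position) pairs and computes a transition only the first time the stream needs it. The representation and all names are my own simplification, not the construction from the cited work.

```python
import re

QUERIES = ["/a/b", "/a/c", "/a/b/c", "/a//b/c", "/a/*/c", "/a//c", "/a/*/*/c"]


def parse(xpath):
    """Return [(is_descendant_step, node_test), ...] for a predicate-free query."""
    return [(sep == "//", test) for sep, test in re.findall(r"(//|/)([^/]+)", xpath)]


STEPS = [parse(q) for q in QUERIES]


class LazyDFA:
    def __init__(self, steps):
        self.steps = steps
        # NFA item (q, i): the first i steps of query q match the path read so far.
        self.start = frozenset((q, 0) for q in range(len(steps)))
        self.table = {}  # (dfa_state, element name) -> dfa_state, filled on demand

    def move(self, state, name):
        key = (state, name)
        if key not in self.table:
            nxt = set()
            for q, i in state:
                if i == len(self.steps[q]):
                    continue  # query q already ended at an ancestor
                descendant, test = self.steps[q][i]
                if test in (name, "*"):
                    nxt.add((q, i + 1))
                if descendant:
                    nxt.add((q, i))  # '//' may also skip over this element
            self.table[key] = frozenset(nxt)
        return self.table[key]

    def accepted(self, state):
        """1-based numbers of the queries that select the current element."""
        return sorted(q + 1 for q, i in state if i == len(self.steps[q]))


def matches(dfa, path):
    state = dfa.start
    for name in path:
        state = dfa.move(state, name)
    return dfa.accepted(state)


if __name__ == "__main__":
    dfa = LazyDFA(STEPS)
    print(matches(dfa, ["a", "b"]))       # -> [1]
    print(matches(dfa, ["a", "b", "c"]))  # -> [3, 4, 5, 6]
    print(len(dfa.table))                 # -> 3 transitions built so far
```

In a real stream the current DFA state would be kept on a stack, pushed on each start tag and popped on each end tag; the `matches` helper only replays a single root-to-node path.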

System Architecture

XML as a Context-Free Language

• XML (unlike HTML) must be properly nested
  – <a><b></b></a> : Well-formed
  – <a><b></a></b> : Not well-formed
• This nesting structure makes it possible to work with refined (restricted) classes of grammars and pushdown automata
• Visibly Pushdown Automata
  – Refinement of PDAs to enforce proper nesting of begin and end tags. Originally constructed to analyze call and return sequences in programming languages
• Specialized Document Type Definition
  – Refinement of context-free grammars to enforce proper nesting of begin and end tags
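The nesting requirement can be checked with a stack whose action is fixed by the kind of input symbol: push on a begin tag, pop on an end tag. That input-driven stack discipline is the defining restriction of a visibly pushdown automaton. The function below is a sketch over bare tags only (no attributes or text), and its name is invented.

```python
import re


def is_properly_nested(markup):
    """Check begin/end tag nesting with an input-driven stack."""
    stack = []
    for closing, name in re.findall(r"<(/?)([\w.-]+)>", markup):
        if not closing:
            stack.append(name)  # begin tag: always push
        elif not stack or stack.pop() != name:
            return False        # end tag: always pop, and it must match
    return not stack            # every begin tag must have been closed


if __name__ == "__main__":
    print(is_properly_nested("<a><b></b></a>"))  # -> True
    print(is_properly_nested("<a><b></a></b>"))  # -> False
```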

Visibly Pushdown Automata

VPDA Example

Specialized DTDs

• Note that tags must properly wrap all expressions yielded by a production

• Note that an SDTD could be converted to a context-free grammar by replacing specializations with nonterminals and nesting production rules
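The conversion described in the second note can be sketched as follows. The SDTD here is hypothetical: each specialized symbol maps to a tag and a content model over specialized symbols, so that two specializations of `course` can have different content. Each symbol becomes a nonterminal whose production wraps the content model in the begin and end tags. The right-hand sides keep the regular-expression operators, so the result is an extended context-free grammar.

```python
import re

# Hypothetical SDTD: specialized symbol -> (tag, content model over specialized symbols).
SDTD = {
    "dept": ("department", "gradCourse* ugCourse+"),
    "gradCourse": ("course", "title prereq"),
    "ugCourse": ("course", "title"),
    "title": ("title", ""),
    "prereq": ("prereq", ""),
}


def sdtd_to_cfg(sdtd):
    """Return one production string per specialized symbol."""
    productions = []
    for symbol, (tag, model) in sdtd.items():
        # Replace every specialized symbol in the content model by its nonterminal.
        body = re.sub(r"[A-Za-z_]\w*", lambda m: "N_" + m.group(0), model)
        # Nest the body inside the begin and end tags of the element.
        rhs = " ".join(part for part in (f"<{tag}>", body, f"</{tag}>") if part)
        productions.append(f"N_{symbol} -> {rhs}")
    return productions


if __name__ == "__main__":
    for production in sdtd_to_cfg(SDTD):
        print(production)
```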

SDTDs and VPDAs

• Every VPDA can be converted to an equivalent PDA
• Every SDTD can be converted into an equivalent context-free grammar
• VPDAs and SDTDs are equivalent in the same way that CFGs and PDAs are
• XML Applications:
  – Automated machine rewriting for Data Integration [Thomo et al., 2008]
  – Streaming type checking [Kumar et al., 2007]
  – Streaming querying [Kumar et al., 2007]

References
