Querying Streaming XML Data

Posted on 08-Jan-2016

Layout of the presentation

- Introduction
- Common problems faced
- Solution proposed
- Basic building blocks of the solution
- How to build up a solution to a given query
- Features of the system

Streaming XML

- XML is a standard for information exchange.
- Some XML documents are only available in streaming format. Streaming is like reading data from a tape drive: the data can only be read sequentially.
- Streaming XML is used for stock market data, news and network statistics.
- Predecessor systems were used to filter documents.

Structure of an XPath Query

An XPath query consists of a location path and an output expression (here, name).

The location path consists of a closure axis (//), a node test (book) and a predicate (year>2000).

For example: //book[year>2000]/name

Features of our Approach

- Efficient
- Easy-to-understand design
- The design of the BPDT (Basic PushDown Transducer) is tricky

Common Problems faced

1.  <root>
2.    <pub>
3.      <book id="1">
4.        <price> 12.00 </price>
5.        <name> First </name>
6.        <author> A </author>
7.        <price type="discount"> 10.00 </price>
8.      </book>
9.      <book id="2">
10.       <price> 14.00 </price>
11.       <name> Second </name>
12.       <author> A </author>
13.       <author> B </author>
14.       <price type="discount"> 12.00 </price>
15.     </book>
16.     <year> 2002 </year>
17.   </pub>
18. </root>

Query: /pub[year=2002]/book[price<11]/author

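Evaluating the query event by event:

- Line 4: price 12.00 fails price<11. This is not yet a failure, because another price child may still follow.
- Line 6: author A satisfies the path, but the predicates are still undecided, so A is buffered.
- Line 7: the discount price 10.00 passes price<11. But is year=2002? The year has not arrived yet, so A stays buffered.
- Lines 12-13: buffer both A and B for the second book.
- Lines 14-15: the second book fails price<11 (14.00 and 12.00), so its authors are removed from the buffer.
- Line 16: year=2002 passes, so the buffered A is output.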
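The walkthrough above can be reproduced with a small hand-coded evaluator. This is a minimal sketch and not the XSQ implementation. It uses Python's standard `xml.sax` parser, hard-codes this one query, and assumes, as the slide's document does, that `pub` sits directly under a `<root>` wrapper. The class name `PubBookAuthor` is hypothetical, as are its two lists: `book_buf` holds authors waiting on price<11, and `pub_buf` holds authors waiting only on year=2002.

```python
import xml.sax

# The slide's document (line numbers dropped, quotes straightened).
DOC = """<root><pub>
<book id="1"><price>12.00</price><name>First</name><author>A</author>
<price type="discount">10.00</price></book>
<book id="2"><price>14.00</price><name>Second</name><author>A</author>
<author>B</author><price type="discount">12.00</price></book>
<year>2002</year></pub></root>"""

PUB = ["root", "pub"]   # path to <pub>, assuming the <root> wrapper
BOOK = PUB + ["book"]


class PubBookAuthor(xml.sax.ContentHandler):
    """Hand-coded streaming evaluator for /pub[year=2002]/book[price<11]/author."""

    def __init__(self):
        super().__init__()
        self.path = []         # tags of the open elements (the stack)
        self.text = ""         # character data since the last start tag
        self.pub_ok = False    # year=2002 seen for the current pub?
        self.book_ok = False   # price<11 seen for the current book?
        self.book_buf = []     # authors whose book predicate is undecided
        self.pub_buf = []      # authors waiting only on the pub predicate
        self.output = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = ""
        if self.path == PUB:
            self.pub_ok, self.pub_buf = False, []
        elif self.path == BOOK:
            self.book_ok, self.book_buf = False, []

    def characters(self, content):
        self.text += content

    def _release(self, items):
        # Book predicate holds: output now, or move to the pub-level list.
        (self.output if self.pub_ok else self.pub_buf).extend(items)

    def endElement(self, name):
        value = self.text.strip()
        if self.path == BOOK + ["author"]:
            if self.book_ok:
                self._release([value])
            else:
                self.book_buf.append(value)   # undecided: buffer for the future
        elif self.path == BOOK + ["price"] and not self.book_ok and float(value) < 11:
            self.book_ok = True
            self._release(self.book_buf)
            self.book_buf = []
        elif self.path == BOOK:
            self.book_buf = []                # price<11 never held: clear
        elif self.path == PUB + ["year"] and float(value) == 2002:
            self.pub_ok = True
            self.output.extend(self.pub_buf)  # flush in FIFO order
            self.pub_buf = []
        elif self.path == PUB:
            self.pub_buf = []                 # year=2002 never held: clear
        self.path.pop()


handler = PubBookAuthor()
xml.sax.parseString(DOC.encode(), handler)
print(handler.output)  # ['A']
```

An author's value is only complete at its End event, so this sketch buffers an author when `</author>` arrives rather than at its Begin event. Author A moves from the book-level list to the pub-level list at line 7. Authors A and B of the second book are cleared when that book ends, and the pub-level list is flushed once year=2002 arrives.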

Problems caused by closure axis

1.  <root>
2.    <pub>
3.      <book>
4.        <name> X </name>
5.        <author> A </author>
6.      </book>
7.      <book>
8.        <name> Y </name>
9.        <pub>
10.         <book>
11.           <name> Z </name>
12.           <author> B </author>
13.         </book>
14.         <year> 1999 </year>
15.       </pub>
16.     </book>
17.     <year> 2002 </year>
18.   </pub>
19. </root>

Query: //pub[year=2002]//book[author]//name

Candidate bindings for the name at line 11:

pub [year=2002]    book [author]
Line 2: True       Line 7: False
Line 2: True       Line 10: True
Line 9: False      Line 10: True

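Fails year=2002: the pub at line 9, whose year (line 14) is 1999.

Passes year=2002: the pub at line 2, whose year (line 17) is 2002.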
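The table can be double-checked by enumerating the candidate bindings on an in-memory tree. This is a non-streaming reference sketch using Python's `xml.etree.ElementTree`, not a streaming algorithm. `number_lines` and `candidate_pairs` are hypothetical helpers. Line numbers are assigned to mimic the slide's layout: one line per leaf element, and separate opening and closing lines for elements with children.

```python
import xml.etree.ElementTree as ET

# The slide's document (line numbers dropped).
DOC = """<root><pub><book><name>X</name><author>A</author></book>
<book><name>Y</name><pub><book><name>Z</name><author>B</author></book>
<year>1999</year></pub></book><year>2002</year></pub></root>"""


def number_lines(root):
    """Map each element to its line in the slide's layout: a leaf element
    takes one line; an element with children takes an opening line and,
    after its children, a closing line."""
    line, counter = {}, 1

    def visit(elem):
        nonlocal counter
        line[elem] = counter
        counter += 1
        for child in elem:
            visit(child)
        if len(elem):
            counter += 1   # line of the closing tag
    visit(root)
    return line


def candidate_pairs(root):
    """For each <name>, list every (pub, book) ancestor pair that could bind
    //pub[year=2002]//book[author]//name, as
    (pub line, pub predicate, book line, book predicate)."""
    line, results = number_lines(root), {}

    def walk(elem, ancestors):
        if elem.tag == "name":
            pairs = []
            for i, pub in enumerate(ancestors):
                if pub.tag != "pub":
                    continue
                pub_ok = any(c.tag == "year" and c.text.strip() == "2002" for c in pub)
                for book in ancestors[i + 1:]:
                    if book.tag == "book":
                        book_ok = any(c.tag == "author" for c in book)
                        pairs.append((line[pub], pub_ok, line[book], book_ok))
            results[elem.text.strip()] = pairs
        for child in elem:
            walk(child, ancestors + [elem])

    walk(root, [])
    return results


for name, pairs in candidate_pairs(ET.fromstring(DOC)).items():
    verdict = "output" if any(p and b for _, p, _, b in pairs) else "not output"
    print(name, pairs, verdict)
```

The Z row reproduces the slide's table. Only the binding of the pub at line 2 with the book at line 10 satisfies both predicates. A streaming processor cannot settle this when it reaches line 11, because the year that decides the pub at line 2 only arrives at line 17.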

1.  <root>
2.    <pub>
3.      <book>
4.        <name> X </name>
5.        <author> A </author>
6.      </book>
7.      <book>
8.        <name> Y </name>
9.        <author> B </author>
10.       <pub>
11.         <book>
12.           <name> Z </name>
13.           <author> B </author>
14.         </book>
15.         <year> 1999 </year>
16.       </pub>
17.     </book>
18.     <year> 2002 </year>
19.   </pub>
20. </root>

Let's add an author to the second book (line 9). What does the same query return now?

Handling XML Stream

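Input: a well-formed XML stream, parsed with the SAX API. Each event belongs to one of three sets. Here a is an element tag, attrs are its attributes and d is its depth:

$$\mathit{Begin} = \{(a, \mathit{attrs}, d)\}, \quad \mathit{End} = \{(/a, d)\}, \quad \mathit{Text} = \{(a, \mathrm{text}(), d)\}$$

An XML stream is a sequence of such events:

$$\{e_1, e_2, \ldots, e_i, \ldots\}, \quad e_i \in \mathit{Begin} \cup \mathit{End} \cup \mathit{Text}$$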
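The three event sets map directly onto SAX callbacks. Below is a minimal sketch using Python's standard `xml.sax` module. The tuple layout and the `EventRecorder` name are hypothetical, depth is counted from 1 at the document element, and whitespace-only text between tags is dropped.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Turns SAX callbacks into Begin (a, attrs, d), Text (a, text, d)
    and End (/a, d) tuples, where d is the element's depth."""

    def __init__(self):
        super().__init__()
        self.stack = []     # tags of the open elements
        self.pending = []   # character chunks not yet reported
        self.events = []

    def _flush_text(self):
        # SAX may split text across several characters() calls, so join first.
        text = "".join(self.pending).strip()
        if text:            # whitespace-only text between tags is dropped
            self.events.append(("text", self.stack[-1], text, len(self.stack)))
        self.pending = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.stack.append(name)
        self.events.append(("begin", name, dict(attrs.items()), len(self.stack)))

    def characters(self, content):
        self.pending.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end", "/" + name, len(self.stack)))
        self.stack.pop()


recorder = EventRecorder()
xml.sax.parseString(b'<pub><book id="1"><price>12.00</price></book></pub>', recorder)
for event in recorder.events:
    print(event)
```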

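Grammar for XPath queries. Square brackets mark optional parts, and the quoted "[" and "]" are literal predicate brackets:

Q  → N+ [/O]
N  → (/ | //) tag [F]
F  → "[" FO [OP constant] "]"
FO → @attribute | tag[@attribute] | text()
O  → @attribute | text()
OP → > | ≥ | = | < | ≤ | ≠ | contains

An XPath query therefore has the form N1 N2 … Nn/O.

The system cannot handle reverse axes (such as parent or ancestor) or positional functions (such as position()).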
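Queries in this grammar are simple enough to split with regular expressions. The sketch below assumes that ≥, ≤ and ≠ are written as `>=`, `<=` and `!=`, and it leaves out the `contains` operator. `parse_query` is a hypothetical helper, not part of any library.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"
STEP = re.compile(rf"(//|/)({NAME})(?:\[([^\]]*)\])?")           # N
OUTPUT = re.compile(rf"/(text\(\)|@{NAME})$")                     # /O
PRED = re.compile(                                               # FO [OP constant]
    rf"\s*(text\(\)|@{NAME}|{NAME}(?:@{NAME})?)\s*(?:(>=|<=|!=|=|<|>)\s*(\S+))?\s*$")


def parse_query(query):
    """Split a query into steps N1..Nn and an optional output O. Each step is
    (axis, tag, predicate); predicate is (FO, OP, constant) or None."""
    output = None
    m = OUTPUT.search(query)
    if m:
        output, query = m.group(1), query[:m.start()]
    steps, pos = [], 0
    while pos < len(query):
        m = STEP.match(query, pos)
        if not m:
            raise ValueError(f"unsupported syntax at {query[pos:]!r}")
        axis, tag, body = m.groups()
        pred = None
        if body is not None:
            p = PRED.match(body)
            if not p:
                raise ValueError(f"unsupported predicate [{body}]")
            pred = p.groups()
        steps.append((axis, tag, pred))
        pos = m.end()
    if not steps:
        raise ValueError("a query needs at least one location step")
    return steps, output


print(parse_query("//pub[year>2000]//book[author]//name/text()"))
```

Positional predicates such as `[position()=1]` fall outside the grammar and raise `ValueError`, in line with the limitation noted above.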

Solution to Query

Query: /pub[year=2002]/book[price<11]/author

[Figure: pushdown automaton (PDA) and pushdown transducer (PDT) for this query]

Basic PushDown Transducer (BPDT)

A BPDT is similar to a pushdown automaton, with actions defined on its transition arcs. It has:
- a finite set of states
- a start state
- a set of final states
- a set of input symbols
- a set of stack symbols

Actions for the Book – Author step:
- Begin event of author: buffer it for the future while the predicates are undecided.
- Begin event of author: output the result if the predicates are already true.
- End event of book: remove its items from the buffer.

Building a BPDT

Query: /pub[year>2000]/book[author]/name/text()

Consider the location step /book[author].

Basic Building Blocks

XPath Expression: /tag[child]

Buffer operations needed:
- Enqueue(x): adds x to the end of the queue.
- Clear(): removes all items from the queue.
- Flush(): outputs all items in the queue in FIFO order.
- Upload(): moves all items to the end of the queue of the parent BPDT.

No Dequeue operation is needed.
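Below is a sketch of such a buffer in Python using `collections.deque`. The class name `BPDTBuffer` is hypothetical, as is the `out` list that `flush` appends to. Items only ever leave the queue all at once (cleared, flushed or uploaded), which is why no Dequeue is needed.

```python
from collections import deque


class BPDTBuffer:
    """Queue of potential results held by one BPDT."""

    def __init__(self, parent=None):
        self.items = deque()
        self.parent = parent            # buffer of the parent BPDT, if any

    def enqueue(self, x):
        self.items.append(x)            # add x to the end of the queue

    def clear(self):
        self.items.clear()              # predicate failed: drop everything

    def flush(self, out):
        out.extend(self.items)          # predicates hold: emit in FIFO order
        self.items.clear()

    def upload(self):
        self.parent.items.extend(self.items)   # append to the parent's queue
        self.items.clear()


# Replaying /pub[year=2002]/book[price<11]/author on the earlier document:
pub_buf = BPDTBuffer()
book_buf = BPDTBuffer(parent=pub_buf)
out = []
book_buf.enqueue("A")     # line 6: author A, price<11 still undecided
book_buf.upload()         # line 7: price 10.00 < 11, year still unknown
book_buf.enqueue("A")     # lines 12-13: second book's authors
book_buf.enqueue("B")
book_buf.clear()          # line 15: the second book failed price<11
pub_buf.flush(out)        # line 16: year = 2002
print(out)                # ['A']
```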

The other basic building blocks cover these XPath expressions:
- /tag[@attr=val]
- /tag[text()=val]
- /tag[child@attr=val]
- /tag[child=val]
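The building blocks differ mainly in which event can first make the predicate true. The sketch below uses a hypothetical predicate encoding and compares values by string equality only. Events are `(kind, tag, attrs_or_text, depth)` tuples for Begin and Text and `(kind, /tag, depth)` for End. For one element's event sequence, it reports the event that decides each predicate. The sample events are those of the first book in the earlier document, with depth counted from 1 at `<root>`.

```python
def decided_at(pred, events):
    """Index of the first event that makes pred true for the element whose
    events are given (its own Begin first), or None if it never holds.
    Hypothetical pred encodings, string equality only:
      ("attr", a, v)          /tag[@a=v]
      ("text", v)             /tag[text()=v]
      ("child", c)            /tag[c]
      ("child_attr", c, a, v) /tag[c@a=v]
      ("child_text", c, v)    /tag[c=v]
    """
    kind, depth = pred[0], events[0][3]
    for i, ev in enumerate(events):
        if ev[0] == "begin":
            _, tag, attrs, d = ev
            if kind == "attr" and d == depth and attrs.get(pred[1]) == pred[2]:
                return i          # decided by the element's own Begin event
            if kind == "child" and d == depth + 1 and tag == pred[1]:
                return i          # decided by the child's Begin event
            if (kind == "child_attr" and d == depth + 1 and tag == pred[1]
                    and attrs.get(pred[2]) == pred[3]):
                return i          # decided by the child's Begin event
        elif ev[0] == "text":
            _, tag, text, d = ev
            if kind == "text" and d == depth and text == pred[1]:
                return i          # decided by the element's own Text event
            if kind == "child_text" and d == depth + 1 and tag == pred[1] and text == pred[2]:
                return i          # decided by the child's Text event
    return None


# Events of the first <book> in the earlier document (depth 3 under root/pub).
BOOK1 = [
    ("begin", "book", {"id": "1"}, 3),
    ("begin", "price", {}, 4), ("text", "price", "12.00", 4), ("end", "/price", 4),
    ("begin", "name", {}, 4), ("text", "name", "First", 4), ("end", "/name", 4),
    ("begin", "author", {}, 4), ("text", "author", "A", 4), ("end", "/author", 4),
    ("begin", "price", {"type": "discount"}, 4), ("text", "price", "10.00", 4),
    ("end", "/price", 4),
    ("end", "/book", 3),
]

for pred in [("attr", "id", "1"), ("text", "x"), ("child", "author"),
             ("child_attr", "price", "type", "discount"), ("child_text", "price", "10.00")]:
    print(pred, decided_at(pred, BOOK1))
```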

A sample BPDT

Query: /pub[year>2000]
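Below is a sketch of what such a BPDT might look like, written as a SAX handler in Python. The state names START, NA and TRUE borrow the NA and True states mentioned for the HPDT. The class name and the `pub_depth` parameter are assumptions of this sketch. So is testing the year value at `</year>`, once its text is complete.

```python
import xml.sax

START, NA, TRUE = "START", "NA", "TRUE"


class PubYearBPDT(xml.sax.ContentHandler):
    """BPDT-style machine for /pub[year>2000]: START --<pub>--> NA,
    NA --year>2000--> TRUE, and NA/TRUE --</pub>--> START."""

    def __init__(self, pub_depth=2):
        super().__init__()
        self.pub_depth = pub_depth   # depth of <pub>; 2 under a <root> wrapper
        self.state = START
        self.stack = []              # pushdown store: tags of open elements
        self.text = ""
        self.pubs_seen = 0
        self.matched = []            # ordinals of the <pub> elements that match

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.text = ""
        if self.state == START and name == "pub" and len(self.stack) == self.pub_depth:
            self.pubs_seen += 1
            self.state = NA          # predicate not yet decided

    def characters(self, content):
        self.text += content

    def endElement(self, name):
        depth = len(self.stack)
        if (self.state == NA and name == "year" and depth == self.pub_depth + 1
                and float(self.text) > 2000):
            self.state = TRUE        # predicate year>2000 now holds
            self.matched.append(self.pubs_seen)
        elif self.state != START and name == "pub" and depth == self.pub_depth:
            self.state = START       # pub closed: ready for the next one
        self.stack.pop()


bpdt = PubYearBPDT()
xml.sax.parseString(
    b"<root><pub><year>1999</year></pub><pub><year>2002</year></pub></root>", bpdt)
print(bpdt.matched)  # [2]
```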

Building a Solution

HPDT for query: //pub[year>2000]//book[author]//name/text()

HPDT structure: each BPDT in the HPDT has
- a position (l, k), where l is the depth of the BPDT in the HPDT and k is its sequence number, counted from right to left:
  - the BPDT at position (i-1, k) has a right child at position (i, 2k), connected to its NA state, and a left child at position (i, 2k+1), connected to its True state;
  - a BPDT at position (i, 2^i - 1) means that the predicates in all higher-level BPDTs evaluate to true;
- a buffer of potential results;
- a stack of elements (SAX events);
- a depth vector.
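The position rules amount to binary arithmetic. Each step down the HPDT appends one bit to k (0 for the NA branch, 1 for the True branch), so the bits of k record, root first, which branch was taken at each higher level. The sketch below assumes that the root BPDT sits at (0, 0); the function names are hypothetical.

```python
def children(pos):
    """Children of the BPDT at pos = (l, k): the right child (l+1, 2k) hangs
    off its NA state, the left child (l+1, 2k+1) off its True state."""
    l, k = pos
    return {"NA": (l + 1, 2 * k), "True": (l + 1, 2 * k + 1)}


def branches(pos):
    """Read the bits of k, root first, to recover which branch (NA or True)
    was taken at each higher level on the way to pos."""
    l, k = pos
    return ["True" if (k >> (l - 1 - j)) & 1 else "NA" for j in range(l)]


def all_true(pos):
    """Position (i, 2**i - 1): every higher-level predicate evaluated to true."""
    l, k = pos
    return k == 2 ** l - 1


pos = (0, 0)                      # assumed position of the root BPDT
for step in ["True", "NA", "True"]:
    pos = children(pos)[step]
print(pos, branches(pos), all_true(pos))
```

Since 2(2^(i-1) - 1) + 1 = 2^i - 1, following only True branches keeps k = 2^i - 1 at every level.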

Example Query

Query: //pub[year=2002]//book[author]//name

root  pub  book  name
1     2    7     11
1     2    10    11
1     9    10    11

3 paths from $1 to $14

System Features

Columns: Name, Support, Streaming, Multiple Predicates, Closure, Buffered Predicate Evaluation (an X marks a supported feature)

XSQ-F     XPath    X X X X
XSQ-NC    XPath    X X X
XMLTK     XPath    X X
XQEngine  XQuery   X X
Galax     XQuery   X X
Joost     STX      X X

Reference: Feng Peng and Sudarshan Chawathe. XPath Queries on Streaming Data. In SIGMOD 2003.

Thank You
