TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data by Tian Yu, Tok Wang Ling, Jiaheng Lu, Presented by: Tian Yu


Page 1:

TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data

by Tian Yu, Tok Wang Ling, Jiaheng Lu. Presented by: Tian Yu

Page 2:

Outline

- Introduction and motivation
- Related Work
- Preliminaries
- Match Twig Query with Not-predicates
  - Notation and data structure
  - A holistic matching algorithm: TwigStackList¬
- Experiments
- Conclusion

Page 3:

Introduction

XML is rapidly gaining popularity as a data representation format.

XML documents modeled as ordered trees.

XML queries specify patterns of selection predicates on multiple elements having some structural relationships (parent-child, ancestor-descendant)

Page 4:

Introduction: NOT-twig Query

Twig query: a small tree whose nodes are tags, attributes or text values, and whose edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges.

A twig query can have not-predicates. Example:

- Normal-twig query (without not-predicate): Q1: suppliersDatabase/supplier[//store]//part
- Not-twig query (with not-predicate): Q2: suppliersDatabase/supplier[NOT(//store)]//part

Intuitive meaning of Q2: select all part elements that are descendants of supplier elements having no descendant store element.

[Figure: twig patterns for (a) Q1 and (b) Q2 over the nodes supplierDatabase, supplier, store, and part]

Page 5:

Motivation

Naïve method for evaluating not-twig queries:

- Decompose the NOT-twig query into several normal-twig queries.
- Evaluate the decomposed queries with an existing method (TwigStack or TwigStackList).
- Compute the final result from the results of the individual decomposed queries, using set operators.

For example, Q2 is decomposed into two queries.

[Figure: Q2 and its two decomposed queries over the nodes supplierDatabase, supplier, store, and part]
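The set-operator step of the naïve method can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two decomposed queries are Q2 without its store branch and Q2 with the store branch made positive, and that the final answer is their difference on (supplier, part) pairs. The function name and the element identifiers are hypothetical.

```python
def naive_not_twig(without_not, with_positive):
    """Combine the results of two decomposed normal-twig queries.

    without_not:   (supplier, part) matches of the query with the negated
                   branch dropped.
    with_positive: (supplier, store, part) matches of the query with the
                   negated branch turned into a positive branch.
    Assumption: the answer is the set difference on (supplier, part).
    """
    excluded = {(sup, part) for (sup, _store, part) in with_positive}
    return [pair for pair in without_not if pair not in excluded]


# Hypothetical element identifiers: supplier s1 has a store, s2 does not.
q_without_store = [("s1", "p1"), ("s2", "p2"), ("s2", "p3")]
q_with_store = [("s1", "st1", "p1")]
result = naive_not_twig(q_without_store, q_with_store)
```

Both decomposed queries scan the supplier and part elements, which is the repeated access that the next slide lists as a disadvantage.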

Page 6:

Motivation

Disadvantages of the naïve method:

- Not optimal: many useless intermediate results are produced.
- Additional disk I/O: many elements are accessed more than once.

Our objective: efficient processing of XML not-twig queries.


Page 8:

Previous work: TwigStack [1]

TwigStack [1]: a holistic approach to matching twig queries without not-predicates.

Two-phase algorithm:

- Phase 1: output the intermediate root-to-leaf path solutions.
- Phase 2: merge the intermediate path solutions to get the final results.

TwigStack is optimal when the query contains only ancestor-descendant relationships.

With parent-child relationships, TwigStack may output intermediate path solutions that cannot contribute to final solutions. Therefore, TwigStack is sub-optimal for queries with parent-child relationships.

1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal XML pattern matching.” In Proceedings of ACM SIGMOD, 2002.
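The second phase can be pictured with a small sketch. It assumes a hypothetical twig with root A and two leaf children B and C, so that phase 1 yields path solutions of the form (a, b) and (a, c); the function name and the data are illustrative only and are not taken from the TwigStack paper.

```python
from collections import defaultdict


def merge_path_solutions(paths_ab, paths_ac):
    """Join path solutions (a, b) and (a, c) on the shared root element a."""
    by_root = defaultdict(list)
    for a, c in paths_ac:
        by_root[a].append(c)
    # A path (a, b) only survives if some (a, c) shares the same root a.
    return [(a, b, c) for a, b in paths_ab for c in by_root.get(a, [])]


# Hypothetical phase-1 output.
paths_ab = [("a1", "b1"), ("a2", "b2")]
paths_ac = [("a1", "c1")]
matches = merge_path_solutions(paths_ab, paths_ac)
```

Here ("a2", "b2") is an intermediate path solution that joins with nothing, which is the kind of useless output the slide refers to.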

Page 9:

Previous work: TwigStackList [2]

The main problem of TwigStack is that it treats all edges as ancestor-descendant relationships in the first phase, so it is not efficient for queries with parent-child relationships.

Improved method: TwigStackList [2]

- An additional list structure for each query node caches elements that are likely to participate in final solutions.
- TwigStackList is optimal when there is no P-C edge for branching nodes; therefore, it is more efficient than TwigStack.

2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of XML twig patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533-542, 2004.

Page 10:

Previous work: Query Predicates [3][4][5]

XML query predicates:

- OR-predicates [3]
- Ordered-predicates [4]
- Not-predicates on XPath [5]

3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceedings of SIGMOD, pages 59–70, 2004.

4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceedings of DEXA, 2005.

5. E. Jiao, T. W. Ling, C. Y. Chan, and P. S. Yu. “PathStack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.

Page 11:

Previous work: Query Predicates [3][4][5]

OR-predicates [3]:

- Define the OR-block.
- An AND/OR-twig can be viewed as an AND-twig with elements and OR-blocks.

Ordered XML query matching [4]:

- Use a look-ahead list to check for order.
- Supports four axes: following-sibling, preceding-sibling, following, and preceding.

[Figure: example ordered twig pattern over the nodes a, b, c, and d]

3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceedings of SIGMOD, pages 59–70, 2004.

4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceedings of DEXA, 2005.

Page 12:

Previous work: not-Predicates on XPath [5]

PathStack¬ [5] evaluates path queries with not-predicates, based on the PathStack algorithm.

Definitions for nodes in a query:

- Non-output node: a node that appears below a negative edge.
- Output node: a node that does not appear below any negative edge.

XPath query with NOT-predicate:

- Each query node is associated with a stack, which is either a regular stack or a boolean stack.
- The NOT-predicate is checked by updating the boolean value.

Limitation: it cannot match not-twig queries.

[Figure: example path query over n1: A, n2: B, n3: C, n4: D with stacks S1, Sbool2, S3, S4. Output nodes: A, B. Non-output nodes: C, D]

5. E. Jiao, T. W. Ling, C. Y. Chan, and P. S. Yu. “PathStack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.


Page 14:

Data model

An XML document is modeled as a rooted, ordered and tagged tree.

Labelling scheme: region coding (startPos, endPos, LevelNum). Given elements e1 and e2:

- e1 is an ancestor of e2 iff e1.start < e2.start and e1.end > e2.end.
- e1 is the parent of e2 iff e1.start < e2.start and e1.end > e2.end, and e1.level + 1 = e2.level.

[Figure: example XML tree with book, preface, chapter, section, title, and text nodes, labelled with region codes such as (1,123,1) for the root book, (2,4,2), (5,12,2), (13,24,2), (6,8,3), and (9,11,3); e1 marks an ancestor and e2 its descendants]
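The two region-coding tests translate directly into code. The sketch below is illustrative: the `Region` type and function names are not from the slides, and pairing the code (5,12,2) with a chapter element is an assumption read off the garbled figure.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Region code of one element: (startPos, endPos, LevelNum)."""
    start: int
    end: int
    level: int


def is_ancestor(e1: Region, e2: Region) -> bool:
    # e1 strictly contains e2's interval.
    return e1.start < e2.start and e1.end > e2.end


def is_parent(e1: Region, e2: Region) -> bool:
    # Containment plus exactly one level of difference.
    return is_ancestor(e1, e2) and e1.level + 1 == e2.level


book = Region(1, 123, 1)     # root element from the figure
chapter = Region(5, 12, 2)   # assumed to be a chapter element
inner = Region(6, 8, 3)      # a level-3 element inside (5,12,2)

checks = (
    is_ancestor(book, inner),    # book contains the inner element
    is_parent(book, inner),      # but it is two levels up
    is_parent(chapter, inner),   # containment and adjacent levels
    is_ancestor(inner, chapter), # containment is not symmetric
)
```

Because both tests need only the two labels, structural relationships can be decided without walking the tree.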

Page 15:

Proposed Subquery Matching Method

Positive/negative children of a query node. In the example NOT-twig query:

- Positive children of A: C, D, E
- Negative children of A: B, F

[Figure: NOT-twig query with root A and children B, C, D, E, F]

Page 16:

Proposed Subquery Matching Method

Given a NOT-twig query (with a query node n) and an XML data tree (Data), we say that an element en (with the tag n) in the XML data tree satisfies the sub-query rooted at n iff:

(1) n is a leaf node of the NOT-twig query; OR

(2) for each child node m of n, the applicable one of four cases holds:

(case i) If m is a positive PC child node of n, there is an element em in Data such that em is a child element of en and satisfies the sub-query rooted at m in Data.

(case ii) If m is a positive AD child node of n, there is an element em in Data such that em is a descendant element of en and satisfies the sub-query rooted at m in Data.

[Figure: query and data illustrations for case i and case ii]

Page 17:

Proposed Subquery Matching Method

(case iii) If m is a negative PC child node of n, there does not exist any element em in Data such that em is a child element of en and satisfies the sub-query rooted at m in Data.

(case iv) If m is a negative AD child node of n, there does not exist any element em in Data such that em is a descendant element of en and satisfies the sub-query rooted at m in Data.

[Figure: query and data illustrations for cases i to iv]

Page 18:

Subquery Matching: Example

- Leaf nodes: B1, B2 and D1 satisfy the subqueries rooted at B and D respectively.
- C1 satisfies the subquery rooted at C, since D1 is a descendant of C1.
- Since C1 satisfies the subquery rooted at C, A1 does not satisfy the subquery rooted at A: C1 is a child of A1, and in the NOT-twig query node C is a negative child of node A.
- A2 satisfies the subquery rooted at A, since A2 has a descendant B2 that satisfies the subquery rooted at B, and A2 has no child with the tag C.

[Figure: (a) XML tree with elements A1, B1, C1, D1, A2, B2; (b) query 1 with nodes A, B, C, D]
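The four-case definition can be read as a direct recursive check. The sketch below is a plain tree-walking illustration of the definition, not the TwigStackList¬ algorithm itself. The class names are hypothetical, and the exact shape of the XML tree is an assumption chosen to agree with the facts stated in the example (C1 is a child of A1, D1 is below C1, B2 is below A2, and A2 has no C child).

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    name: str
    tag: str
    children: list = field(default_factory=list)

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()


@dataclass
class QNode:
    tag: str
    # Each edge is (child query node, axis "PC" or "AD", negated flag).
    edges: list = field(default_factory=list)


def satisfies(e: Elem, n: QNode) -> bool:
    """True iff element e satisfies the sub-query rooted at query node n."""
    if e.tag != n.tag:
        return False
    for m, axis, negated in n.edges:  # a leaf has no edges: condition (1)
        candidates = e.children if axis == "PC" else e.descendants()
        found = any(satisfies(em, m) for em in candidates)
        # Positive edges need a match (cases i, ii); negative edges forbid one (iii, iv).
        if found == negated:
            return False
    return True


# Assumed tree layout: A1 -> B1, C1 -> D1, A2 -> B2.
a2 = Elem("A2", "A", [Elem("B2", "B")])
a1 = Elem("A1", "A", [Elem("B1", "B"), Elem("C1", "C", [Elem("D1", "D")]), a2])

# Query 1: A has a positive AD child B and a negative PC child C; C has a positive AD child D.
query = QNode("A", [(QNode("B"), "AD", False),
                    (QNode("C", [(QNode("D"), "AD", False)]), "PC", True)])
```

With this layout, `satisfies(a1, query)` is false because C1 violates the negative edge, while `satisfies(a2, query)` is true, matching the example's conclusion.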


Page 20:

TwigStackList¬ Algorithm: Data structure

Each node n in the twig query has a stream, a list, and a stack.

Data stream: Tn

- We partition an XML document into streams.
- All elements in a stream have the same tag and are ordered by their start position.
- The elements in each stream are read only once, from head to tail.

[Figure: a document with elements a1 to a3, b1 to b2, c1 to c2, and d1 to d3 on levels 1 to 4; a query with nodes a, b, c, d; and the streams Ta: a1, a2, a3; Tb: b1, b2; Tc: c1, c2; Td: d1, d2, d3]
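Partitioning a document into per-tag streams can be sketched as below. The input format (a flat list of region-coded elements), the function name, and the sample data are assumptions for illustration; a real system would read the streams from disk rather than build them in memory.

```python
from collections import defaultdict


def build_streams(elements):
    """Group (tag, start, end, level) records into one stream per tag,
    each ordered by start position."""
    streams = defaultdict(list)
    for tag, start, end, level in elements:
        streams[tag].append((start, end, level))
    for stream in streams.values():
        stream.sort()  # tuples compare on start position first
    return dict(streams)


# Hypothetical region-coded elements, deliberately out of document order.
doc = [("b", 5, 6, 3), ("a", 4, 9, 2), ("a", 1, 10, 1), ("c", 7, 8, 3), ("b", 2, 3, 2)]
streams = build_streams(doc)

# Each stream is then consumed once, head to tail.
first_a = next(iter(streams["a"]))
```

Ordering by start position is what lets a single forward pass over each stream see elements in document order.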

Page 21:

TwigStackList¬ Algorithm: Data structure

Each node with tag name n in the twig query has a stream, a list, and a stack.

Data stream: Tn

List: Ln

- The elements in lists help to check for P-C relationships and negative edges.

Stack: Sn

- A stack is used for each output node.
- It stores elements that are potential solutions of the XML query.
- The stack size is bounded by the document's root-to-leaf depth.

[Figure: lists La, Lb, Lc, Ld and stacks Sa, Sb for a query with nodes a, b, c, d]


Page 23:

TwigStackList¬ Algorithm: An Example

[Figure: document with elements A1, B1, C1, D1, A2, B2; query with nodes A, B, C, D; streams TA: A1, A2; TB: B1, B2; TC: C1; TD: D1]

Page 24:

TwigStackList¬ Algorithm: An Example

Next action: create stacks for all output nodes.

Page 25:

TwigStackList¬ Algorithm: An Example

Next action: create lists for all query nodes.

Page 26:

TwigStackList¬ Algorithm: An Example

Next action: push C1 into the list of C.

C1 has a child D1.

Page 27:

TwigStackList¬ Algorithm: An Example

Next action: push A1 into the list of A.

A1 has descendant B1.

Page 28:

TwigStackList¬ Algorithm: An Example

Next action: remove A1 from the list of A.

C1 is a child of A1, thus A1 does not satisfy the negative edge of the query between A and C.

Page 29:

TwigStackList¬ Algorithm: An Example

Next action: advance A.

Page 30:

TwigStackList¬ Algorithm: An Example

Next action: advance B.

Page 31:

TwigStackList¬ Algorithm: An Example

Next action: advance C, reaching the end of the stream.

Page 32:

TwigStackList¬ Algorithm: An Example

Next action: push A2 into the list of A.

A2 has descendant B2.

Page 33:

TwigStackList¬ Algorithm: An Example

Final solution: A2, B2.

A2 has descendant B2 and no child with tag C.

Page 34:

TwigStackList¬ Algorithm: Optimality

- TwigStack is optimal for queries with A-D relationships only.
- TwigStackList is optimal when no branching node has a P-C relationship.
- TwigStackList¬ is optimal if no branching node has more than one positive edge among which at least one is a P-C edge.

[Figure: example queries labelled "A-D only", "A-D for branching node", and "Negative P-C for branch node"]


Page 36:

TwigStackList¬ Algorithm: Experiments

Algorithms for comparison:

- naïve-TwigStack (NTS)
- naïve-TwigStackList (NTSL)
- Our proposed TwigStackList¬

Benchmarks:

- XMark: synthetic data
- TreeBank: real data from the Wall Street Journal
- Random data set: random, uniformly distributed data

Evaluation metrics:

- Number of intermediate path solutions
- Total running time

Page 37:

TwigStackList¬ Algorithm: Experiments

Testing queries for TwigStackList¬: Q(a)-Q(e) on TreeBank, Q(f)-Q(h) on XMark, Q(i)-Q(k) on the random data set.

[Figure: twig patterns of the test queries. Q(a)-Q(e) use TreeBank tags such as S, VP, NP, PP, IN, and VBN; Q(f)-Q(h) use the tags text, bold, keyword, and emph; Q(i)-Q(k) use the tags a to g]

Page 38:

TwigStackList¬ Algorithm: Intermediate result

- TwigStackList¬ has the smallest intermediate results.
- NTS and NTSL match more than one twig pattern.

Page 39:

TwigStackList¬ Algorithm: Execution time

TreeBank and random data: TwigStackList¬ has the smallest running time. NTS and NTSL are slower because:

1. They output more intermediate results.

2. They match more than one twig query.

[Figure: bar charts of execution time in seconds for NTS, NTSL, and TwigStackList¬. Left: TreeBank testing result for Q(a)-Q(e). Right: random dataset testing result for Q(i)-Q(k)]

Page 40:

TwigStackList¬ Algorithm: Execution time

[Figure: XMark queries Q(f), Q(g), Q(h) over the tags text, bold, keyword, and emph, and a bar chart of the XMark testing result: execution time in seconds for NTS, NTSL, and TwigStackList¬]

XMark: TwigStackList¬ has the smallest running time.

- Queries Q(f), Q(g), and Q(h) have the same query nodes and structure, but the number of not-predicates increases.
- NTS and NTSL execution time increases linearly, because the number of decomposed queries increases.
- TwigStackList¬ execution time is almost constant.


Page 42:

Conclusions

- We developed a new algorithm, TwigStackList¬, to match twig patterns with not-predicates.
- Our algorithm can identify a larger query class to guarantee I/O optimality.
- Experimental results showed the effectiveness and efficiency of our algorithm.

Page 43:

END Thank you!

Q & A

Page 44:

Reference

1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal XML pattern matching.” In Proceedings of ACM SIGMOD, 2002.

2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of XML twig patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533-542, 2004.

3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceedings of SIGMOD, pages 59–70, 2004.

4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceedings of DEXA, 2005.

5. E. Jiao, T. W. Ling, C. Y. Chan, and P. S. Yu. “PathStack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.

Page 45:

Appendix (1)

Previous work: binary structural join

TreeMerge and Stack-Merge [1]:

- Break the query into binary relationship branches.
- Stitch the branches together.
- A novel stack-based binary join algorithm.
- Disadvantage: large intermediate results.

1. S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. “Structural Joins: A Primitive for Efficient XML Query Pattern Matching.” In Proceedings of ICDE, 2002.

Page 46:

Appendix (2)

Previous work: TwigStackList vs. TwigStack

- TwigStack outputs the “useless” intermediate path solution <s1, t1>, since it doesn’t check for parent-child relationships.
- TwigStackList has no useless intermediate output: <s1, t1> is not in the output.

[Figure: a twig pattern with nodes section, title, paragraph, and figure, and an XML tree with elements Root, s1, s2, t1, t2, t3, p1, p2, p3, f1, f2. Label: no parent-child relationship for branching node]