twigstacklist¬: a holistic twig join algorithm for twig query with not-predicates on xml data
Post on 18-Mar-2016
43 Views
Preview:
DESCRIPTION
TRANSCRIPT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query
with Not-predicates on XML Data
by Tian Yu, Tok Wang Ling, Jiaheng Lu, Presented by: Tian Yu
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 2
Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates
Notation and data structure A holistic matching algorithm: TwigStackList¬
Experiments Conclusion
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 3
Introduction XML data representation rapidly increases
popularity
XML documents modeled as ordered trees.
XML queries specify patterns of selection predicates on multiple elements having some structural relationships (parent-child, ancestor-descendant)
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 4
Introduction: NOT-twig Query Twig query: a small tree whose nodes are tags, attributes or text values and edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges.
Twig query can have not-predicates Example:
Normal-twig query (without not-predicate): Q1:suppliersDatabase/supplier[//store]//part Not-twig query (with not-predicates): Q2:suppliersDatabase/supplier[NOT(//store)]//part
Intuitive meaning: selects all part elements which are descendent of supplier elements having no descendant element store.
s u p p lie r
p ar ts to r e
s u p p lie rD atab as e
s u p p ile r
s to r e(a ) Q 1 (b ) Q 2
p ar t
s u p p lie rD atab as e
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 5
Naïve Method for evaluating not-twig queries: Decompose the NOT-twig query into several normal-twig
queries Evaluate the decomposed queries with existing method
(twigStack or twigStackList) final result can be calculated (using set operators) based on
the results of the individual decomposed quires. For example:
Motivation:
s u p p iler
s to re p ar t
s u p p lie rD atab as e
s u p p ile r
s to r e p ar t
s u p p lie r D a tab as e
s u p p ile r
p ar t
s u p p lier D atab as eQ2: Two decomposed Queries:
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 6
Disadvantage of the naïve method: Not optimal. Many useless intermediate results are
produced Additional disk I/O: Clearly many elements are
accessed more than once.
Our objective: Efficient processing of XML not-twig queries.
Motivation:
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 7
Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates
Notation and data structure A holistic matching algorithm: TwigStackList¬
Experiments Conclusion
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 8
TwigStack [1]: a holistic approach to match twig query without not-predicates Two-phase algorithm:
Phase 1: The intermediate root to leaf path solutions are outputtedPhase 2: Merge the intermediate path solutions to get the final results
TwigStack: optimal when the query contains only ancester-descendant relationship
parent-child relationship: TwigStack may output some intermediate path solutions that cannot contribute to final solutions.
Therefore, TwigStack is sub-optimal for queries with parent-child relationships
Previous work: TwigStack [1]
1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal xml pattern matching.” In Proceedings of ACM SIGMOD, 2002.
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 9
The main problem of TwigStack is to assume all edges are ancestor-descendant relationship in the first phase. So it is not efficient for queries with parent-child relationships.
Improved method: TwigStackList [2] There is an additional list structure for each query node
to cache elements that likely participate in final solutions.
TwigStackList is optimal when there is no P-C edge for branching nodes, therefore, it is more efficient than TwigStack.
2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of xml twig patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533- 542, 2004.
Previous work: TwigStackList [2]
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 10
Previous work: Query Predicates[3][4][5]
XML Query Predicate OR-predicates [3] Ordered-predicates [4] not-predicates on XPath [5]
3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceeding of SIGMOD, pages 59–70, 2004.
4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceeding of DEXA, 2005.
5. E. Jiao, T.W. Ling, C. Y. Chan, and P. S. Yu. “Pathstack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 11
Previous work: Query Predicates[3][4][5] OR-predicate
Define OR-block AND/OR-twig can be viewed as an AND-twig with elements and OR-blocks.
Ordered XML query matching Use look-ahead list to check for order Support Four axis: following-sibling, preceding-sibling, following, and
preceding 3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceeding
of SIGMOD, pages 59–70, 2004.4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In
Proceeding of DEXA, 2005.
a
dc
>
b
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 12
Previous work: not-Predicates on XPath [5] It evaluating path queries with not-predicates, based on PathStack
algorithm Define: non-output node, output node in a query
non-output node: node that appear below a negative edge output node: node that does not appear below any negative edge
XPath query with NOT-predicate Each query node is also associated with a stack which is either a regular
stack or a boolean stack Check NOT-predicate by updating the boolean value
Cannot match not-twig queriesn 1 : A
||n 2 : B
||n 3 : C
||n 4 : D
5. E. Jiao, T.W. Ling, C. Y. Chan, and P. S. Yu. “Pathstack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.
<A 1 , B2 >S1S b o o l
2
C 1 B 1, f A 1D 1
S 4 S3
Output node: A, BNon-Output node: C, D
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 13
Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates
Notation and data structure A holistic matching algorithm: TwigStackList¬
Experiments Conclusion
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 14
Data model An XML document is modeled as a rooted, ordered and tagged tree Labelling scheme: Region Coding (startPos, endPos, LevelNum) Given e1, e2:
e1 is ancestor of e2: iff e1.start < e2.start and e1.end > e2.end e1 is parent of e2: iff e1.start < e2.start and e1.end > e2.end, and e1.level + 1= e2.level
“…”
book
preface chapter chapter
section title
“Data..”
“Intro..”
“…”
(1,123,1)
(2,4,2)
(3,3,3)
(13,24,2)(5,12,2)
(9,11,3)
(6,8,3)
(7,7,4) (10,10,4)
section title
“Data..” “…”
(17,19,3)(14,16,3)
(15,15,4) (18,18,4)
e1
e2
e2
section …
… chapter
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 15
Proposed Subquery Matching Method: Positive/negative children of a query node:
Positive children of A: C, D, E Negative children of A: B, F
NOT twig queryC
A
B D E F
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 16
Proposed Subquery Matching Method: Given a NOT-twig query (with a query node n) and an XML data
tree (Data), we say that an element en (with the tag n) in the XML data tree satisfies the sub-query rooted at n iff:
(1) n is a leaf node of NOT-query; OR (2) For each child node m of n: (Four cases) (case i)If m is a positive PC child node of n, there is an element
em in Data such that em is a child element of en and satisfies the sub-query rooted at m in Data.
(case ii)If m is a positive AD child node of n, there is an element em in Data such that em is a descendant element of en and satisfies the sub-query rooted at m in Data.
n
mQuery
en
e mData
Case 1
…
mQuery
en
e mData
Case 2
n
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 17
Proposed Subquery Matching Method: (case iii)If m is a negative PC child node of n, there does not
exist any element em in Data such that em is a child element of en and satisfies the sub-query rooted at m in Data.
(case iv)If m is a negative AD child node of n, there does not exist any element em in Data such that em is a descendant element of en and satisfies the sub query rooted at m in Data.
n
mQuery
en
e mData
Case 1
n
e m
n
e m
e… …
mQuery
en
e mData
Case 2
n
mQuery Data
Case 3
n
Query
Case 4
n
m
e
Data
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 18
Leaf nodes: B1, B2 and D1 has satisfy the subqueries rooted at B and D respectively.
C1 satisfies subquery rooted at C, since D1 is a descendant of C1
Since C1 satisfies the subquery rooted at C, we can safely say A1 does not satisfy subquery rooted at A. It is because C1 is a child of A1 and in the NOT-twig, node C is a negative child of node A.
A2 satisfies subquery rooted at A since A2 has a descendent B2 satisfies subquery rooted at B, and does not have any child with the tag C.
Subquery Matching: Example
(b ) qu e ry 1
A 1
(a ) XML T re e
C 1
D 1
A
B
B 1
A 2
B 2 D
C
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 19
Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates
Notation and data structure A holistic matching algorithm: TwigStackList¬
Experiments Conclusion
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 20
TwigStackList¬ Algorithm: Data structure Each node n in the twig query has: Stream, List, and Stack Data Stream: Tn
we partition an XML document into streams All elements in a stream are of the same tag and ordered by their start
Position The elements in each stream is read only once from head to tail.
a1, a2, a3
b1 , b2
C1 , C2
d1, d2, d3
a
dcb
Ta
Tb
Tc
Td
Document
2:
3:
a1
a2 a3 b2
d2 b1d3
c2
d1
c14:
Level 1:
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 21
TwigStackList¬ Algorithm: Data structure Each node with tag name n in the twig query has: Stream, List, and
Stack Data Stream: Tn List: Ln
The elements in lists help to check for P-C relationship and negative edges
Stack: Sn Stacks is used for each output node Store elements that are the potential solutions of the XML query. The stack size if bounded by the document root to leaf depth.
a1, a2…
b1 .. C1 ..
d1 ,d3 ..
a
dcb
La
Lb Lc
Ld
Sa
Sb
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 22
Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates
Notation and data structure A holistic matching algorithm: TwigStackList¬
Experiments Conclusion
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 23
Document:
Query: A
B C
A1
B1
C1
B1, B2
A
D
C1
B2
D1
D1 A2
TwigStackList¬ Algorithm: An Example
A1, A2T
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 24
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Create Stacks of every output nodesNext Action:
TwigStackList¬ Algorithm: An Example
A1, A2AT
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 25
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Create lists of every query nodesNext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
AT
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 26
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Push C1 into list of CNext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
C1 AT
BT
CT
DTC1 has a child D1
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 27
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Push A1 into list of ANext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
C1
A1
AT
BT
CT
DT A1 has descendant B1
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 28
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Remove A1 from list of ANext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
C1
A1
AT
BT
CT
DTC1 is a child of A1, thus A1 does not satisfy the negative edge of the query between A and C.
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 29
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Advance ANext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
C1 AT
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 30
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Advance BNext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
C1 AT
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 31
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Advance C, reach end of streamNext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
C1 AT
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 32
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Push A2 into list of ANext Action:
TwigStackList¬ Algorithm: An Example
A1, A2
A2
A2 has descendant B2
C1 AT
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 33
Document:
Query: A
B C
A1
B1
C1
B1, B2 D
C1
B2
D1
D1 A2
Final Solution: A2, B2Next Action:
TwigStackList¬ Algorithm: An Example
A1, A2
A2
A2 has descendant B2 and no child with tag C
C1 AT
BT
CT
DT
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 34
TwigStackList¬ Algorithm: Optimality TwigStack is optimal for A-D only relationship. TwigStackList is optimal when no branching node has P-C
relationship. TwigStackList¬ is optimal if a branching node doesn’t have
more than one positive relationships among which at least one is P-C edge.
A
B C
Negative P-C for Branch node
A-D only
A-D for branching node A
B C D
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 35
Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates
Notation and data structure A holistic matching algorithm: TwigStackList¬
Experiments Conclusion
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 36
TwigStackList¬ Algorithm: Experiments Algorithms for comparison:
naÏve -TwigStack (NTS) naÏve-TwigStackList (NTSL) Our proposed TwigStackList¬
Benchmarks XMark: Synthetic Data Treebank: Real Data from Wall Street Journal Random Data set: random uniformly distributed data
Evaluation metrics Number of intermediate path solutions Total running time
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 37
Testing Queires for TwigStackList¬ : Q(a)-Q(e) TreeBank, Q(f)-Q(h) Xmark, Q(i)-Q(k) Random Dataset
acb
ed gfQ (k )
a
db c
fe gQ (j)
a
b
d
Q (i)e
c
f
g
tex t
bold k ey wordem ph
Q (h)
tex t
bold k ey wordem ph
Q (f)
tex t
bo ld k ey wordem ph
Q (g)
S
AD JM D
Q (a) Q (b )
IN
VP
N P
P P
S
VBN
IN
VP
N P
P P
S
VBNQ (c )
D T P P P R P _ D o lla r _
VP
VBN
Q (d ) Q (e)
IN
VP
N P
P P
S
VBN
TwigStackList¬ Algorithm: Experiments
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 38
TwigStackList¬ has the smallest intermediate results
NTS, NTSL match more than one Twig Patterns
TwigStackList¬ Algorithm: Intermediate result
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 39
Treebank and random data: TwigStackList¬ has the smallest running time
1. They output more intermediate results.
2. They match more than one twig queries.
TwigStackList¬ Algorithm: Execution time
0153045607590
Q(a) Q(b) Q(c) Q(d) Q(e)
Tim
e (s
econ
d)
NTS NTSL TwigStackList
0369
12151821
Q(i) Q(j) Q(k)
Tim
e (s
econ
d)
NTS NTSL TwigStackList
Treebank testing result Random dataset testing result
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 40
TwigStackList¬ Algorithm: Execution time
203040506070
Q(f) Q(g) Q(h)
Tim
e (s
econ
d)
NTS NTSL TwigStackList
tex t
bo ld k ey w ordem ph
Q (h)
tex t
bo ld k ey wordem ph
Q (f)
tex t
bo ld k ey w ordem ph
Q (g)
XMark Queries XMark testing result
XMark: TwigStackList¬ has the smallest running time Query Q(f), Q(g), Q(h) have the same query nodes and structure,
but the number of not-predicates is increasing. NTS and NTSL execution time increases linearly, because the
number of decomposed queries is increasing. TwigStackList¬ execution time is almost constant.
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 41
Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates
Notation and data structure A holistic matching algorithm: TwigStackList¬
Experiments Conclusion
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 42
Conclusions We developed a new algorithm TwigStackList¬ to
match Twig Pattern with not-predicates. Our algorithm can identify a larger query class to
guarantee I/O optimally Experimental results showed the effectiveness and
efficiency of our algorithm
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 43
END Thank you!
Q & A
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 44
Reference 1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal
xml pattern matching.” In Proceedings of ACM SIGMOD, 2002.2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of XML twig
patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533- 542, 2004.
3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceeding of SIGMOD, pages 59–70, 2004.
4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceeding of DEXA, 2005.
5. E. Jiao, T.W. Ling, C. Y. Chan, and P. S. Yu. “Pathstack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 45
1. S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu, “Structural Joins: A Primitive for Efficient XML Query Pattern Matching.” In Proceedings of ICDE Conf. 2002.
Previous work: binary structural Join TreeMerge and Stack-Merge [1]:
Break into binary relationship branches Stitch the branches together
A novel stack-based binary join algorithm Disadvantage: large intermediate results
Appendix (1)
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 46
Previous work: TwigStackList v.s. TwigStack
TwigStack output the it output the “uesless” intermediate path solution < s1,t1>, since it doesn’t check for parent-child relationsihp. TwigStackList has no uesless intermediate output. < s1,t1> is not in the
output.
Twig Pattern
s1
p1
section
title paragraph
figure
p3
f1
t1
An XML tree
t2
s2
p2t3
f2
Root
s1
t1
No Parent-child relationship for branching node
Appendix (2)
top related