
Post on 18-Dec-2015


Core Labeling: A New Way to Compress Transitive Closure

Yangjun Chen

Dept. Applied Computer Science,

University of Winnipeg

515 Portage Ave.

Winnipeg, Manitoba, Canada R3B 2E9


Outline

Motivation

Tree labeling

Main algorithm

- Core tree

- Graph labeling: Core-I

- Graph labeling: Core-II

Conclusion


Motivation

Goal: an efficient method to evaluate reachability queries on sparse graphs. Given a directed sparse graph G, check whether a node v is reachable from another node u through a path in G.

Applications: XML data processing, gene-regulatory networks and metabolic networks. XML documents are usually represented as trees. However, an XML document may contain IDREF/ID references that turn it into a directed but sparse graph: a tree structure plus a few reference links. In a metabolic network, reachability models whether two genes interact with each other or whether two proteins participate in a common pathway. Many such graphs are sparse.


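Motivation

A simple method: store the transitive closure as a matrix.

[Figure: a graph G on the nodes a, b, c, d, e and its transitive closure G*.]

M =

    a b c d e
a   0 0 0 0 0
b   1 0 0 0 0
c   1 0 1 0 0
d   0 0 1 0 0
e   1 0 0 0 0

Matrix of the transitive closure G*:

    a b c d e
a   0 0 0 0 0
b   1 0 0 0 0
c   1 0 1 0 0
d   1 0 1 0 0
e   1 0 0 0 0

O(n^2) space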
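The matrix approach can be sketched with Warshall's algorithm. This is a minimal illustration, not part of the slides: the function name `transitive_closure` is hypothetical, and the input is the 5 x 5 matrix M of this slide with rows and columns in the order a, b, c, d, e.

```python
def transitive_closure(m):
    """Warshall's algorithm on a 0/1 matrix given as a list of rows."""
    n = len(m)
    closure = [row[:] for row in m]      # work on a copy of the input
    for k in range(n):                   # allow node k as an intermediate stop
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = 1
    return closure


# Matrix M of the slide (rows and columns in the order a, b, c, d, e).
M = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
]
M_STAR = transitive_closure(M)
```

The result needs n^2 entries regardless of how few edges G has, which is the space cost the labeling schemes in the following slides try to avoid.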


Tree labeling

Tree encoding

Let G be a sparse graph. We first find a spanning tree T of G.

Each node v in T is assigned an interval [start, end), where start is v’s preorder number and end - 1 is the largest preorder number among all the nodes in T[v]. Another node u labeled [start’, end’) is then a descendant of v (with respect to T) iff start’ ∈ [start, end).

[Figure: spanning tree T with interval labels: a [0, 12), b [1, 5), c [2, 4), k [3, 4), d [4, 5), r [5, 9), e [6, 9), f [7, 8), g [8, 9), h [9, 12), i [10, 11), j [11, 12).]

Let v and u be two nodes in T, labeled [a, b) and [a’, b’), respectively. If a ∈ [a’, b’), v is a descendant of u. In this case, we say that [a, b) is subsumed by [a’, b’). We must then also have b ≤ b’. Therefore, if v and u are not on the same path in T, we have either a’ ≥ b or a ≥ b’. In the former case, we say that [a, b) is smaller than [a’, b’), denoted [a, b) ≺ [a’, b’). In the latter case, [a’, b’) is smaller than [a, b).
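The interval assignment can be sketched as a single preorder traversal. The child lists in `TREE` are an assumption: they are chosen so that the traversal reproduces the intervals of the example tree, and the names `label_tree` and `is_descendant` are hypothetical.

```python
def label_tree(children, root):
    """Assign [start, end) to every node: start is the preorder number,
    end - 1 the largest preorder number inside the node's subtree."""
    labels = {}
    counter = 0

    def visit(v):
        nonlocal counter
        start = counter
        counter += 1
        for c in children.get(v, []):
            visit(c)
        labels[v] = (start, counter)

    visit(root)
    return labels


def is_descendant(label_u, label_v):
    """True iff label_u is subsumed by label_v (a node subsumes itself)."""
    return label_v[0] <= label_u[0] < label_v[1]


# Assumed child lists for the example tree T.
TREE = {"a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
        "r": ["e"], "e": ["f", "g"], "h": ["i", "j"]}
LABELS = label_tree(TREE, "a")
```

With these labels an ancestor-descendant test is a constant-time comparison, for example `is_descendant(LABELS["f"], LABELS["r"])`.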


Tree labeling

Tree encoding

[Figure: the spanning tree T with its interval labels, as on the previous slide.]

Interval sequences (label space):

a: [0, 12)

h: [2, 4)[4, 5)[6, 9)[9, 12)

e: [2, 4)[4, 5)[6, 9)

f: [3, 4)[4, 5)[7, 8)

d: [3, 4)[4, 5)

k: [3, 4)

g: [2, 4)[8, 9)

c: [2, 4)

i: [10, 11)

j: [11, 12)

b and r: [1, 5)[2, 4)[5, 9)[6, 9)


Main Algorithm

Core tree (core of G)

Let T be a spanning tree of G. We denote by E’ the set of all the non-tree edges and by V’ the set of all the end points of the non-tree edges. Then V’ = Vstart ∪ Vend, where Vstart is the set of all the start nodes of the non-tree edges and Vend the set of all their end nodes.

Definition 1. (anti-subsuming subset) A subset S ⊆ Vstart is called an anti-subsuming set iff |S| > 1 and no two nodes in S are related by the ancestor-descendant relationship with respect to T.

[Figure: the spanning tree T of the running example.]

Vstart = {d, f, g, h}

Vend = {c, k, e, d, g}

Anti-subsuming subsets:

{d, f}, {d, g}, {d, h}, {f, g}, {f, h}, {g, h}

{d, f, g}, {d, f, h}, {d, g, h}, {f, g, h}, {d, f, g, h}
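Definition 1 can be checked by brute force on the example. The sketch below assumes the tree intervals d [4, 5), f [7, 8), g [8, 9), h [9, 12) for the start nodes; the function names are hypothetical. It enumerates all subsets, so it is only meant for small illustrations.

```python
from itertools import combinations

# Assumed tree intervals of the start nodes of the example.
V_START = {"d": (4, 5), "f": (7, 8), "g": (8, 9), "h": (9, 12)}


def related(x, y):
    """True iff one interval subsumes the other (ancestor-descendant in T)."""
    return x[0] <= y[0] < x[1] or y[0] <= x[0] < y[1]


def anti_subsuming_subsets(intervals):
    """All subsets of size > 1 whose members are pairwise unrelated in T."""
    names = sorted(intervals)
    result = []
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            if all(not related(intervals[u], intervals[v])
                   for u, v in combinations(subset, 2)):
                result.append(set(subset))
    return result


SUBSETS = anti_subsuming_subsets(V_START)
```

Since no two of d, f, g, h lie on a common root-to-leaf path, every subset with at least two elements qualifies.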


Main Algorithm

Core tree (core of G)

Definition 2. (critical node) A node v in a spanning tree T of G is critical if v ∈ Vstart or there exists an anti-subsuming subset S = {v1, v2, ..., vk} for k ≥ 2 such that v is the lowest common ancestor of v1, v2, ..., vk. We denote by Vcritical the set of all critical nodes.

In the graph, node e is the lowest common ancestor of {f, g}, and node a is the lowest common ancestor of {d, f, g, h}. So e and a are critical nodes. In addition, each v ∈ Vstart is a critical node. So the critical nodes of G with respect to T are {d, f, g, h, e, a}.

[Figure: the spanning tree T of the running example.]


Main Algorithm

Core tree (core of G)

Definition 3. (core of G) Let G = (V, E) be a directed graph. Let T be a spanning tree of G. The core of G with respect to T is a tree structure whose node set is Vcritical and in which there is an edge from u to v (u, v ∈ Vcritical) iff there is a path p from u to v in T and p contains no other critical nodes. The core of G with respect to T is denoted Gcore = (Vcore, Ecore).

[Figure: the spanning tree T and its core Gcore, rooted at a with children d, e and h; e has the children f and g.]

Interval sequences of the core nodes:

a: [0, 12)

h: [2, 4)[4, 5)[6, 9)[9, 12)

e: [2, 4)[4, 5)[6, 9)

f: [3, 4)[4, 5)[7, 8)

d: [3, 4)[4, 5)

g: [2, 4)[8, 9)
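Definitions 2 and 3 can be evaluated directly on the example. The sketch assumes the parent map below for T (read off the example figure) and uses the fact that the lowest common ancestor of an anti-subsuming set equals the lowest common ancestor of one of its pairs, so checking pairs is enough. All function names are hypothetical.

```python
from itertools import combinations

# Assumed parent map of the example spanning tree T (a is the root).
PARENT = {"b": "a", "r": "a", "h": "a", "c": "b", "d": "b", "k": "c",
          "e": "r", "f": "e", "g": "e", "i": "h", "j": "h"}
V_START = {"d", "f", "g", "h"}


def ancestors(v):
    """v followed by its proper ancestors, bottom-up."""
    chain = [v]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lca(u, v):
    seen = set(ancestors(u))
    return next(w for w in ancestors(v) if w in seen)


def critical_nodes(v_start):
    critical = set(v_start)
    for u, v in combinations(sorted(v_start), 2):
        w = lca(u, v)
        if w not in (u, v):              # u, v unrelated: {u, v} is anti-subsuming
            critical.add(w)
    return critical


def core_edges(critical):
    """Edge (w, v) iff w is the nearest proper ancestor of v that is critical."""
    edges = set()
    for v in critical:
        for w in ancestors(v)[1:]:
            if w in critical:
                edges.add((w, v))
                break
    return edges


V_CRITICAL = critical_nodes(V_START)
E_CORE = core_edges(V_CRITICAL)
```

This quadratic check is only a reference for the definitions; the core-generation algorithm of the slides builds the same structure in one bottom-up pass.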


Main Algorithm

Core generation

Algorithm core-generation(T)

1. Mark every node in T that belongs to Vstart.

2. Let v be the first marked node encountered during the bottom-up search of T. Create the first node of Gcore for v.

3. Let u be the currently encountered node in T, and let u’ be the node in T for which a node in Gcore was created just before u is met. Do (4) or (5), depending on whether u is a marked node or not.

4. If u is a marked node, then do the following.

(a) If u’ is not a child (descendant) of u, create a link from u to u’, called a left-sibling link and denoted left-sibling(u) = u’.

(b) If u’ is a child (descendant) of u, first create a link from u’ to u, called a parent link and denoted parent(u’) = u. Then go along the left-sibling chain starting from u’ until a node u’’ is met which is not a child (descendant) of u. For each encountered node w except u’’, set parent(w) ← u. Set left-sibling(u) ← u’’. Remove left-sibling(w) for each child w of u.

5. If u is a non-marked node, then do the following.

(c) If u’ is not a child (descendant) of u, no node is created.

(d) If u’ is a child (descendant) of u, go along the left-sibling chain starting from u’ until a node u’’ is met which is not a child (descendant) of u. If the number of nodes encountered during the chain navigation (not including u’’) is more than 1, create a new node in Gcore and do the same operations as in (4.b). Otherwise, no node is created.
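The bottom-up construction can be sketched as follows. This is a simplified variant, not the slides' exact procedure: the left-sibling chain is replaced by an explicit stack of core nodes that still wait for a parent, which is assumed to behave the same way on the example. The tree shape and the name `core_generation` are assumptions.

```python
def core_generation(children, root, marked):
    """Return the parent map of the core tree, built in one bottom-up pass."""
    labels, order = {}, []
    counter = 0

    def visit(v):                        # preorder intervals, postorder sequence
        nonlocal counter
        start = counter
        counter += 1
        for c in children.get(v, []):
            visit(c)
        labels[v] = (start, counter)
        order.append(v)

    visit(root)
    core_parent = {}
    pending = []                         # core nodes without a parent so far
    for u in order:
        lo, hi = labels[u]
        below = []                       # pending core nodes inside T[u]
        while pending and lo <= labels[pending[-1]][0] < hi:
            below.append(pending.pop())
        if u in marked or len(below) > 1:
            for w in below:              # u becomes a core node, cf. step (4.b)
                core_parent[w] = u
            pending.append(u)
        else:
            pending.extend(reversed(below))   # at most one node: put it back
    return core_parent


# Assumed child lists of the example tree T and its marked nodes Vstart.
TREE = {"a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
        "r": ["e"], "e": ["f", "g"], "h": ["i", "j"]}
CORE_PARENT = core_generation(TREE, "a", {"d", "f", "g", "h"})
```

Every node is pushed and popped a bounded number of times, so the pass stays linear in the size of T apart from the traversal itself.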


Main Algorithm

Core tree (core of G)

[Figure: the case in which u’’ is not a child of u and a link to the left sibling is created, followed by the construction steps (a)-(f) of Gcore for the example tree: d; d, f; d, f, g; e above f and g; then h; finally a as the root.]


Main Algorithm

Graph labeling: Core-I

Definition 4. Let Vcore = {v1, ..., vg} be the node set of Gcore. The core label for G is a set {L(v1), ..., L(vg)}, where each L(vl) (l = 1, ..., g) is an interval sequence associated with vl, satisfying the following two properties:

(1) Let L(vl) = [a_l1, b_l1), ..., [a_lr, b_lr) for some r. Then, for any i, j ∈ {1, ..., r}, b_li ≤ a_lj if i < j. That is, [a_li, b_li) ≺ [a_lj, b_lj) for i < j. (In this sense, the intervals in L(vl) are considered to be sorted.)

(2) Let [a, b) be the interval associated with a descendant of vl with respect to G. Then there exists an interval [a_li, b_li) (1 ≤ i ≤ r) in L(vl) such that a ∈ [a_li, b_li).

Definition 5. (link graph) Let G = (V, E) be a directed graph. Let T be a spanning tree of G. The link graph of G with respect to T is a graph, denoted Glink, with the node set V’ (the end points of all the non-tree edges) and the edge set E’ ∪ E’’, where (v, u) ∈ E’’ iff v ∈ Vend, u ∈ Vstart, and there exists a path from v to u in T.
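Definition 5 can be illustrated with a few lines of code. The non-tree edge list below is an assumption: it is one edge set consistent with Vstart, Vend and the interval sequences of the example, and is not stated on the slides. Paths of length zero are ignored when building E’’.

```python
# Tree intervals of the nodes in V' (assumed from the example figure).
INTERVAL = {"c": (2, 4), "k": (3, 4), "d": (4, 5), "e": (6, 9),
            "f": (7, 8), "g": (8, 9), "h": (9, 12)}
# Assumed non-tree edges E' (start node, end node).
E_NON_TREE = [("d", "k"), ("f", "d"), ("g", "c"), ("h", "e"), ("h", "g")]


def link_graph(interval, non_tree_edges):
    """Edge set E' united with E'' of the link graph."""
    v_start = {u for u, _ in non_tree_edges}
    v_end = {v for _, v in non_tree_edges}
    # E'': tree paths leading from an end node v down to a start node u.
    e2 = {(v, u) for v in v_end for u in v_start
          if v != u and interval[v][0] <= interval[u][0] < interval[v][1]}
    return set(non_tree_edges) | e2


E_LINK = link_graph(INTERVAL, E_NON_TREE)
```

On this input E’’ contributes the two edges (e, f) and (e, g), because f and g are the only start nodes below an end node.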


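Main Algorithm

Graph labeling: Core-I

[Figure: the link graph Glink on the nodes c, d, e, f, g, h, k, and the combined graph Gcom = Gcore ∪ Glink, whose nodes carry the tree intervals a [0, 12), h [9, 12), e [6, 9), g [8, 9), f [7, 8), d [4, 5), k [3, 4), c [2, 4).]

Interval sequences of the nodes of Gcom, generated in reverse topological order:

a: [0, 12)

h: [2, 4)[4, 5)[6, 9)[9, 12)

e: [2, 4)[4, 5)[6, 9)

f: [3, 4)[4, 5)[7, 8)

d: [3, 4)[4, 5)

k: [3, 4)

g: [2, 4)[8, 9)

c: [2, 4)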


Main Algorithm

- Generation of interval sequences

1. Scan the nodes of Gcom in reverse topological order.

2. For each node v, the interval sequence L(v) is stored in a linked list Av. Initially, Av contains only one interval, which is generated by labeling T.

3. Let v1, ..., vk be the children of v (in Gcom). Merge Av with each Avl for the child node vl (l = 1, ..., k) as follows. Assume

Av = p1, p2, ..., pg and Avl = q1, q2, ..., qh.

Assume that both Av and Avl are increasingly ordered. (As we will see, any interval sequence generated by the following algorithm has this property: it contains only intervals that are not on the same path in T. Initially, Av contains only one interval, so it is trivially sorted.)


Main Algorithm

- Generation of interval sequences

4. We step through both Av and Avl from left to right. Let pi = [ai, bi) and qj = [aj, bj) be the intervals currently encountered. We make the following checks.

(i) If ai ≥ bj, insert qj into Av after pi-1 and before pi, and move to qj+1.

(ii) If ai ∈ [aj, bj), remove pi from Av and move to pi+1. (*pi is subsumed by qj.*)

(iii) If aj ∈ [ai, bi), ignore qj and move to qj+1. (*qj is subsumed by pi; but it should not be removed from Avl.*)

(iv) If aj ≥ bi, ignore pi and move to pi+1.

(v) If ai = aj and bi = bj, ignore both pi and qj, and move to pi+1 and qj+1.


Main Algorithm

- Generation of interval sequences

Example. Merging A2 into A1; p and q point to the current intervals of A1 and A2.

A1: [3, 4)[4, 5)[7, 8)    A2: [2, 4)[8, 9)

A1: [4, 5)[7, 8)    A2: [2, 4)[8, 9)

A1: [2, 4)[4, 5)[7, 8)    A2: [2, 4)[8, 9)

A1: [2, 4)[4, 5)[7, 8)    A2: [2, 4)[8, 9)

A1: [2, 4)[4, 5)[7, 8)[8, 9)    A2: [2, 4)[8, 9)    (p = nil)
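The merge step can be sketched with two indices over Python lists instead of linked-list pointers. This is an illustration under the assumption that any two intervals are either nested or disjoint (as tree intervals are); the function name `merge` is hypothetical and the function returns a new list rather than editing Av in place.

```python
def merge(av, avl):
    """Merge the sorted interval sequence avl into av.
    Intervals are (a, b) pairs standing for [a, b)."""
    result = []
    i = j = 0
    while i < len(av) and j < len(avl):
        (ai, bi), (aj, bj) = av[i], avl[j]
        if (ai, bi) == (aj, bj):         # (v) identical: keep a single copy
            result.append(av[i])
            i += 1
            j += 1
        elif ai >= bj:                   # (i) q lies before p: insert q
            result.append(avl[j])
            j += 1
        elif aj <= ai < bj:              # (ii) p is subsumed by q: drop p
            i += 1
        elif ai <= aj < bi:              # (iii) q is subsumed by p: skip q
            j += 1
        else:                            # (iv) aj >= bi: p lies before q
            result.append(av[i])
            i += 1
    result.extend(av[i:])                # whatever is left of either list
    result.extend(avl[j:])
    return result


A1 = [(3, 4), (4, 5), (7, 8)]
A2 = [(2, 4), (8, 9)]
MERGED = merge(A1, A2)
```

Each step advances at least one index, so merging two sequences takes time linear in their total length.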


Main Algorithm

- Core labels

[Figure: Gcore with the core label attached to each node.]

a: [0, 12)

h: [2, 4)[4, 5)[6, 9)[9, 12)

e: [2, 4)[4, 5)[6, 9)

d: [3, 4)[4, 5)

f: [3, 4)[4, 5)[7, 8)

g: [2, 4)[8, 9)


Main Algorithm

- Non-tree labeling

Let Vcore = {v1, ..., vj}. We store the core label of G as a list: s1 = L(v1), ..., sj = L(vj). Then we define a function φ: Vcore → {1, ..., j} such that for each v ∈ Vcore, φ(v) = i iff si = L(v). Based on the above concepts, we define Core-I below.

s1: L(a) = [0, 12), φ(a) = 1

s2: L(h) = [2, 4)[4, 5)[6, 9)[9, 12), φ(h) = 2

s3: L(e) = [2, 4)[4, 5)[6, 9), φ(e) = 3

s4: L(f) = [3, 4)[4, 5)[7, 8), φ(f) = 4

s5: L(d) = [3, 4)[4, 5), φ(d) = 5

s6: L(g) = [2, 4)[8, 9), φ(g) = 6


Main Algorithm

- Non-tree labeling

Each node v in V is associated with two nodes, v- and v*.

v-: a critical node in T[v] which is closest to v.

v*: the lowest ancestor of v (in T) which has a non-tree incoming edge.

Example.

[Figure: the spanning tree T of the running example.]

r- = e, r* does not exist.

e- = e, e* = e.


Main Algorithm

- Non-tree labeling

Definition (Core-I). Let v be a node in G. The non-tree label of v is a pair <x, y>, where

- x = i if v- exists and φ(v-) = i. If v- does not exist, let x be the special symbol “-”.

- y = [a, b) if v* exists and [a, b) is the interval of v*. If v* does not exist, let y be “-”.

[Figure: the spanning tree T with the tree interval and the non-tree label of every node.]

a: [0, 12), <1, _>

b: [1, 5), <5, _>

c: [2, 4), <_, [2, 4)>

k: [3, 4), <_, [3, 4)>

d: [4, 5), <5, [4, 5)>

r: [5, 9), <3, _>

e: [6, 9), <3, [6, 9)>

f: [7, 8), <4, [6, 9)>

g: [8, 9), <6, [8, 9)>

h: [9, 12), <2, _>

i: [10, 11), <_, _>

j: [11, 12), <_, _>


Main Algorithm

- Non-tree labeling

Proposition. Assume that u and v are two nodes in G, labeled ([a1, b1), <x1, y1>) and ([a2, b2), <x2, y2>), respectively. Node v is reachable from u iff one of the following conditions holds:

(i) [a2, b2) is subsumed by [a1, b1), or

(ii) there exists an interval [a, b) in s_x1 such that for y2 = [a’, b’) we have a’ ∈ [a, b) (i.e., y2 is subsumed by [a, b)).
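The proposition translates into a short query routine. The sketch below copies the core label list and a few node labels from the running example (the assignment of labels to node names is read off the figure and is an assumption); `None` stands for the symbol "-", and the names `covered` and `reachable` are hypothetical. Condition (ii) uses a binary search over the sorted interval sequence.

```python
from bisect import bisect_right

# Core label list s1..s6 of the running example.
S = {1: [(0, 12)],
     2: [(2, 4), (4, 5), (6, 9), (9, 12)],
     3: [(2, 4), (4, 5), (6, 9)],
     4: [(3, 4), (4, 5), (7, 8)],
     5: [(3, 4), (4, 5)],
     6: [(2, 4), (8, 9)]}
# node: (tree interval, (x, y)) for some nodes of the example.
LABEL = {"b": ((1, 5), (5, None)),
         "k": ((3, 4), (None, (3, 4))),
         "d": ((4, 5), (5, (4, 5))),
         "f": ((7, 8), (4, (6, 9))),
         "g": ((8, 9), (6, (8, 9))),
         "h": ((9, 12), (2, None))}


def covered(sequence, point):
    """Binary search: does some interval of the sorted sequence contain point?"""
    pos = bisect_right(sequence, (point, float("inf"))) - 1
    return pos >= 0 and sequence[pos][0] <= point < sequence[pos][1]


def reachable(u, v):
    (a1, b1), (x1, _) = LABEL[u]
    (a2, _), (_, y2) = LABEL[v]
    if a1 <= a2 < b1:                    # (i) v is a tree descendant of u
        return True
    if x1 is None or y2 is None:         # no usable non-tree information
        return False
    return covered(S[x1], y2[0])         # (ii) y2 is subsumed by an interval of s_x1
```

For instance, `reachable("h", "f")` holds through condition (ii), while `reachable("g", "d")` fails both conditions.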


Main Algorithm

Graph labeling: Core-II

We can store the core label of G as a d × g boolean matrix M, where d is the number of the end nodes of all non-tree edges and g the number of the nodes in Gcore.

Let u1, u2, ..., ud be all the end nodes of the non-tree edges. Let v1, v2, ..., vg be all the nodes in Gcore. Assign each ui an index, denoted index(ui) (i.e., u1, u2, ..., ud are assigned contiguous integers, starting from 0). Assign each vj an index, denoted index’(vj). An entry M[index(ui), index’(vj)] is set to 1 if there exists an interval [a’, b’) in L(vj) such that for ui’s interval [a, b) we have a ∈ [a’, b’); otherwise, it is set to 0.

index(c) = 0, index(k) = 1, index(d) = 2, index(e) = 3, index(g) = 4

index’(a) = 0, index’(h) = 1, index’(e) = 2, index’(f) = 3, index’(d) = 4, index’(g) = 5

The matrix, laid out with one row per core node (index’) and one column per end node (index):

      0 1 2 3 4
0     1 1 1 1 1
1     1 1 1 1 1
2     1 1 1 1 1
3     0 1 1 0 0
4     0 1 1 0 0
5     1 1 0 0 1
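The matrix can be derived mechanically from the core labels. The sketch below takes the interval sequences L(v) and the tree intervals of the end nodes from the running example (the intervals of the end nodes are read off the figure and are an assumption); the helper names are hypothetical. Rows follow index(u) and columns follow index’(v), as in the definition.

```python
CORE_LABEL = {  # L(v) for the nodes of Gcore, in index' order
    "a": [(0, 12)],
    "h": [(2, 4), (4, 5), (6, 9), (9, 12)],
    "e": [(2, 4), (4, 5), (6, 9)],
    "f": [(3, 4), (4, 5), (7, 8)],
    "d": [(3, 4), (4, 5)],
    "g": [(2, 4), (8, 9)],
}
# Tree intervals of the end nodes, in index order.
END_INTERVAL = {"c": (2, 4), "k": (3, 4), "d": (4, 5), "e": (6, 9), "g": (8, 9)}

INDEX = {u: i for i, u in enumerate(END_INTERVAL)}        # index(u)
INDEX_CORE = {v: j for j, v in enumerate(CORE_LABEL)}      # index'(v)


def build_matrix():
    m = [[0] * len(CORE_LABEL) for _ in END_INTERVAL]      # d rows, g columns
    for u, (a, _) in END_INTERVAL.items():
        for v, seq in CORE_LABEL.items():
            if any(lo <= a < hi for lo, hi in seq):
                m[INDEX[u]][INDEX_CORE[v]] = 1
    return m


M = build_matrix()


def reaches_end_node(core_node, end_node):
    """Constant-time lookup: is end_node reachable from core_node?"""
    return M[INDEX[end_node]][INDEX_CORE[core_node]] == 1
```

Compared with Core-I, the binary search over an interval sequence is replaced by one array access, at the price of d · g stored bits.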


Conclusion

A new algorithm for graph reachability

- Core tree

- Graph labeling: Core-I

query time: O(log(min{b, s}))

labeling time: O(n + e + t · min{b, s})

space overhead: O(n + s · min{b, s})

- Graph labeling: Core-II

query time: O(1)

labeling time: O(n + e + t · min{b, s} + d · s log(min{b, s}))

space overhead: O(n + d · s)


Evaluation of Twig Pattern Queries Based on Ordered Tree matching

Yangjun Chen

Dept. Applied Computer Science,

University of Winnipeg

515 Portage Ave.

Winnipeg, Manitoba, Canada R3B 2E9


Outline

Motivation

Algorithm for tree pattern query evaluation based on ordered tree matching

- Tree encoding

- Algorithm description

Index-based algorithm

Conclusion


Motivation

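XPath evaluation against XML documents

- XPath expressions:

a[b[c and .//d]]/b[c and e//d]

book[title = ‘Art of Programming’]//author[fn = ‘Donald’ and ln = ‘Knuth’]

[Figure: the two expressions drawn as tree patterns. The first has root a with two b children, one with the children c and d, the other with the children c and e, where e has a child d. The second has root book with a child title (Art of Programming) and a descendant author with the children fn (Donald) and ln (Knuth).]

XML document:

<document><book>

<title>Art of Programming</title>

<author>

<fn>Donald Knuth</fn>… …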


Motivation

XPath evaluation against XML documents

- Evaluation based on unordered tree matching

Definition. An embedding of a twig pattern Q into an XML document T is a mapping f: Q → T, from the nodes of Q to the nodes of T, which satisfies the following conditions:

(i) Preserve node label: For each u ∈ Q, label(u) matches label(f(u)).

(ii) Preserve parent-child/ancestor-descendant relationships: If (u, v) is a /-edge in Q, then f(v) is a child of f(u) in T; if (u, v) is a //-edge in Q, then f(v) is a descendant of f(u) in T.

[Figure: an example twig pattern Q and a document tree T.]


Motivation

XPath evaluation against XML documents

- Evaluation based on ordered tree matching

XPath expression:

a[b[c/following-sibling::.//d]]/following-sibling::b[c/following-sibling::e//d]


Motivation

XPath evaluation against XML documents

- Evaluation based on ordered tree matching

Definition. An embedding of a twig pattern Q into an XML document T is a mapping f: Q → T, from the nodes of Q to the nodes of T, which satisfies the following conditions:

(i) Preserve node label: For each u ∈ Q, label(u) matches label(f(u)).

(ii) Preserve parent-child/ancestor-descendant relationships: If (u, v) is a /-edge in Q, then f(v) is a child of f(u) in T; if (u, v) is a //-edge in Q, then f(v) is a descendant of f(u) in T.

(iii) Preserve sibling order: For any two nodes v1 ∈ Q and v2 ∈ Q, if v1 is to the left of v2, then f(v1) is to the left of f(v2) in T.

[Figure: the query Q with root q3 (a) and children q1 (b) and q2 (c), and the document tree T with nodes v1, ..., v6: the root a has two c children; the left c has the children c and b, and that b has a child b.]


Algorithm for tree pattern query evaluation

Tree encoding

Let T be a document tree. We associate each node v in T with a quadruple (DocId, LeftPos, RightPos, LevelNum), where DocId is the document identifier; LeftPos and RightPos are generated by counting word numbers from the beginning of the document until the start and end of the element, respectively; and LevelNum is the nesting depth of the element in the document.

(i) ancestor-descendant: a node v1 associated with (d1, l1, r1, ln1) is an ancestor of another node v2 with (d2, l2, r2, ln2) iff d1 = d2, l1 < l2, and r1 > r2.

(ii) parent-child: a node v1 associated with (d1, l1, r1, ln1) is the parent of another node v2 with (d2, l2, r2, ln2) iff d1 = d2, l1 < l2, r1 > r2, and ln2 = ln1 + 1.

(iii) from left to right: a node v1 associated with (d1, l1, r1, ln1) is to the left of another node v2 with (d2, l2, r2, ln2) iff d1 = d2 and r1 < l2.
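The three tests above are plain comparisons on the quadruples. A minimal sketch, with the hypothetical type `Pos` and quadruples taken from the example tree used later in these slides:

```python
from typing import NamedTuple


class Pos(NamedTuple):
    doc: int      # DocId
    left: int     # LeftPos
    right: int    # RightPos
    level: int    # LevelNum


def is_ancestor(v1, v2):
    return v1.doc == v2.doc and v1.left < v2.left and v1.right > v2.right


def is_parent(v1, v2):
    return is_ancestor(v1, v2) and v2.level == v1.level + 1


def is_left_of(v1, v2):
    return v1.doc == v2.doc and v1.right < v2.left


ROOT = Pos(1, 1, 9, 1)
INNER = Pos(1, 4, 6, 3)
LEAF = Pos(1, 5, 5, 4)
```

Each test takes constant time, so structural relationships never require walking the document tree.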


Algorithm for tree pattern query evaluation

Tree encoding

Example.

[Figure: the document tree T with nodes v1, ..., v6, each annotated with its quadruple.]

a: (1, 1, 9, 1)

c: (1, 2, 7, 2)

c: (1, 3, 3, 3)

b: (1, 4, 6, 3)

b: (1, 5, 5, 4)

c: (1, 8, 8, 2)


Algorithm for tree pattern query evaluation

Main algorithm

1. First, we number both T and Q in postorder, so that the nodes in both trees are referenced by their postorder numbers.

2. We access the nodes in T and the nodes in Q in the order of their postorder numbers. Each time we meet a node i in Q, we associate it with an array Ai of length |T|, indexed from 0 to |T| - 1. The arrays Ai are manipulated as follows.

[Figure: Q and T with their postorder numbers. In Q, q1 (b) = 1, q2 (c) = 2, q3 (a) = 3; the nodes of T are numbered 1 to 6.]


Algorithm for tree pattern query evaluation

(i) We set a virtual node for T, numbered 0, which is considered to be to the left of any node in T.

(ii) If we find that Q[i] can be embedded in T[j], we set Ai[j1], ..., Ai[jk] (0 ≤ k ≤ j - 1) to j, where each jl (1 ≤ l ≤ k) is a node to the left of j, to record the fact that j is the closest node to the right of jl such that T[j] embeds Q[i].

[Figure: Q and T numbered in postorder, with the virtual node v0 to the left of T.]

Array A2 at two points of the computation:

index: 0 1 2 3 4 5

A2:    1

A2:    1 5 5 5 5
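How an array Ai gets filled can be sketched as follows. The (LeftPos, RightPos) pairs come from the tree-encoding example and are keyed by postorder number; which nodes embed Q[i] is passed in as a list, since the sketch does not perform the matching itself. The names `left_of` and `fill_array` are hypothetical.

```python
# (LeftPos, RightPos) of the nodes of T, keyed by postorder number 1..6;
# number 0 is the virtual node.
SPAN = {1: (3, 3), 2: (5, 5), 3: (4, 6), 4: (2, 7), 5: (8, 8), 6: (1, 9)}


def left_of(j1, j2):
    """The virtual node 0 is to the left of every node of T."""
    if j2 == 0:
        return False
    if j1 == 0:
        return True
    return SPAN[j1][1] < SPAN[j2][0]


def fill_array(embedding_nodes, size):
    """embedding_nodes: ascending postorder numbers j such that T[j] embeds Q[i]."""
    a = [None] * size
    for j in embedding_nodes:
        for jl in range(size):
            if a[jl] is None and left_of(jl, j):
                a[jl] = j        # j is the closest such node to the right of jl
    return a


A2 = fill_array([1, 4, 5], 6)    # the nodes labeled c embed the leaf query node c
A1 = fill_array([2, 3], 6)       # the nodes labeled b embed the leaf query node b
```

Because matches are processed in ascending postorder and entries are written only once, each entry ends up holding the nearest match to its right.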


Algorithm for tree pattern query evaluation

(iii) If some time later we find another node p such that Q[i] can be embedded in T[p], we set Ai[p1], ..., Ai[pq] to p, where each ps (1 ≤ s ≤ q) is to the left of p but to the right of jk.

(iv) For all the other nodes j’ such that T[j’] embeds Q[i], we set values for the entries in Ai in the same way as in (ii) and (iii).

3. During the process, when we meet i in Q and j in T, we do the following:

Let i1, ..., ik be the child nodes of i in Q. We first check Ai1, starting from Ai1[l], where l = min{desc(j)} - 1 and desc(j) represents all the descendants of j. We begin the search from min{desc(j)} - 1 because it is the closest node to the left of the descendant of j which has the least postorder number. Let Ai1[l] = j’. If (i, i1) is a /-edge, we check whether (j, j’) is a /-edge. Otherwise, we only check whether j’ is a descendant of j. If it is not the case, we check Ai1[j’]. This process continues until one of the following conditions is satisfied:

(i) Ai1 is exhausted (we cannot find a descendant j’’ of j such that T[j’’] contains Q[i1]); or

(ii) we find a j’’ satisfying the parent-child or ancestor-descendant relationship, depending on whether (i, i1) is a /-edge or a //-edge. Then we check Ai2[j’’].


Algorithm for tree pattern query evaluation

• If Ai1 is exhausted (case (i)), Q[i1] cannot be embedded in any subtree rooted at a child node (for a /-edge) or a descendant (for a //-edge) of j. This means that Q[i1] cannot be embedded into T[j] and thus T[j] cannot embed Q[i]. We continue by checking i against the next node in T.

• In case (ii), we check Ai2, starting from Ai2[j’’]. For all the other Ail’s (l = 3, ..., k), we make the same checks. If for each il (l = 1, ..., k) we can find a j’ such that T[j’] embeds Q[il], then T[j] embeds Q[i] and we set some new values in Ai as described in (2).

[Figure: Q with root i and children i1, i2, ..., ik; T with root j and a descendant j’’; the arrays Ai1 and Ai2, where the search in Ai1 starts at entry l and the search in Ai2 starts at the entry of the node found in Ai1.]


Algorithm for tree pattern query evaluation

Example.

[Figure: steps (a)-(f) of the algorithm on Q and T (both numbered in postorder, with the virtual node v0), showing how the arrays A1, A2 and A3 are filled.]

Array contents shown on the slide (indices 0 to 5):

A1: 2 2

A2: 1, later 1 5 5 5 5

A3: 6

The time complexity of the algorithm is O(|T||Q|).


Index-based algorithm

XB-tree

An XB-tree is a variant of the B+-tree over a sequence of quadruples.

[Figure: the document tree T with its quadruples, as in the tree-encoding example.]

Quadruples sorted by RightPos values:

(1, 3, 3, 3) (1, 5, 5, 4) (1, 4, 6, 3) (1, 2, 7, 2) (1, 8, 8, 2) (1, 1, 9, 1)

XB-tree pages (entries shown as LeftPos, RightPos; each page P also stores P.parent and P.parentIndex):

P1 (root): (3, 5) (2, 7) (1, 9)

P2: (3, 3) c, (5, 5) b

P3: (4, 6) b, (2, 7) c

P4: (8, 8) c, (1, 9) a


Index-based algorithm

Searching an XB-tree

- A cursor (P, i) indicates that the i-th entry in page P is currently accessed.

- advance() (going up from a page to its parent): If (P, i) does not point to the last entry of P, i ← i + 1. Otherwise, the cursor is set to (P.parent, P.parentIndex).

- drilldown() (going down from a page to one of its children): If the cursor is (P, i) and P is not a leaf page, the cursor is set to (P’, 1), where P’ is the i-th child page of P.

- Initially, the cursor is (rootPage, 1), pointing to the first entry in the root page. We finish a traversal of the XB-tree when the cursor is (rootPage, last), where last points to the last entry in the root page, and we advance it (in this case, the cursor is set to nil).


Index-based algorithm

Searching an XB-tree

- Assume that i in Q is the node currently encountered. By searching the XB-tree, we find a node j of T with label(i) = label(j) for which it is possible that T[j] embeds Q[i].

- L(i): the most recently found node such that Q[i] can be embedded into T[L(i)].

Procedure search(XB, i)

1. Let i1, ..., ik be the children of i. Assume that L(ik) = v. Set l ← v.LeftPos and r ← v.RightPos. If i is a leaf node, then l ← ∞, r ← 0.

2. Assume that the cursor is (P, c). Let j be the entry pointed to by the cursor. We make the following checks.

i) If P is a leaf page, label(j) = label(i), j.LeftPos < l and j.RightPos > r, then advance() and return j.

ii) If P is an internal page, j.LeftPos < l and j.RightPos > r, then drilldown().

iii) If j.RightPos < r, then advance(). If the cursor is nil, return nil.

3. Repeat (2) until the whole XB-tree is traversed (i.e., the cursor is nil) or a node j is found (i.e., the condition in (2)-(i) is satisfied).


Conclusion

Algorithm for evaluating tree pattern queries based on ordered tree matching

- Time complexity: O(|T||Q|).

- Space complexity: O(|T||Q|).

- The algorithm can be integrated into an index environment by using XB-trees.