Structural indexes of XML Databases



Page 1: Structural indexes of XML Databases

Master Informatique 1Dr. Vu Le Anh Structural indexes of XML Databases

Structural indexes of XML Databases

Dr. Vu Le Anh [email protected]

Page 2: Structural indexes of XML Databases

Outline

1. Motivation
2. Regular query processing over XML datasets
3. Indexes over XML datasets
4. Structural indexes
5. Structural indexes for distributed XML datasets
6. Summary

Page 3: Structural indexes of XML Databases

NCBI GEO dataset

• GEO is a public functional genomics data repository supporting MIAME-compliant data submissions.
• About 600 gigabytes (Feb 2009). The data are stored as XML datasets.

[Figure: a gene map written as an XML file, and its XML graph.]

Page 4: Structural indexes of XML Databases

Virtual observatory

• A collection of interoperating data archives and software tools that use the internet to form a scientific research environment in which astronomical research programs can be conducted.
• IVOA (International Virtual Observatory Alliance): building an international community.
• Very big XML datasets are used for storing and exchanging data.

Page 5: Structural indexes of XML Databases

Problem

• Efficient query processing over big (distributed) XML databases.
• Two “interesting” ideas:
  1. Storing the XML database in a relational database. XML queries are rewritten into SQL and Datalog, and the partial results are combined.
  2. Indexing the XML database and using the indexes for query processing.

Page 6: Structural indexes of XML Databases

Data Graph – Data Model for XML

• Data graph: a directed, rooted, labelled graph $G = (V, E, r, \Sigma, label)$, where:
  – $V$: set of nodes
  – $\Sigma$: set of label values
  – $E \subseteq V \times V$: set of edges, with $E = E_b \cup E_f$
  – $E_b$: set of basic edges
  – $E_f$: set of reference edges
  – $r \in V$: the root
  – $label : V \to \Sigma$: labeling function

Page 7: Structural indexes of XML Databases

Publication XML document

<CSDepartment>
  <PhDStudents>
    <Student id="s1">
      <Name>John</Name>
      <Papers>
        <Paper id="pp1">
          <Title>ABC</Title>
          <Author>Dr.Ben</Author>
          <Author idref="p1"></Author>
        </Paper>
      </Papers>
    </Student>
    <Student id="s2">
      <Name>Tom</Name>
    </Student>
  </PhDStudents>
  <Professors>
    <Professor id="p1">
      <Name>Dr. Kiss</Name>
      <Papers>
        <Paper idref="pp1"></Paper>
        <Paper>
          <Title>DEF</Title>
        </Paper>
      </Papers>
    </Professor>
    <Professor id="p2">
      <Name>Dr. Baker</Name>
      <Papers>
        <Paper>
          <Title>XYZ</Title>
        </Paper>
      </Papers>
    </Professor>
  </Professors>
</CSDepartment>
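As a rough illustration of how such a document maps onto the data graph model, the sketch below parses a shortened excerpt of the publication document and produces node labels, basic edges and reference edges. The function name build_data_graph, the integer node numbering, the direction of reference edges (from the idref element to the element carrying the id) and the decision to ignore text content are all assumptions made for this example, not something the slides specify.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the publication document (for illustration only).
DOC = """
<CSDepartment>
  <PhDStudents>
    <Student id="s1">
      <Name>John</Name>
      <Papers>
        <Paper id="pp1">
          <Title>ABC</Title>
          <Author idref="p1"></Author>
        </Paper>
      </Papers>
    </Student>
  </PhDStudents>
  <Professors>
    <Professor id="p1">
      <Name>Dr. Kiss</Name>
      <Papers>
        <Paper idref="pp1"></Paper>
      </Papers>
    </Professor>
  </Professors>
</CSDepartment>
"""


def build_data_graph(xml_text):
    """Return (labels, basic_edges, ref_edges, root) for an XML document."""
    root_elem = ET.fromstring(xml_text)
    labels = {}        # node number -> element tag
    basic_edges = []   # (parent, child) pairs
    by_id = {}         # value of the id attribute -> node number
    pending = []       # (node number, idref value), resolved after the walk

    def visit(elem, parent):
        node = len(labels)          # nodes are numbered in document order
        labels[node] = elem.tag
        if parent is not None:
            basic_edges.append((parent, node))
        if "id" in elem.attrib:
            by_id[elem.attrib["id"]] = node
        if "idref" in elem.attrib:
            pending.append((node, elem.attrib["idref"]))
        for child in elem:
            visit(child, node)

    visit(root_elem, None)
    # Assumption: a reference edge points from the idref element to its target.
    ref_edges = [(node, by_id[ref]) for node, ref in pending]
    return labels, basic_edges, ref_edges, 0


labels, basic_edges, ref_edges, root = build_data_graph(DOC)
```

Resolving the idref values only after the full walk lets a reference point forward to an element that appears later in the document.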

Page 8: Structural indexes of XML Databases

XML data graph

[Figure: the data graph of the publication XML document.]

Page 9: Structural indexes of XML Databases

Regular queries

• Query languages for XML:
  – XQuery, XPath, UnQL, Lorel, XQL, XML-QL, etc.
• They are built around regular expressions.
• 3 basic operations:
  – Concatenation: . or /
  – Union: |
  – Iteration: *
• Shorthand: _ stands for any label value; // stands for (_)*, any sequence of label values.
• Example: //(Student | Professor)//Paper/Title

Page 10: Structural indexes of XML Databases

Regular queries

• A pair of nodes (u, v) matches the regular query R if there is a path from u to v whose label sequence matches R.
• The result of R on G, with input set I and output set O:
  $R_{IO}(G, R) = \{(u, v) \in I \times O \mid (u, v) \text{ matches } R\}$
• General case: I = {root} and O = V.
• Every regular expression R can be represented by a nondeterministic finite automaton (NFA) that accepts the language L(R). The query graph is the graph representing this automaton.

Page 11: Structural indexes of XML Databases

Query processing based on the automaton

• The query graph of //B/D is an NFA with states q0, q1, q2 and transitions labelled *, B, D.
• Input: I = {0}; Output: O = {0, 1, …, 15}

[Figure: a data tree with nodes 0–15 (labels A–F); each node is annotated with the automaton states reached there.]

The result = {(0,3), (0,11), (0,13)}
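The idea of pushing automaton states down the tree can be sketched in a few lines. The tree below is a small hypothetical example, not the 16-node tree of the slide, and the helper names step and evaluate are assumptions. The NFA is the one for //B/D: q0 loops on any label, B leads from q0 to q1, and D leads from q1 to the accepting state q2.

```python
def step(states, label):
    """One NFA move for //B/D on the given node label."""
    nxt = set()
    for q in states:
        if q == "q0":
            nxt.add("q0")            # the '*' self-loop on q0
            if label == "B":
                nxt.add("q1")
        elif q == "q1" and label == "D":
            nxt.add("q2")            # q2 is accepting and has no outgoing moves
    return nxt


def evaluate(labels, children, root):
    """Return the nodes v such that (root, v) matches //B/D."""
    result = []
    stack = [(root, {"q0"})]         # the root starts in the initial state
    while stack:
        node, states = stack.pop()
        if "q2" in states:
            result.append(node)
        for child in children.get(node, []):
            stack.append((child, step(states, labels[child])))
    return sorted(result)


# Hypothetical node-labelled tree.
labels = {0: "A", 1: "A", 2: "B", 3: "D", 4: "C", 5: "B", 6: "B", 7: "D"}
children = {0: [1, 5], 1: [2, 4], 2: [3], 5: [6], 6: [7]}
matches = evaluate(labels, children, 0)
```

Each node is visited once and carries a set of states, so the cost is proportional to the tree size times the number of automaton states.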

Page 12: Structural indexes of XML Databases

Transform to an edge-labeled graph

• The query graph is an edge-labeled graph.
• The node-labeled data graph is therefore transformed into an edge-labeled graph.

[Figure: a node-labeled graph and the corresponding edge-labeled graph.]

Page 13: Structural indexes of XML Databases

State-Data (SD) graph

• SD graph = the query graph joined with the data graph.
• The SD graph may not be connected.
• SD nodes: (data-node, state-node) pairs.
• SD labeled edges: built by matching the labels of data edges with the labels of query-graph edges.
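A minimal sketch of this join, assuming both graphs are given as lists of (source, label, target) triples: an SD edge from (u, s) to (v, t) exists whenever a data edge u → v and a query edge s → t carry the same label. The function names and the five-edge data graph are hypothetical. The query graph encodes a/(b|c)*/a.

```python
from collections import deque


def sd_reachable(data_edges, query_edges, start_node, start_state):
    """Breadth-first search over the SD graph, built on the fly."""
    start = (start_node, start_state)
    seen = {start}
    queue = deque([start])
    while queue:
        u, s = queue.popleft()
        for x, label, v in data_edges:
            if x != u:
                continue
            for p, qlabel, t in query_edges:
                # An SD edge needs matching labels on both sides.
                if p == s and qlabel == label and (v, t) not in seen:
                    seen.add((v, t))
                    queue.append((v, t))
    return seen


def answer(data_edges, query_edges, start_node, start_state, final_states):
    """Pairs (start_node, v) such that v is reached in a final state."""
    sd_nodes = sd_reachable(data_edges, query_edges, start_node, start_state)
    return sorted({(start_node, v) for v, s in sd_nodes if s in final_states})


# Query graph for a/(b|c)*/a with states s0, s1, s2.
query_edges = [("s0", "a", "s1"), ("s1", "b", "s1"),
               ("s1", "c", "s1"), ("s1", "a", "s2")]
# Hypothetical edge-labelled data graph with a cycle between nodes 2 and 3.
data_edges = [(1, "a", 2), (2, "b", 3), (2, "a", 4), (3, "a", 5), (3, "c", 2)]
pairs = answer(data_edges, query_edges, 1, "s0", {"s2"})
```

Only the part of the SD graph reachable from the start pair is ever materialised, which is why the full SD graph may be disconnected without affecting the answer.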

Page 14: Structural indexes of XML Databases

Joining R := a/(b|c)*/a and a data graph

[Figure: the query graph with states s0, s1, s2 and transitions a, b, c, a; a data graph with nodes 1–5 and edge labels a, b, c; and the resulting SD-graph over (data node, state) pairs such as (1,s0), (2,s1), (3,s1), (4,s2), (5,s2).]

Result: (1,4), (1,5)

Page 15: Structural indexes of XML Databases

SD-graph representation on relational database [KissVu05]

• Main results:
  – The data graph and the query graph can be represented by tables.
  – SD graph (table) = the join of the data table and the query table.
  – The result is computed from the SD table.
  – Regular query processing → DATALOG + SQL.
  – An index is built to support the SQL computation.
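One way to picture the relational approach is a recursive SQL query that joins an edge table with a transition table. The sketch below uses Python's sqlite3 module; the table names data_edge and query_edge, their columns and the sample rows are assumptions for illustration and are not the schema of [KissVu05].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_edge(src INTEGER, label TEXT, dst INTEGER);
CREATE TABLE query_edge(s TEXT, label TEXT, t TEXT);
-- Hypothetical data graph.
INSERT INTO data_edge VALUES
    (1, 'a', 2), (2, 'b', 3), (2, 'a', 4), (3, 'a', 5), (3, 'c', 2);
-- Query graph for a/(b|c)*/a.
INSERT INTO query_edge VALUES
    ('s0', 'a', 's1'), ('s1', 'b', 's1'), ('s1', 'c', 's1'), ('s1', 'a', 's2');
""")

# sd(node, state) is the reachable part of the SD table. UNION removes
# duplicates, so the recursion terminates even on cyclic data graphs.
rows = conn.execute("""
WITH RECURSIVE sd(node, state) AS (
    SELECT 1, 's0'
    UNION
    SELECT d.dst, q.t
    FROM sd
    JOIN data_edge d ON d.src = sd.node
    JOIN query_edge q ON q.s = sd.state AND q.label = d.label
)
SELECT node FROM sd WHERE state = 's2' ORDER BY node
""").fetchall()
result = [row[0] for row in rows]
```

An index on data_edge(src, label) would be the natural place to speed up the join in a real system.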

Page 16: Structural indexes of XML Databases

Step 1: Transform the data graph into an edge-labeled graph

Page 17: Structural indexes of XML Databases

Step 2: Query graph representation

Page 18: Structural indexes of XML Databases

Step 3: Using DATALOG and SQL for the computation

Page 19: Structural indexes of XML Databases

Step 4: Computation in relational databases

Results: {4, 5, 6}

Page 20: Structural indexes of XML Databases

Classes of XML indexes

1. Indexing the basic values
  – The basic values are indexed (e.g. data(//emp/salary)).
  – Using a B+-tree.
2. Indexing the text values
  – Keywords should be indexed.
3. Indexes for XML trees
  – Quickly checking and computing the label sequence of the path between a pair of nodes.
  – Also applied to near-tree XML datasets.
4. Structural indexes
  – Simulating the data graph by a smaller one to reduce the cost of computation.

Page 21: Structural indexes of XML Databases

XML-tree pre/post computing [Dietz82]

• A preorder and a postorder walk of the tree assign every node x the pair (pre(x), post(x)).

[Figure: a tree with nodes annotated (pre, post): the root (1,7); its children (2,4) and (6,6); the leaves (3,1), (4,2), (5,3) below (2,4) and the leaf (7,5) below (6,6).]

y is a descendant of x <=> pre(x) < pre(y) and post(x) > post(y)
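The numbering and the descendant test can be sketched as follows. The node names r, a, a1, … are hypothetical; the tree shape is the one implied by the (pre, post) pairs of the figure.

```python
def number_tree(children, root):
    """Assign preorder and postorder ranks, both starting at 1."""
    pre, post = {}, {}
    counters = {"pre": 0, "post": 0}

    def walk(node):
        counters["pre"] += 1
        pre[node] = counters["pre"]      # rank on first visit
        for child in children.get(node, []):
            walk(child)
        counters["post"] += 1
        post[node] = counters["post"]    # rank after all children are done

    walk(root)
    return pre, post


def is_descendant(y, x, pre, post):
    """True if y lies strictly below x in the tree."""
    return pre[x] < pre[y] and post[x] > post[y]


children = {"r": ["a", "b"], "a": ["a1", "a2", "a3"], "b": ["b1"]}
pre, post = number_tree(children, "r")
```

Once the two ranks are stored with each node, the test needs two integer comparisons and no tree traversal.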

Page 22: Structural indexes of XML Databases

Tree-Structure Improvement [Li&Moon VLDB 2001]

• Every node x is assigned the pair (order(x), size(x)).

[Figure: a tree with nodes annotated (order, size): the root (1,100); its children (10,30) and (41,10); the leaves (11,5), (17,5), (25,5) below (10,30) and the leaf (45,5) below (41,10).]

y is a descendant of x <=> order(x) < order(y) and order(y) <= order(x) + size(x)
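A quick check of the interval test on the pairs shown in the figure. The node names are hypothetical, and the gaps between the intervals are what leaves room for later insertions without renumbering.

```python
# (order, size) pairs taken from the figure; the node names are made up.
nodes = {
    "root": (1, 100),
    "a": (10, 30), "a1": (11, 5), "a2": (17, 5), "a3": (25, 5),
    "b": (41, 10), "b1": (45, 5),
}


def is_descendant(y, x):
    """True if order(y) falls inside the interval owned by x."""
    order_x, size_x = nodes[x]
    order_y, _ = nodes[y]
    return order_x < order_y <= order_x + size_x


below_a = sorted(n for n in nodes if is_descendant(n, "a"))
```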

Page 23: Structural indexes of XML Databases

Regular query processing over XML trees and near-trees

• Very efficient when based on tree-structured indexes.
• [KissVu06]: applied to near-tree XML datasets.
• Link graph: connects the link nodes.
• Tree-structured indexes are used for the basic (tree) structure.

Page 24: Structural indexes of XML Databases


Family of Structural indexes

Page 25: Structural indexes of XML Databases

1-index [Milo & Suciu, LNCS 1997]

Idea: group all “equivalent” data nodes into one index node. The index nodes are computed with bisimulation in place of the equivalence ≡.

• The index graph is smaller than the data graph.
• It works for every regular query.
• A bisimulation can be computed in PTIME.

Page 26: Structural indexes of XML Databases

Bisimulation

• A bisimulation $\approx$ is a relation on nodes such that:
  – If $x_1 \approx x_2$, then $x_1$ and $x_2$ have the same label.
  – If $x_1 \approx x_2$ and $(y_1, x_1)$ is an edge, then there exists an edge $(y_2, x_2)$ with $y_1 \approx y_2$.

[Figure: two nodes x1 and x2 labelled b, with parents y1 and y2 labelled a.]

Page 27: Structural indexes of XML Databases

Example 1-index

[Figure: a data graph with nodes 1–18 (labels paper, section, title, algorithm, proof, uses, about, exp) and its 1-index, whose index nodes are {1}, {2,4,8,13}, {3,5,9,14}, {6,10}, {7}, {11}, {12}, {15,16} and {17,18}.]

Example query: /paper/section/algorithm

Page 28: Structural indexes of XML Databases

Using 1-index?

• Good: it works for all regular queries.
• Bad: it is not small enough!
• Idea: design the index graph only for the most frequently used queries. The index graph then becomes very small.
• A new equivalence relation between nodes has to be defined.
• If a query is not supported, the answer is re-checked on the data graph.

Page 29: Structural indexes of XML Databases

Structural indexes and a given set of queries

• Important query classes and the indexes designed for them:
  – //a0/a1/…/ai (i <= k), paths no longer than k: A(k)-index
  – Dynamic indexes: APEX, D(k)-index
  – //S0/S1/…/Sk, SAPE queries: DL-1-index, DL-A*(k)-index
  – Forward-backward queries: F&B-index

Page 30: Structural indexes of XML Databases

A(k)-Index [Kaushik et al. 02]

• Supports //a0/a1/…/ai (i <= k) queries.
• Based on k-bisimulation.
• The k-bisimulation $\approx_k$:
  – $u \approx_0 v$ if u and v have the same label,
  – $u \approx_k v$ if $u \approx_{k-1} v$ and
    • if (u', u) is an edge, there exists an edge (v', v) with $u' \approx_{k-1} v'$,
    • if (v', v) is an edge, there exists an edge (u', u) with $u' \approx_{k-1} v'$.

Page 31: Structural indexes of XML Databases

A(k)-index

[Figure: a data graph with nodes 1–9 and its A(0)-, A(1)- and A(2)-indexes.]

Data graph labels: 1 imdb; 2 movie; 3 director; 4 name; 5 tv; 6 director; 7 name; 8 director; 9 name.

A(0)-index: {1} imdb; {2} movie; {5} tv; {3,6,8} director; {4,7,9} name.

A(1)-index: {1} imdb; {2} movie; {3} director; {5} tv; {6,8} director; {4,7,9} name.

A(2)-index (= 1-index): {1} imdb; {2} movie; {3} director; {4} name; {5} tv; {6,8} director; {7,9} name.
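The partitions of the imdb example can be reproduced with a short sketch of k-bisimulation refinement. The edge list is an assumption: it is the tree that is consistent with the partitions shown in the figure (imdb above movie and tv, each director above one name). The function name k_bisimulation is also hypothetical.

```python
def k_bisimulation(labels, edges, k):
    """Partition the nodes by k-bisimilarity over incoming edges."""
    parents = {v: set() for v in labels}
    for u, v in edges:
        parents[v].add(u)
    # Round 0: the block of a node is its label.
    block = dict(labels)
    for _ in range(k):
        # Round i: old block plus the set of blocks of the parents.
        block = {
            v: (block[v], frozenset(block[p] for p in parents[v]))
            for v in labels
        }
    groups = {}
    for v, b in block.items():
        groups.setdefault(b, []).append(v)
    return sorted(sorted(g) for g in groups.values())


labels = {1: "imdb", 2: "movie", 3: "director", 4: "name", 5: "tv",
          6: "director", 7: "name", 8: "director", 9: "name"}
# Assumed edges, inferred from the index graphs in the figure.
edges = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7), (5, 8), (8, 9)]

a0 = k_bisimulation(labels, edges, 0)
a1 = k_bisimulation(labels, edges, 1)
a2 = k_bisimulation(labels, edges, 2)
```

Each extra round looks one step further up the incoming paths, so on this graph a third round would no longer split any block and A(2) coincides with the 1-index.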

Page 32: Structural indexes of XML Databases

Split Operation

[Figure: four panels, Data graph, A(2) (= 1-index), A(1) and A(0), each with root R, nodes A and B, and the nodes C1–C6. The C-nodes are grouped as C1 / C2,C3 / C4 / C5,C6 in A(2), as C1 / C2,C3 / C4,C5,C6 in A(1), and as C1,C2,C3,C4,C5,C6 in A(0).]

Page 33: Structural indexes of XML Databases

Refinement (step 1)

[Figure: the four panels Data graph, A(2) (= 1-index), A(1) and A(0), with root R, nodes A and B, and the C-nodes grouped as C1 / C2,C3 / C4 / C5,C6 in A(2), as C1 / C2,C3 / C4,C5,C6 in A(1), and as C1,C2,C3,C4,C5,C6 in A(0).]

Page 34: Structural indexes of XML Databases

Refinement (step 2)

[Figure: the four panels Data graph, A(2) (= 1-index), A(1) and A(0), with root R, nodes A and B, and the C-nodes grouped as C1 / C2,C3 / C4 / C5,C6 in A(2), as C1 / C2,C3 / C4,C5,C6 in A(1), and as C1,C2,C3,C4,C5,C6 in A(0).]

Page 35: Structural indexes of XML Databases

DL-1-index [KissVu06]

• Supports //S0/S1/…/Sk queries (SAPE = Simple Alternation Path Expression).
• A dynamic index (DL = Dynamic Labelling).

Page 36: Structural indexes of XML Databases

The SAPE query //(d|e)/f

[Figure: a data graph with nodes 0–13 and labels a, b, c, d, e, f, g.]

• SAPE query: //(d|e)/f, i.e. R := //S0/S1 with S0 = {d, e} and S1 = {f}.
• The pairs (4,9), (5,10), (6,11) and (7,12) match R.
• The result: $T_G(R) = \{9, 10, 11, 12\}$
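Evaluating a SAPE query directly on a tree amounts to following one label set per step. The sketch below is a plain evaluation on the data graph, not the DL-index itself. The tree is a hypothetical reconstruction: the node numbers and the result follow the slide, but the exact edges and some labels are assumptions.

```python
def sape_targets(labels, children, steps):
    """Nodes reached by //S0/S1/.../Sk, where each Si is a set of labels."""
    # The leading // lets S0 match a node at any depth.
    matched = [n for n in labels if labels[n] in steps[0]]
    for allowed in steps[1:]:
        # Each later step moves to children whose label is in the set.
        matched = [c for n in matched
                   for c in children.get(n, [])
                   if labels[c] in allowed]
    return sorted(set(matched))


# Hypothetical tree modelled on the slide's 14-node example.
labels = {0: "a", 1: "b", 2: "b", 3: "c", 4: "d", 5: "d", 6: "e",
          7: "e", 8: "d", 9: "f", 10: "f", 11: "f", 12: "f", 13: "g"}
children = {0: [1, 2], 1: [3, 4, 5], 2: [6, 7, 8],
            4: [9], 5: [10], 6: [11], 7: [12], 8: [13]}

targets = sape_targets(labels, children, [{"d", "e"}, {"f"}])
```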

Page 37: Structural indexes of XML Databases

Example: DL-1-index supporting the //(K|L) and //(B|C)/E queries

[Figure, four panels (a)–(d).]

(a) Data graph: node 0 labelled A; nodes 1–4 labelled K, L, M, N; nodes 5–8 labelled B, C, C, D; nodes 9–12 labelled E, E, F, E. The data graph and the 1-index are the same.

(b) DL-1-index at the beginning: {0} A; {1,2,3,4} K,L,M,N; {5,6,7,8} B,C,D; {9,10,11,12} E,F.

(c) After supporting R1 = //(K|L): {0} A; {1,2} K,L; {3,4} M,N; {5,6} B,C; {7,8} C,D; {9,10} E; {11,12} E,F.

(d) After supporting R2 = //(B|C)/E: {0} A; {1,2} K,L; {3,4} M,N; {5,6} B,C; {7} C; {8} D; {9,10} E; {11} F; {12} E.

Page 38: Structural indexes of XML Databases

DL-A*(k)-index [KissVu06]

1. The A(i)-index is a special case of the DL-A*(k)-index.
2. The DL-A*(k)-index supports a given set of SAPE queries of length at most k.

Page 39: Structural indexes of XML Databases

DL-A*(1)-index supporting the //(K|L) and //(B|C)/E queries

[Figure, four panels: the data graph (nodes 0–12 with labels A, B, C, D, E, F, K, L, M, N); the initial index; the index after the //(K|L) refinement; the index after the //(B|C)/E refinement.]

Page 40: Structural indexes of XML Databases

Experiments

1. DL-1 vs. 1-index
2. DL-A*(k) vs. A(k)-index

• 2 datasets:
  – XMark: 100 MB, 1,681,342 nodes.
  – TreeBank: 82 MB, 2,437,667 nodes.

Page 41: Structural indexes of XML Databases


Page 42: Structural indexes of XML Databases

Distributed XML-tree

• XML tree = fragments (subtrees).
• Each server stores some fragments.
• There are link edges between fragments.
• Question: what is an efficient protocol for regular query processing? The two costs are waiting time and computing time.
• Can structural indexes be applied?

Page 43: Structural indexes of XML Databases

Processing //a/b//a on an XML tree using 2 servers

Page 44: Structural indexes of XML Databases

Flow model (SPIDER algorithm)

• Processing begins at the root.
• When a link (F, q) → (F', q') is reached:
  1. Processing on F stops.
  2. Processing starts on F' with state q'.
  3. When processing over F' finishes, the result is sent to F.
  4. F continues.

Drawback: waiting time!

Page 45: Structural indexes of XML Databases

2-phase parallel model

• Servers: each one computes every possible state on its own site.
• Each server sends its link edges to the coordinator.
• The coordinator examines the link edges and requests the results from the servers.
• The servers send the results to the coordinator.
• Drawback: the computing time!
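The two phases can be sketched for the query //B/D on a node-labelled tree split into fragments. This is a simplified, single-process illustration under several assumptions: the fragment layout (dicts with root, labels, children, links), the function names, globally unique node numbers, and the shortcut of returning hits and link edges in one table instead of two separate message rounds are all hypothetical.

```python
STATES = ("q0", "q1", "q2")


def step(states, label):
    """One NFA move for //B/D: q0 loops, B leads to q1, D leads to q2."""
    nxt = set()
    for q in states:
        if q == "q0":
            nxt.add("q0")
            if label == "B":
                nxt.add("q1")
        elif q == "q1" and label == "D":
            nxt.add("q2")
    return nxt


def local_phase(fragment):
    """Phase 1, on a server: evaluate the fragment for every entry state."""
    table = {}
    for q in STATES:
        hits, links = [], []
        root = fragment["root"]
        stack = [(root, step({q}, fragment["labels"][root]))]
        while stack:
            node, states = stack.pop()
            if "q2" in states:
                hits.append(node)
            for child in fragment["children"].get(node, []):
                stack.append((child, step(states, fragment["labels"][child])))
            for target in fragment["links"].get(node, []):
                # A link edge hands every current state to the next fragment.
                links.extend((target, s) for s in states)
        table[q] = (hits, links)
    return table


def coordinate(tables, start):
    """Phase 2, on the coordinator: keep only the reachable (F, q) pairs."""
    seen, todo, result = set(), [start], set()
    while todo:
        frag, q = todo.pop()
        if (frag, q) in seen:
            continue
        seen.add((frag, q))
        hits, links = tables[frag][q]
        result.update(hits)
        todo.extend(links)
    return sorted(result)


fragments = {
    "F0": {"root": 0, "labels": {0: "A", 1: "B"},
           "children": {0: [1]}, "links": {1: ["F1"]}},
    "F1": {"root": 10, "labels": {10: "D", 11: "B", 12: "D"},
           "children": {10: [11], 11: [12]}, "links": {}},
}
tables = {name: local_phase(f) for name, f in fragments.items()}
answer = coordinate(tables, ("F0", "q0"))
```

Every server evaluates its fragment once per automaton state, including states that are never reached, which is where the extra computing time of this model comes from.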

Page 46: Structural indexes of XML Databases

1-phase parallel model [KissVu07]

• The coordinator builds a structural tree-index for the whole system in order to determine the connected (F, q) states.
• The index is processed first to compute the connected states.

Good: efficient processing.

Bad: the index may be big.

Page 47: Structural indexes of XML Databases

Structural Tree-index

[Figure: an XML tree with nodes 0–15 (labels A–F) split into the fragments F0–F5; the structural tree-index over the fragments, with edges labelled by paths such as A, AB, AC and ε; and the automaton states q0, q1 attached to the fragments.]

• (F2,q1), (F2,q2): not connected.
• Connected states: (F0,q0), (F1,q0), …

Page 48: Structural indexes of XML Databases

Experiments

• 19 local Linux servers.
• Waiting time: 1IP : 2P : SP = 1 : 1.94 : 37.52
• Computing time: 1IP : 2P : SP = 1 : 1.77 : 2.75

Page 49: Structural indexes of XML Databases

Native XML database systems
http://www.rpbourret.com/xml/XMLDatabaseProds.htm#native

Product | Developer | License | Database type
Qizx/db | XMLMind | Commercial | Proprietary
Sedna XML DBMS | ISP RAS MODIS | Free | Proprietary
Sekaiju / Yggdrasill | Media Fusion | Commercial | Proprietary
SQL/XML-IMDB | QuiLogic | Commercial | Proprietary (native XML and relational)
Sonic XML Server | Sonic Software | Commercial | Object-oriented (ObjectStore)
Tamino | Software AG | Commercial | Proprietary. Relational through ODBC
TeraText DBS | TeraText Solutions | Commercial | Proprietary
TEXTML Server | IXIASOFT, Inc. | Commercial | Proprietary
TigerLogic XDMS | Raining Data | Commercial | Pick
Timber | University of Michigan | Open Source (non-commercial only) | Shore, Berkeley DB
TOTAL XML | Cincom | Commercial | Object-relational
Virtuoso | OpenLink Software | Commercial | Proprietary. Relational through ODBC
XDBM | Matthew Parry, Paul Sokolovsky | Open Source | Proprietary
XDB | ZVON.org | Open Source | Relational (PostgreSQL)
XediX TeraSolution | AM2 Systems | Commercial | Proprietary
X-Hive/DB | X-Hive Corporation | Commercial | Proprietary. Relational through JDBC
Xindice | Apache Software Foundation | Open Source | Proprietary
xml.gax.com | GAX Technologies | Commercial | Proprietary
Xpriori XMS | Xpriori | Commercial | Proprietary
XQuantum XML Database Server | Cognetic Systems | Commercial | Proprietary
XStreamDB Native XML Database | Bluestream Db. Soft. Corp. | Commercial | Proprietary
Xyleme Zone Server | Xyleme SA | Commercial | Proprietary

Page 50: Structural indexes of XML Databases

Summary

1. Big XML datasets are used in many applications.
2. Our problem: efficient processing of regular queries over XML databases.
3. Two ideas:
  1. Using relational databases.
  2. Building special indexes for XML databases.

Page 51: Structural indexes of XML Databases

Summary

4. Tree indexes can be applied to XML trees and to near-tree XML datasets (using the link graph).
5. Structural indexes simulate the data graph by smaller index graphs. Their construction is based on equivalence relations between nodes.
6. A structural index can be designed to support only a given set of queries.
7. Structural indexes can be applied in distributed XML database query processing (cloud, social networks).

Page 52: Structural indexes of XML Databases

References

• [Chung et al., SIGMOD 2002]
  – Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim. APEX: an adaptive path index for XML data. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, June 03-06, 2002, Madison, Wisconsin. doi:10.1145/564691.564706
• [Dietz82]
  – Dietz, P. F. 1982. Maintaining order in a linked list. In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing (San Francisco, California, United States, May 05-07, 1982). STOC '82. ACM, New York, NY, 122-127. DOI= http://doi.acm.org/10.1145/800070.802184
• [Goldman & Widom VLDB 97]
  – Goldman, R. and Widom, J. 1997. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. In Proceedings of the 23rd International Conference on Very Large Data Bases (August 25-29, 1997). M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds. Morgan Kaufmann Publishers, San Francisco, CA, 436-445.
• [Kaushik et al. 02]
  – Raghav Kaushik, Pradeep Shenoy, Philip Bohannon, Ehud Gudes. Exploiting Local Similarity for Indexing Paths in Graph-Structured Data. 18th International Conference on Data Engineering (ICDE'02), 2002, p. 0129.
• [KissVu05]
  – Attila Kiss, Vu Le Anh. A solution for regular queries on XML Data. PUMA Volume 15 (2005), Issue No. 2, pp. 179-202.
• [KissVu06]
  – Attila Kiss, Vu Le Anh. Efficient Processing SAPE Queries Using the Dynamic Labelling Structural Indexes. ADBIS 2006: 232-247.
• [KissVu07]
  – Attila Kiss, Vu Le Anh. Efficient Processing Regular Queries In Shared-Nothing Parallel Database Systems Using Tree- And Structural Indexes. ADBIS Research Communications 2007.
• [Li&Moon VLDB 2001]
  – Li, Q., Moon, B. 2001. Indexing and querying XML data for regular expressions. In Proceedings of VLDB 2001, pp. 367–370.
• [Milo & Suciu, LNCS 1997]
  – Milo, T., Suciu, D. (1999). Index structures for path expressions. 7th International Conference on Database Theory (ICDT), pp. 277-95.
• [Paige & Tarjan 87]
  – Paige, R. and Tarjan, R. E. 1987. Three partition refinement algorithms. SIAM J. Comput. 16, 6 (Dec. 1987), 973-989. DOI= http://dx.doi.org/10.1137/0216062

Page 53: Structural indexes of XML Databases


Thank you!