ADC 2002

/department of mathematics and computer science

TU/e eindhoven university of technology

January 29, 2002

XAL - An XML ALgebra for Query Optimization

Flavius Frasincar

Geert-Jan Houben

Cristian Pau

Databases & Hypermedia Group

Division of Computer Science

Contents

1. Motivation

2. XML Query Algebra Goals

3. XML Query Algebras

4. XAL

5. XAL Optimization Laws

6. XAL Heuristic Optimization Algorithm

7. XAL Query Example

8. Conclusion and Future Work

1. Motivation

• Hera project: automatic hypermedia presentation of data residing in the heterogeneous ‘deep’ web

• Use XML technologies for querying, transforming, and integrating large amounts of Web data

• Optimizing XML queries is important, which calls for an XML algebra designed for query optimization

2. XML Query Algebra Goals

• Based on W3C XML Query Data Model

• Genericity – logical operators independent of the underlying storage representation

• Optimizability – support query optimizations

• Expressivity – express a large class of queries

• Composability – operators are closed on the same data type

• Flexibility – support various data types

3. XML Query Algebras

• Lore (Stanford): specific set of logical operators

• Beech et al. (industry): logical model, no optimization strategies

• YATL (INRIA): specific data model, focus on data integration

• XOM (Zhang & Dong): complete and closed, no optimization support

• SAL (Beeri & Tzaban): focus on semistructured data, limited optimization support

• XQuery (W3C): weak support for optimization (unordered forests)

4. XAL

• Based on W3C XML Query Data Model

• Reduces the impedance mismatch between databases and XML (query languages) by allowing a mix of ordered/unordered operators

• Support for optimization (reuse the query optimization heuristics from relational systems)

• Fine grained algebra of vertices and edges (Genericity)

• Composability, Flexibility, XQuery Compatibility

4.1. XAL Data Model

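• Rooted connected directed graph with a partial order relation on edges

– Acyclic (lexical view)

– Cyclic (semantic view)

• Formally,

$$G = (V, E, O, \mathit{root}), \quad \mathit{root} \in V$$

$$V = V_{element} \cup V_{int} \cup V_{string} \cup \dots \cup V_{ID} \cup V_{IDREF}$$

$$E = E_E \cup E_A \cup E_R \cup E_D$$

$$O = \{(v_1, v_2, \dots, v_n) \mid v_i \in V,\ \mathit{parent}(v_i) = p,\ p \in V_{element}\}$$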

Properties for Vertex

Basic Property    Element Vertex Result    Simple Vertex Result
value             identifier               value (e.g. “Dali”)
type              element                  type of value (e.g. string)

Derived Property  Result
name              name of the incoming E edge
parent            parent vertex (via E edge)
parentedge        incoming E edge
childelements     outgoing E edges
attributes        outgoing A edges
references        outgoing R edges

Properties for Edge

Basic Property    Result
name              element name (E), attribute name (A), ID attribute name (R), “Data” (D)
type              E, A, R, D
parent            source vertex of the edge
child             target vertex of the edge

Derived Property  Result
next              following sibling edge
previous          preceding sibling edge

Note: derived properties apply to E and D edges
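
The vertex and edge properties above can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the class layout, the `incoming`/`outgoing` lists, and the use of insertion order as sibling order are hypothetical choices, not something the slides prescribe.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False)  # identity-based equality keeps this graph sketch simple
class Vertex:
    type: str    # basic property: "element", or the type of a simple value
    value: Any   # basic property: identifier (element) or literal (simple)
    incoming: List["Edge"] = field(default_factory=list, repr=False)
    outgoing: List["Edge"] = field(default_factory=list, repr=False)

    @property
    def parentedge(self) -> Optional["Edge"]:
        """Derived: the incoming E edge, if there is one."""
        return next((e for e in self.incoming if e.type == "E"), None)

    @property
    def name(self) -> Optional[str]:
        edge = self.parentedge
        return edge.name if edge else None

    @property
    def parent(self) -> Optional["Vertex"]:
        edge = self.parentedge
        return edge.parent if edge else None

    def _out(self, edge_type: str) -> List["Edge"]:
        return [e for e in self.outgoing if e.type == edge_type]

    @property
    def childelements(self) -> List["Edge"]:
        return self._out("E")

    @property
    def attributes(self) -> List["Edge"]:
        return self._out("A")

    @property
    def references(self) -> List["Edge"]:
        return self._out("R")


@dataclass(eq=False)
class Edge:
    type: str        # "E", "A", "R", or "D"
    name: str
    parent: Vertex   # source vertex
    child: Vertex    # target vertex

    def __post_init__(self) -> None:
        # Assumption: creation order stands in for the order relation O.
        self.parent.outgoing.append(self)
        self.child.incoming.append(self)

    def _sibling(self, offset: int) -> Optional["Edge"]:
        if self.type not in ("E", "D"):   # derived properties: E and D edges only
            return None
        siblings = [e for e in self.parent.outgoing if e.type in ("E", "D")]
        i = siblings.index(self) + offset
        return siblings[i] if 0 <= i < len(siblings) else None

    @property
    def next(self) -> Optional["Edge"]:
        return self._sibling(+1)

    @property
    def previous(self) -> Optional["Edge"]:
        return self._sibling(-1)
```

Building a `painter` vertex with `name` and `description` children, for instance, makes `name` of the child vertex resolve through its incoming E edge, and `next` of the first edge return the second.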

4.2. XAL Operators

• All operators have the following form

o[f](x1, x2, … xn: expression)

• Unary operators evaluate the input to a collection of vertices and use the implicit map operation to evaluate the result

• Closedness = all operators are closed on collections (support composability)

Operator Semantics

o[f](x: expression)

Variable x is bound to each vertex in the input collection. For each such binding f(x) is evaluated.

The semantics of the operator o defines how the partial result (resulting from one variable binding) is computed from f(x).

The operator result is built by concatenating all the partial results.
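
The three steps above (bind, evaluate f(x), concatenate partial results) can be sketched with plain Python lists standing in for collections. The names `evaluate`, `as_collection`, `select`, and `map_` are hypothetical, and the way each operator derives its partial result is an assumption made for illustration.

```python
from typing import Any, Callable, List


def as_collection(x: Any) -> List[Any]:
    """Treat a non-list value as a singleton collection."""
    return list(x) if isinstance(x, list) else [x]


def evaluate(partial: Callable[[Any, Any], Any],
             f: Callable[[Any], Any],
             expr: Any) -> List[Any]:
    """Generic o[f](x: expr): `partial` plays the role of the operator o."""
    result: List[Any] = []
    for x in as_collection(expr):                     # bind x to each vertex
        fx = f(x)                                     # evaluate f(x)
        result.extend(as_collection(partial(x, fx)))  # concatenate partial results
    return result


def select(condition: Callable[[Any], bool], expr: Any) -> List[Any]:
    # Assumed partial result for selection: x itself if the condition holds.
    return evaluate(lambda x, fx: x if fx else [], condition, expr)


def map_(f: Callable[[Any], Any], expr: Any) -> List[Any]:
    # Assumed partial result for map: f(x) unchanged.
    return evaluate(lambda x, fx: fx, f, expr)
```

Passing a bare value such as `5` instead of a list works as well, which mirrors the convention that a singleton collection is written as the singleton itself.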

Collection

• Generalization of list and set (collections have a boolean order property)

• Similar to the mathematician’s monad and functional programmer’s (list) comprehension

Monad<M>, where M is a type, is a triple of functions (map<M>, unit<M>, join<M>)

XAL has map and join (called union) but no unit operator (the singleton collection is written as the singleton itself)

Collections have elements of arbitrary types
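
As a rough illustration of the monad triple on collections with an order property, the sketch below defines `unit`, `cmap`, and `join` on a hypothetical `Collection` class. The rule that a flattened collection is ordered only if every part is ordered is an assumption of this sketch, and `unit` is spelled out here only for clarity, since XAL itself has no unit operator.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Collection:
    items: Tuple[Any, ...]
    ordered: bool = True   # the boolean order property


def unit(x: Any) -> Collection:
    """Wrap a single value (implicit in XAL)."""
    return Collection((x,))


def cmap(f: Callable[[Any], Any], c: Collection) -> Collection:
    """Apply f to every element, keeping the order property."""
    return Collection(tuple(f(x) for x in c.items), c.ordered)


def join(cc: Collection) -> Collection:
    """Flatten a collection of collections (XAL calls this union)."""
    flat = tuple(x for inner in cc.items for x in inner.items)
    ordered = cc.ordered and all(inner.ordered for inner in cc.items)
    return Collection(flat, ordered)
```

With these definitions, `join(cmap(unit, c))` gives back `c`, which is the usual sanity check for a monad triple.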

Operator Types

• Extraction operators – retrieve the needed information from XML documents

• Meta-operators – control the evaluation of expressions

• Construction operators – build new XML documents from the extracted data

Note: two vertices are equal if they have the same value

Extraction Operators

• Projection: π[type, name](e: expr)

• Selection: σ[condition](e: expr)

• Unorder (e: expr)

• Join: (x: expr) ⋈[condition] (y: expr)

• Cartesian Product: (x: expr) × (y: expr)

• Union: (x: expr) ∪ (y: expr)

• Difference: (x: expr) − (y: expr)

• Intersection: (x: expr) ∩ (y: expr)

Note (Flexibility): x and y do not have to be “union compatible” as in relational algebra

Projection

π[type, name](e: expression)

type = E, A, R, D or disjunctions (|) of these

name = regular expression over strings

Example. π[E, (P|p)ainter[s]#](e) produces all the target vertices of element containment (E) edges that have names starting with Painter, painter, Painters, or painters, and that originate from the vertices in e
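
A minimal sketch of this projection over a flat edge list follows. The `Edge` tuple and the `project` function are hypothetical, and the XAL pattern `(P|p)ainter[s]#` is translated by hand into the Python regular expression `(P|p)ainters?.*`, on the assumption that `[s]` marks an optional character and `#` matches any remaining string.

```python
import re
from typing import List, NamedTuple


class Edge(NamedTuple):
    type: str     # "E", "A", "R", or "D"
    name: str
    parent: str   # source vertex identifier
    child: str    # target vertex identifier


def project(edges: List[Edge], types: str, name_pattern: str,
            e: List[str]) -> List[str]:
    """Return target vertices of matching edges that leave vertices in e."""
    allowed = set(types.split("|"))        # e.g. "E|A" -> {"E", "A"}
    pattern = re.compile(name_pattern)
    result = []
    for vertex in e:                       # implicit map over the input
        for edge in edges:                 # list order stands in for document order
            if (edge.parent == vertex and edge.type in allowed
                    and pattern.fullmatch(edge.name)):
                result.append(edge.child)
    return result


edges = [
    Edge("E", "painter", "root", "p1"),
    Edge("E", "museum", "root", "m1"),
    Edge("E", "Painters", "root", "p2"),
    Edge("A", "painterId", "root", "a1"),
]
painters = project(edges, "E", r"(P|p)ainters?.*", ["root"])
```

Here `painters` holds the targets of the two element edges named `painter` and `Painters`; the attribute edge is skipped because only type E was requested.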

Meta-operators & Construction Operators

• Map: map[f](e: expression)

• Kleene Star: *[f](e: expression)

Note: e is included in the result

• Create vertex: vertex[type](value)

Note: for element vertices the value (identifier) is given by the system

• Create edge: edge[type, name, parent](child)
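
The Kleene star can be read as repeated application of f until nothing new appears, with the input e kept in the result. The sketch below follows that reading; the function name and the `seen` set used to stop on cyclic graphs are assumptions of this illustration, not part of the operator definition on the slide.

```python
from typing import Callable, Hashable, List


def kleene_star(f: Callable[[Hashable], List[Hashable]],
                e: List[Hashable]) -> List[Hashable]:
    """*[f](e): e, then f(e), then f(f(e)), ... until no new element shows up."""
    result = list(e)          # e itself is included in the result
    seen = set(result)
    frontier = list(e)
    while frontier:
        next_frontier = []
        for x in frontier:
            for y in f(x):
                if y not in seen:          # guard against cycles (assumption)
                    seen.add(y)
                    result.append(y)
                    next_frontier.append(y)
        frontier = next_frontier
    return result


# Hypothetical child relation of a small tree rooted at "a".
children = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
reachable = kleene_star(lambda v: children[v], ["a"])
```

On this tree, `reachable` lists the root followed by all of its descendants, level by level.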

An Example

• Copy a complete graph starting from the vertex v

map[edge[type(e), name(e), vertex[type(parent(e))](value(parent(e)))](vertex[type(child(e))](value(child(e))))](e)

where e = *[parentedge(π[E|A|D, #](child(x)))](x: parentedge(π[E|A|D, #](v)))

5. XAL Optimization Laws

• The main factor in the execution cost of algebra expressions is the iteration (explicit or implicit map operator) over collections

• The proposed set of optimization laws aims at reducing iteration size for the data extraction expressions

• The laws are inspired by monad laws and relational algebraic optimization rules

• Law 1 (Left unit)

If e1 is of unit type (singleton collection), then

e2(e1) = e2 (v := e1)

• Law 2 (Right unit)

If e2 is the identity function, i.e. e2 (v) = v, then

e2(e1) = e1

• Law 3 (Associativity)

(e1 o e2) o e3 = e1 o ( e2 o e3 )

• Law 4 (Empty collection)

If e2 is the empty function, i.e. e2(v) = (), then

e2(e1) = ()

• Law 5 (Decomposition of join)

e1 ⋈[condition] e2 = σ[condition](e1 × e2)
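
Laws 1 to 5 can be checked on small inputs if collections are modelled as Python lists and the iteration e2(v: e1) as a list bind. This is a sanity check under those modelling assumptions (all helper names are hypothetical, and Law 3 is read as associativity of bind), not a proof of the laws.

```python
from typing import Any, Callable, List, Tuple


def bind(e1: List[Any], e2: Callable[[Any], List[Any]]) -> List[Any]:
    """Evaluate e2 once per element v of e1 and concatenate the results."""
    return [y for v in e1 for y in e2(v)]


def cartesian(a: List[Any], b: List[Any]) -> List[Tuple[Any, Any]]:
    return [(x, y) for x in a for y in b]


def select(cond: Callable[[Any], bool], e: List[Any]) -> List[Any]:
    return [x for x in e if cond(x)]


def join(a: List[Any], b: List[Any],
         cond: Callable[[Tuple[Any, Any]], bool]) -> List[Tuple[Any, Any]]:
    return [(x, y) for x in a for y in b if cond((x, y))]


def laws_hold(e1, f, g, a, b, cond) -> bool:
    """Check Laws 1-5 on one concrete input (e1 must be non-empty)."""
    v = e1[0]
    return all([
        bind([v], f) == f(v),                                        # Law 1: left unit
        bind(e1, lambda x: [x]) == e1,                               # Law 2: right unit
        bind(bind(e1, f), g) == bind(e1, lambda x: bind(f(x), g)),   # Law 3: associativity
        bind(e1, lambda x: []) == [],                                # Law 4: empty collection
        join(a, b, cond) == select(cond, cartesian(a, b)),           # Law 5: join decomposition
    ])
```

Laws 1, 2, and 4 are the ones that let an optimizer drop an iteration entirely, which is why step S1 of the heuristic algorithm relies on them.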

• Law 6 (Decomposition of projection)

If name is a regular expression that can be decomposed in several regular expressions n1, n2, … nn and e is an unordered collection, then

π[name](e) = π[n1](e) ∪ π[n2](e) ∪ … ∪ π[nn](e)

• Law 7 (Cascading of selection)

σ[c1 ∧ c2 ∧ … ∧ cn](e) = σ[c1](σ[c2]( … (σ[cn](e)) … ))

• Law 8 (Commutativity of selection)

σ[c1](σ[c2](e)) = σ[c2](σ[c1](e))

• Law 9 (Commutativity of selection with projection)

If the condition c involves solely vertices that have incoming edges named by the regular expression name, then

π[name](σ[c(π[name])](e)) = σ[c](π[name](e))

• Law 10 (Commutativity of selection with cartesian product)

If the condition c involves solely vertices from e1, then

σ[c](e1 × e2) = σ[c](e1) × e2
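
The selection laws can be tried out on toy data, again with lists standing in for collections. The sample items, the two conditions, and all helper names are made up for this sketch; the point is that both sides of Laws 7, 8, and 10 agree while the cartesian product on the right-hand side of Law 10 is smaller.

```python
from typing import Any, Callable, List, Tuple


def select(cond: Callable[[Any], bool], e: List[Any]) -> List[Any]:
    return [x for x in e if cond(x)]


def cartesian(a: List[Any], b: List[Any]) -> List[Tuple[Any, Any]]:
    return [(x, y) for x in a for y in b]


# Hypothetical sample data: (item id, price) pairs and painting ids.
items = [("i1", 1_500_000), ("i2", 900_000), ("i3", 2_000_000)]
paintings = ["p1", "p2"]


def expensive(item):
    return item[1] > 1_000_000


def not_i3(item):
    return item[0] != "i3"


# Law 7: one conjunctive selection equals a cascade of selections.
conjunction = select(lambda it: expensive(it) and not_i3(it), items)
cascade = select(expensive, select(not_i3, items))

# Law 8: the order of the cascade does not matter.
swapped = select(not_i3, select(expensive, items))

# Law 10: a condition on e1 only can be applied before the product.
late = select(lambda pair: expensive(pair[0]), cartesian(items, paintings))
early = cartesian(select(expensive, items), paintings)

late_product_size = len(cartesian(items, paintings))   # product built before selecting
early_product_size = len(early)                        # product built after selecting
```

Comparing `late_product_size` with `early_product_size` shows the reduction in iteration size that motivates moving selections down the query tree.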

• Law 11 (Commutativity of selection with binary operators)

If θ is one of the set operators: ∪, −, or ∩, then

σ[c](e1 θ e2) = σ[c](e1) θ σ[c](e2)

• Law 12 (Commutativity of binary operators)

If θ is one of the set operators: ×, ∪, or ∩ and e1 and e2 are unordered collections, then

e1 θ e2 = e2 θ e1

• Law 13 (Commutativity of projection with cartesian product)

If name is a regular expression that can be decomposed in two regular expressions name1 and name2, name1 involves solely vertices in e1 and name2 involves solely vertices in e2, then

π[name](e1 × e2) = π[name1](e1) × π[name2](e2)

• Law 14 (Commutativity of projection with union)

π[name](e1 ∪ e2) = π[name](e1) ∪ π[name](e2)
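
The role of the "unordered" precondition in Law 12 can be seen in a small sketch. Ordered collections are modelled as Python lists, and an unordered collection is modelled as a `frozenset`, which is an assumption based on collections being described as a generalization of list and set. The helper names are hypothetical.

```python
from typing import Any, Callable, FrozenSet, List


def select(cond: Callable[[Any], bool], e: List[Any]) -> List[Any]:
    return [x for x in e if cond(x)]


def union(a: List[Any], b: List[Any]) -> List[Any]:
    return a + b                              # ordered: a's elements come first


def difference(a: List[Any], b: List[Any]) -> List[Any]:
    return [x for x in a if x not in b]


def intersection(a: List[Any], b: List[Any]) -> List[Any]:
    return [x for x in a if x in b]


def unorder(e: List[Any]) -> FrozenSet[Any]:
    return frozenset(e)                       # assumed model of an unordered collection


def law11_holds(op, cond, e1, e2) -> bool:
    """Selection distributes over the binary operator op."""
    return select(cond, op(e1, e2)) == op(select(cond, e1), select(cond, e2))


def law12_holds(op, e1, e2) -> bool:
    """op commutes once the order of the results is ignored."""
    return unorder(op(e1, e2)) == unorder(op(e2, e1))


e1, e2 = [3, 1, 4], [1, 5, 9]


def is_odd(x):
    return x % 2 == 1
```

On these inputs `union(e1, e2)` and `union(e2, e1)` differ as lists but agree after `unorder`, while `difference` fails `law12_holds`, as expected for a non-commutative operator.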

6. XAL Heuristic Optimization Algorithm

S1. Eliminate unnecessary iterations (use Laws 1, 2, and 4). After each following step, S1 is applied again.

S2. Unorder collections (use unorder operator). Collections for which order is not relevant are unordered.

S3. Decompose joins (use Law 5).

S4. Decompose selections (use Law 7). Break down selections into a cascade of selections. It enables moving select operations down in the query tree.

S5. Move selections down as far as possible (use Laws 8, 9, 10, and 11). Based on the commutativity of selection with other operators move selections down in the query tree as far as it is permitted by the selection condition.

S6. Apply the most restrictive selections first (use Laws 3 and 12). Based on the commutativity and associativity of binary operators rearrange the leaf vertices so that the most restrictive selections apply first.

Note: As a selectivity criterion one can use the size of the collection.

The most restrictive selections are the selections that produce collections with the fewest elements.

S7. Decompose projections (use Law 6). Break down projections into a union of projections. It enables moving the project operations down in the query tree.

S8. Move projections down as far as possible (use Laws 13 and 14). Based on the commutativity of projection with other operators, move projections down in the query tree as far as possible.

S9. Identify combined operations (use composition laws). Identify subtrees that group operations that can be executed by a single program.
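
Steps S4 and S5 can be sketched on a toy query tree. The node classes, the `uses` set that records which leaves a condition mentions, and the `push`/`push_down` functions are all hypothetical; the sketch covers only selections over cartesian products (Laws 8 and 10) and assumes the conditions have already been split into a cascade.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Cond:
    text: str
    uses: FrozenSet[str]      # names of the leaves the condition refers to


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Product:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Select:
    cond: Cond
    child: "Node"


Node = Union[Leaf, Product, Select]


def leaves(node: Node) -> FrozenSet[str]:
    if isinstance(node, Leaf):
        return frozenset({node.name})
    if isinstance(node, Select):
        return leaves(node.child)
    return leaves(node.left) | leaves(node.right)


def push(cond: Cond, node: Node) -> Node:
    """Move one selection as far down as its condition permits."""
    if isinstance(node, Product):                      # Law 10
        if cond.uses <= leaves(node.left):
            return Product(push(cond, node.left), node.right)
        if cond.uses <= leaves(node.right):
            return Product(node.left, push(cond, node.right))
    if isinstance(node, Select):                       # Law 8
        return Select(node.cond, push(cond, node.child))
    return Select(cond, node)


def push_down(conds: List[Cond], tree: Node) -> Node:
    for cond in conds:
        tree = push(cond, tree)
    return tree
```

Applied to (painter × painting) × item with the three conditions of the query example in section 7, this places the price condition directly above item and the author condition directly above painter × painting.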

7. XAL Query Example

• XML repository with three documents:

painters.xml

<painters>
  <painter>
    <name>Rembrandt</name>
    <description>Dutch painter</description>
  </painter>
</painters>

catalogue.xml

<items>
  <item>
    <paintingid>Painting_ID01</paintingid>
    <price>1500000</price>
  </item>
</items>

paintings.xml

<paintings>
  <painting>
    <id>Painting_ID01</id>
    <name>The Stone Bridge</name>
    <author>Rembrandt</author>
  </painting>
</paintings>

• Query: Return in alphabetical order the names of the painters that have a painting over $1 000 000 (the name of a painter will appear in the <result> element as many times as the number of their paintings that fulfill the above condition)

• XQuery 1.0:

<result>{
  FOR $i IN document("painters.xml")/painters/painter,
      $j IN document("paintings.xml")/paintings/painting[author = $i/name],
      $k IN document("catalogue.xml")/items/item[paintingid = $j/id]
  WHERE $k/price/data() > 1000000
  RETURN $i/name
  SORTBY ./data()
}</result>
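
To make the intended result concrete, the same query can be evaluated naively with Python's standard `xml.etree.ElementTree` over the three sample documents. The function name `expensive_painters` and the inlined document strings are assumptions of this sketch; it mirrors the unoptimized nested FOR loops and is unrelated to how XAL itself would evaluate the query.

```python
import xml.etree.ElementTree as ET
from typing import List

PAINTERS = """<painters><painter><name>Rembrandt</name>
<description>Dutch painter</description></painter></painters>"""

PAINTINGS = """<paintings><painting><id>Painting_ID01</id>
<name>The Stone Bridge</name><author>Rembrandt</author></painting></paintings>"""

CATALOGUE = """<items><item><paintingid>Painting_ID01</paintingid>
<price>1500000</price></item></items>"""


def expensive_painters(painters_xml: str, paintings_xml: str,
                       catalogue_xml: str,
                       threshold: float = 1_000_000) -> List[str]:
    painters = ET.fromstring(painters_xml)
    paintings = ET.fromstring(paintings_xml)
    catalogue = ET.fromstring(catalogue_xml)
    names = []
    for i in painters.findall("painter"):                       # FOR $i
        for j in paintings.findall("painting"):                 # FOR $j
            if j.findtext("author") != i.findtext("name"):
                continue
            for k in catalogue.findall("item"):                 # FOR $k
                if (k.findtext("paintingid") == j.findtext("id")
                        and float(k.findtext("price")) > threshold):  # WHERE
                    names.append(i.findtext("name"))            # RETURN
    return sorted(names)                                        # SORTBY
```

A painter name is appended once per qualifying painting, which matches the remark that names repeat as many times as there are paintings over the threshold.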

• Input:

– painters.xml: 3 painters (1, 2, 3)

– paintings.xml: 100 paintings for painter 1, 150 paintings for painter 2, 100 paintings for painter 3

– catalogue.xml: only painter 1 has 20 paintings more expensive than $1 000 000; all the other paintings are below $1 000 000

• Initial Query Tree

– Output is alphabetically ordered!

[Figure: initial query tree. Node labels: painter, painting, item; painter.name=painting.author; painting.id=item.paintingid; item.price > 1000000; painter.name; data. Legend: XQuery clauses FOR, WHERE, SORTBY and their XAL counterparts.]

Cartesian Product: 3 x 350 x 350 = 367 500 elements

• Optimization I

– Step 2: Unorder collections (commutativity of XAL binary operators)

– Step 4: Decompose selections

– Step 5: Move selections down as far as possible

[Figure: query tree after Optimization I. Node labels: painter, painting, item; painter.name=painting.author; painting.id=item.paintingid; item.price > 1000000; painter.name; data.]

Cartesian Product: 3 x 350 + 350 x 20 = 8 050 elements

• Optimization II

– Step 6: Apply the most restrictive selections first (switch positions of painter and item)

[Figure: query tree after Optimization II. Node labels: item, painting, painter; painting.id=item.paintingid; painter.name=painting.author; item.price > 1000000; painter.name; data.]

Cartesian Product: 20 x 350 + 20 x 3 = 7 060 elements
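
The three cartesian product sizes quoted for the query trees can be reproduced with a few multiplications. The function and parameter names below are hypothetical; the figure of 350 catalogue items is read off the 3 x 350 x 350 product of the initial tree.

```python
from typing import Tuple


def product_sizes(painters: int, paintings: int, items: int,
                  expensive_items: int) -> Tuple[int, int, int]:
    """Return the cartesian product sizes of the three query trees."""
    # Initial tree: one product over all three collections.
    initial = painters * paintings * items

    # Optimization I: painter x painting first (one pair per painting survives
    # the author selection), then that result x the expensive items.
    optimization_1 = painters * paintings + paintings * expensive_items

    # Optimization II: expensive items x painting first (one pair per expensive
    # item survives the id selection), then that result x painter.
    optimization_2 = expensive_items * paintings + expensive_items * painters

    return initial, optimization_1, optimization_2


sizes = product_sizes(painters=3, paintings=350, items=350, expensive_items=20)
```

With the input sizes from the example, `sizes` reproduces the 367 500, 8 050, and 7 060 elements reported for the initial tree and the two optimizations.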

8. Conclusion and Future Work

• XAL provides an elegant way (by applying the ‘unorder’ operator) to reuse the heuristic optimization algorithm from relational queries

• Investigate new optimization laws that take advantage of the XML specific features (e.g. tree structure, internal references)

• Build a translation scheme from XQuery to XAL, exploring the power of expression of XAL