orsi vldb11

Optimizing Query Answering

Ontological Constraints

Giorgio Orsi1,2 and Andreas Pieris2

1Institute for the Future of Computing

Oxford Martin School

University of Oxford

2Department of Computer Science

University of Oxford

VLDB 2011

Ontological Databases

Ontological Reasoning DB Constraints

Ontological DB

Q(X) 9Y (X,Y)

Ontological DB

, ABox

{ t | D [ ² 9u (t,u) }

Ontological DB

Q(X) 9Y (X,Y)

Ontological Constraints (examples)

Concept Inclusions: 8X emp(X) person(X)

(Inverse) Relation Inclusion:

Relation Transitivity: 8X8Y8Z mgs(X,Y),mgs(Y,Z) mgs(X,Z)

8X8Y manages(X,Y) isManaged(Y,X)

Participation: 8X emp(X) 9Y report(X,Y)

Disjointness: 8X emp(X), customer(X) ?

Functionality: 8X8Y8Z reports(X,Y),reports(X,Z) Y = Z

Datalog§

¡ Datalog variant allowing in the head:

- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)

- Equality atoms ! EGDs 8X (X) Xi=Xj

- Constant false (?) ! NCs 8X (X) ?

Datalog+

[Cali’ et Al, PODS 09]

Datalog+

Datalog§ [Cali’ et Al, PODS 09]

¡ But, query answering under Datalog+ is undecidable

Datalog+

¡ Datalog+ is syntactically restricted ! Datalog§

Datalog+

¡ Datalog+ is syntactically restricted ! Datalog§

¡ TGDs more expressive than inclusion dependencies

8D8P8A runs(D,P),area(P,A) 9E employee(E,D,A)

The Chase Procedure

Input: Database D, set of TGDs

Output: A model of D [

person(john)

8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)

chase(D,) = D [ ?

The Chase Procedure

person(john)

chase(D,) = D [ {father(z1,john)

The Chase Procedure

person(john)

chase(D,) = D [ {father(z1,john), person(z1)

The Chase Procedure

person(john)

chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1)

The Chase Procedure

person(john)

chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …}

Query Answering via Chase

[see, e.g., Deutsch, Nash & Remmel, PODS 08]

D [ ² Q , chase(D,) ² Q

C = chase(D,)

h1(C) h2(C)

Query Answering via Chase

C = chase(D,)

h1(C) h2(C)

Possibly Infinite!

Query answering via rewriting

compilation

evaluation

compilation

Chase vs rewriting

Linear TGDs

8X8Y r(X,Y) 9Z (X,Z)

single body atom

¡ Properly generalize inclusion dependencies.

¡ Enjoy the bounded-derivation depth property.

¡ FO-rewritable query answering in AC0 (data complexity).

Bounded derivation depth property (BDDP)

Q k levels

constant

w.r.t. D

chase(D,) ² Q ) chasek(D,) ² Q

Bounded derivation depth property (BDDP)

BDDP ) First-Order Rewritability

Q k levels

constant

w.r.t. D

q promotesTo(A,B), customer(B) (original query)

promoter(X) Y promotesTo(X,Y)

promotesTo(X,Y) customer(Y)

q promotesTo(A,B), customer(B) Q

FO-rewritability: Example [Gottlob et Al., ICDE 11]

q promotesTo(A,B), customer(B)

q promotesTo(A,B), customer(V0,B)

{ Y = B }

( V0 is fresh )

q promotesTo(A,B), promotesTo(V0,B) ans(A) promotesTo(A,B)

factorization

{ A = V0 }

q promoter(A)

q promotesTo(A,B) {X = A, Y = B}

UCQ rewriting

(first-order)

q promoter(A)

q promotesTo(A,B)

FO-rewritability

¡ Desirable properties of a FO-rewriting:

independent on the DB

executable by any DBMS

easy to compute (e.g., polynomial time)

small size (e.g., polynomial size)

FO-rewritability

¡ Unions of Conjunctive Queries (UCQs)

DB independent

easy to optimize and distribute

worst-case exponential size in Q and

Calvanese et Al, JAR 07

Perez Urbina et Al, JAL 09

Cali’ et Al, PODS 09

Gottlob et Al, ICDE 11

and others…

¡ Desirable properties of a FO-rewriting:

independent on the DB

easy to compute (e.g., polynomial time)

small size (e.g., polynomial size)

¡ Combined and hybrid FO-rewriting

good computational properties

(e.g., polynomial in size)

requires access to the DB

Kontchakov et Al., KR 10

Gottlob and Schwentick, DL 11

FO-rewritability

¡ Purely intensional Datalog rewriting

very compressed representation

purely intensional

requires view-creation or Datalog engine

¡ Combined and hybrid FO-rewriting

good computational properties

(e.g., polynomial in size)

requires access to the DB

Kontchakov et Al., KR 10

Gottlob and Schwentick, DL 11

Perez Urbina et Al, JAL 09

Rosati and Almatelli., KR 10

Orsi and Pieris., VLDB 11

FO-rewritability

Datalog rewriting: keep it first-order!

¡ A Datalog query is (in general) not a first-order query

a non-recursive Datalog query is a first-order query

a bounded Datalog query is a first-order query

an output-predicate-bounded Datalog query is a first-order query

Datalog rewriting: keep it first-order!

¡ Input:

a (w.l.o.g. boolean) conjunctive query Q = <q,ρ>

Q : q(X) p(X), s(X,Y) <q, q(X) p(X),s(X,Y) >

a set of linear TGDs

¡ Output:

a bounded Datalog query Q = <q,π >

¡ A Datalog query is (in general) not a first-order query

a non-recursive Datalog query is a first-order query

a bounded Datalog query is a first-order query

an output-predicate-bounded Datalog query is a first-order query

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

t is unbounded t(X) cannot be computed using

a DB-independent FO-query

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

t(X) r(X,Y), t(Y)

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

t(X) r(X,Y), r(Y,Z), t(Z)

t(X) r(X,Y), t(Y) v

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

t(X) r(X,Y), r(Y,Z), t(Z) v

t(X) r(X,Y), t(Y) v

t(X) r(X,Y), r(Y,Z), r(Z,W), t(W)

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

t(X) r(X,Y), r(Y,Z), t(Z) v

t(X) r(X,Y), t(Y) v

t(X) r(X,Y), r(Y,Z), r(Z,W), t(W) v

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

q is bounded q(X) can be computed using

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

q(X) r(X,Y)

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

q(X) r(Y,X)

q(X) r(X,Y) v

Output-predicate-bounded Datalog and BDDP

DATALOG

PROGRAM

DATALOG

PROGRAM

OUTPUT

DATALOG

PROGRAM

property

non recursive non recursive

OUTPUT

Datalog Rewriting: Skolemization (and renaming)

r(X,Y) Z s(Y,Z)

s(X,Y) Z p(Y,Y,Z)

p(X,Y,Z) t(Z)

r(X,Y) Z s(Y,Z)

s(X,Y) Z p(Y,Y,Z)

p(X,Y,Z) t(Z)

r(X1,Y1) s(Y1,f1(Y1))

s(X2,Y2) p(Y2,Y2,f2(Y2))

p(X3,Y3,Z3) t(Z3)

r(X,Y) Z s(Y,Z)

s(X,Y) Z p(Y,Y,Z)

p(X,Y,Z) t(Z)

¡ f and are equisatisfiable (not equivalent)

¡ Introduce one Skolem function for each existential variable

r(X1,Y1) s(Y1,f1(Y1))

s(X2,Y2) p(Y2,Y2,f2(Y2))

p(X3,Y3,Z3) t(Z3)

Datalog Rewriting: Rule Saturation

¡ Apply resolution inference rule to rules in f

at least one of the rules contains Skolem terms

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Datalog Rewriting: Rule Saturation

¡ Apply resolution inference rule to rules in f

at least one of the rules contains Skolem terms

r(X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Datalog Rewriting: Properties of Rule Saturation

¡ [f] mimics the chase derivations.

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

¡ [f] depends only on .

¡ [f] is possibly infinite

linear TGDs have BDDP: suffices to construct it up to k steps [f]k.

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Datalog Rewriting: Query Saturation

¡ resolve [f] with the query Q.

use only rules with Skolem terms.

¡ resolve [f] with the query Q.

use only rules with Skolem terms.

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

[δ12]] : r (X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))

Q s(A,B), p(B,B,C) [f] Q

Q r(X1,Y1), p(f1(Y1), f1(Y1),C)

¡ bypasses chase derivations with function symbols

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

[δ12]] : r (X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))

Q s(A,B), p(B,B,C)

Datalog Rewriting: Finalization

¡ keep only the function-free rules from [f] [ [Q,f]

¡ derivations producing certain answers are captured by function-

symbol-free rules.

¡ (optional) add auxiliary rules to make the rewriting a proper Datalog

query i.e., clear distinction IDB/EDB.

OPB-Datalog and BDDP revisited

DATALOG

PROGRAM

OUTPUT

ff([f]) ff([Q,f]) aux

property

non recursive non recursive

¡ use the predicate graph to reduce the number of rules in f

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Q s(A,B), p(B,B,C)

Optimizations: Pruning

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Q s(A,B), p(B,B,C)

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Q s(A,B), p(B,B,C)

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Q s(A,B), p(B,B,C)

¡ we are no longer

independent on Q!

Optimizations: Query Elimination

¡ eliminate implied atoms during query saturation

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

Q s(A,B), p(B,B,C)

s(A,B) ² p(B,B,C)

Q s(A,B), p(B,B,C)

atom coverage

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

Q s(A,B), p(B,B,C) ≡ Q s(A,B)

s(A,B) ² p(B,B,C)

Q s(A,B), p(B,B,C)

atom coverage

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

¡ unique elimination strategy (w.r.t. the number of eliminated atoms)

see paper.

¡ given m = |body(ρ)| and n = ||

worst-case size of [f] [ [Q,f] is O((n∙m)m)

worst-case size of Q = <q,π > is O(n+m)m

¡ atom coverage under linear TGDs can be checked in polynomial time

see paper.

Experimental Results

Discussion

¡ Datalog rewriting is substantially more compact than UCQ rewriting.

¡ Does Datalog improve the end-to-end performance?

¡ Extend the procedure to larger classes of TGDs

guarded TGDs [Cali’ et Al, PODS 09] non FO-rewritable

sticky-join TGDs [Cali’ et Al, VLDB 10]

The Datalog Family

Thomas

Lukasiewicz

Gottlob

Andreas

Pieris

Andrea

Giorgio

Thank you!

orsi vldb11

Documents

arq _ fotógrafa mariana orsi

orsi report formatting using styles

business mathematics u2w6ol rétallér orsi

orsi prospectus 2016

apostila direito previdenciario renata orsi

about orsi about tric project review workshop...

boletín orsi - nº 9

cube (orsi)

orsi, guillermo - sueños de perro

snipsuggest: context-aware autocompletion for...

aero club adele orsi - acao no 1 promozione.pdf · aero...

szabó orsi: flow a mindenhol

sistemasoperativosdistribuidos(orsi gongora 6551)

boletín orsi - nº 14

boletín orsi - nº 6

boletín orsi - nº 5

yamandú orsi. Énfasis programático

album fotografico gerez, orsi

boletin orsi - nº 15

boletín orsi nº18