orsi vldb11

72
Optimizing Query Answering under Ontological Constraints Giorgio Orsi 1,2 and Andreas Pieris 2 1 Institute for the Future of Computing Oxford Martin School University of Oxford 2 Department of Computer Science University of Oxford VLDB 2011

Upload: giorgio-orsi

Post on 22-Nov-2014

341 views

Category:

Documents


2 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Orsi Vldb11

Optimizing Query Answering

under

Ontological Constraints

Giorgio Orsi1,2 and Andreas Pieris2

1Institute for the Future of Computing

Oxford Martin School

University of Oxford

2Department of Computer Science

University of Oxford

VLDB 2011

Page 2: Orsi Vldb11

Ontological Databases

Ontological Reasoning DB Constraints

Ontological DB

Page 3: Orsi Vldb11

Ontological Databases

D

D

D

ABox

TBox

Ontological Reasoning DB Constraints

Ontological DB

Page 4: Orsi Vldb11

Ontological Databases

D

D

D

Q(X) 9Y (X,Y)

ABox

TBox

Ontological Reasoning DB Constraints

Ontological DB

Page 5: Orsi Vldb11

Ontological Databases

D

D

D

, ABox

TBox

{ t | D [ ² 9u (t,u) }

Ontological Reasoning DB Constraints

Ontological DB

Q(X) 9Y (X,Y)

Page 6: Orsi Vldb11

Ontological Constraints (examples)

Concept Inclusions: 8X emp(X) person(X)

(Inverse) Relation Inclusion:

Relation Transitivity: 8X8Y8Z mgs(X,Y),mgs(Y,Z) mgs(X,Z)

8X8Y manages(X,Y) isManaged(Y,X)

Participation: 8X emp(X) 9Y report(X,Y)

Disjointness: 8X emp(X), customer(X) ?

Functionality: 8X8Y8Z reports(X,Y),reports(X,Z) Y = Z

Page 7: Orsi Vldb11

Datalog§

¡ Datalog variant allowing in the head:

- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)

- Equality atoms ! EGDs 8X (X) Xi=Xj

- Constant false (?) ! NCs 8X (X) ?

Datalog+

[Cali’ et Al, PODS 09]

Page 8: Orsi Vldb11

Datalog+

Datalog§ [Cali’ et Al, PODS 09]

¡ Datalog variant allowing in the head:

- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)

- Equality atoms ! EGDs 8X (X) Xi=Xj

- Constant false (?) ! NCs 8X (X) ?

¡ But, query answering under Datalog+ is undecidable

Page 9: Orsi Vldb11

Datalog+

Datalog§ [Cali’ et Al, PODS 09]

¡ Datalog variant allowing in the head:

- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)

- Equality atoms ! EGDs 8X (X) Xi=Xj

- Constant false (?) ! NCs 8X (X) ?

¡ Datalog+ is syntactically restricted ! Datalog§

¡ But, query answering under Datalog+ is undecidable

Page 10: Orsi Vldb11

Datalog+

Datalog§ [Cali’ et Al, PODS 09]

¡ Datalog variant allowing in the head:

- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)

- Equality atoms ! EGDs 8X (X) Xi=Xj

- Constant false (?) ! NCs 8X (X) ?

¡ Datalog+ is syntactically restricted ! Datalog§

¡ But, query answering under Datalog+ is undecidable

¡ TGDs more expressive than inclusion dependencies

8D8P8A runs(D,P),area(P,A) 9E employee(E,D,A)

Page 11: Orsi Vldb11

The Chase Procedure

Input: Database D, set of TGDs

Output: A model of D [

person(john)

8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)

D

chase(D,) = D [ ?

Page 12: Orsi Vldb11

The Chase Procedure

Input: Database D, set of TGDs

Output: A model of D [

person(john)

D

chase(D,) = D [ {father(z1,john)

8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)

Page 13: Orsi Vldb11

The Chase Procedure

Input: Database D, set of TGDs

Output: A model of D [

person(john)

D

chase(D,) = D [ {father(z1,john), person(z1)

8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)

Page 14: Orsi Vldb11

The Chase Procedure

Input: Database D, set of TGDs

Output: A model of D [

person(john)

D

chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1)

8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)

Page 15: Orsi Vldb11

The Chase Procedure

Input: Database D, set of TGDs

Output: A model of D [

person(john)

D

chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …}

8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)

Page 16: Orsi Vldb11

Query Answering via Chase

[see, e.g., Deutsch, Nash & Remmel, PODS 08]

D [ ² Q , chase(D,) ² Q

D

. . .

C = chase(D,)

M1

M2

h1 h2

h1(C) h2(C)

Q h

Page 17: Orsi Vldb11

Query Answering via Chase

D

. . .

C = chase(D,)

M1

M2

h1 h2

h1(C) h2(C)

Q h

Possibly Infinite!

Page 18: Orsi Vldb11

Q

Query answering via rewriting

Page 19: Orsi Vldb11

Q

Q

compilation

Query answering via rewriting

Page 20: Orsi Vldb11

Q

evaluation

Q

Q

compilation

D

Query answering via rewriting

Page 21: Orsi Vldb11

Chase vs rewriting

Page 22: Orsi Vldb11

Linear TGDs

8X8Y r(X,Y) 9Z (X,Z)

single body atom

¡ Properly generalize inclusion dependencies.

¡ Enjoy the bounded-derivation depth property.

¡ FO-rewritable query answering in AC0 (data complexity).

Page 23: Orsi Vldb11

Bounded derivation depth property (BDDP)

D

Q

Page 24: Orsi Vldb11

Bounded derivation depth property (BDDP)

D

Q k levels

constant

w.r.t. D

chase(D,) ² Q ) chasek(D,) ² Q

Page 25: Orsi Vldb11

Bounded derivation depth property (BDDP)

BDDP ) First-Order Rewritability

D

Q k levels

constant

w.r.t. D

Page 26: Orsi Vldb11

Q

q promotesTo(A,B), customer(B) (original query)

promoter(X) Y promotesTo(X,Y)

promotesTo(X,Y) customer(Y)

q promotesTo(A,B), customer(B) Q

FO-rewritability: Example [Gottlob et Al., ICDE 11]

Page 27: Orsi Vldb11

q promotesTo(A,B), customer(B)

q promotesTo(A,B), customer(V0,B)

{ Y = B }

( V0 is fresh )

promoter(X) Y promotesTo(X,Y)

promotesTo(X,Y) customer(Y)

Q

FO-rewritability: Example [Gottlob et Al., ICDE 11]

Q

q promotesTo(A,B), customer(B)

Page 28: Orsi Vldb11

q promotesTo(A,B), customer(B)

q promotesTo(A,B), promotesTo(V0,B) ans(A) promotesTo(A,B)

factorization

{ A = V0 }

promoter(X) Y promotesTo(X,Y)

promotesTo(X,Y) customer(Y)

Q

FO-rewritability: Example [Gottlob et Al., ICDE 11]

Q

q promotesTo(A,B), customer(B)

Page 29: Orsi Vldb11

q promoter(A)

promoter(X) Y promotesTo(X,Y)

promotesTo(X,Y) customer(Y)

Q

FO-rewritability: Example [Gottlob et Al., ICDE 11]

Q

q promotesTo(A,B), customer(B)

q promotesTo(A,B) {X = A, Y = B}

q promotesTo(A,B), customer(B)

Page 30: Orsi Vldb11

UCQ rewriting

(first-order)

promoter(X) Y promotesTo(X,Y)

promotesTo(X,Y) customer(Y)

Q

FO-rewritability: Example [Gottlob et Al., ICDE 11]

Q

q promoter(A)

q promotesTo(A,B), customer(B)

q promotesTo(A,B)

q promotesTo(A,B), customer(B)

Page 31: Orsi Vldb11

FO-rewritability

¡ Desirable properties of a FO-rewriting:

independent on the DB

executable by any DBMS

easy to compute (e.g., polynomial time)

small size (e.g., polynomial size)

Page 32: Orsi Vldb11

FO-rewritability

¡ Unions of Conjunctive Queries (UCQs)

executable by any DBMS

DB independent

easy to optimize and distribute

worst-case exponential size in Q and

Calvanese et Al, JAR 07

Perez Urbina et Al, JAL 09

Cali’ et Al, PODS 09

Gottlob et Al, ICDE 11

and others…

¡ Desirable properties of a FO-rewriting:

independent on the DB

executable by any DBMS

easy to compute (e.g., polynomial time)

small size (e.g., polynomial size)

Page 33: Orsi Vldb11

¡ Combined and hybrid FO-rewriting

good computational properties

(e.g., polynomial in size)

requires access to the DB

Kontchakov et Al., KR 10

Gottlob and Schwentick, DL 11

FO-rewritability

Page 34: Orsi Vldb11

¡ Purely intensional Datalog rewriting

very compressed representation

purely intensional

requires view-creation or Datalog engine

¡ Combined and hybrid FO-rewriting

good computational properties

(e.g., polynomial in size)

requires access to the DB

Kontchakov et Al., KR 10

Gottlob and Schwentick, DL 11

Perez Urbina et Al, JAL 09

Rosati and Almatelli., KR 10

Orsi and Pieris., VLDB 11

FO-rewritability

Page 35: Orsi Vldb11

Datalog rewriting: keep it first-order!

¡ A Datalog query is (in general) not a first-order query

a non-recursive Datalog query is a first-order query

a bounded Datalog query is a first-order query

an output-predicate-bounded Datalog query is a first-order query

Page 36: Orsi Vldb11

Datalog rewriting: keep it first-order!

¡ Input:

a (w.l.o.g. boolean) conjunctive query Q = <q,ρ>

Q : q(X) p(X), s(X,Y) <q, q(X) p(X),s(X,Y) >

a set of linear TGDs

¡ Output:

a bounded Datalog query Q = <q,π >

¡ A Datalog query is (in general) not a first-order query

a non-recursive Datalog query is a first-order query

a bounded Datalog query is a first-order query

an output-predicate-bounded Datalog query is a first-order query

Page 37: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

Page 38: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

t is unbounded t(X) cannot be computed using

a DB-independent FO-query

Page 39: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

t is unbounded t(X) cannot be computed using

a DB-independent FO-query

t(X) r(X,Y), t(Y)

Page 40: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

t is unbounded t(X) cannot be computed using

a DB-independent FO-query

t(X) r(X,Y), r(Y,Z), t(Z)

t(X) r(X,Y), t(Y) v

Page 41: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

t is unbounded t(X) cannot be computed using

a DB-independent FO-query

t(X) r(X,Y), r(Y,Z), t(Z) v

t(X) r(X,Y), t(Y) v

t(X) r(X,Y), r(Y,Z), r(Z,W), t(W)

Page 42: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

t is unbounded t(X) cannot be computed using

a DB-independent FO-query

t(X) r(X,Y), r(Y,Z), t(Z) v

t(X) r(X,Y), t(Y) v

t(X) r(X,Y), r(Y,Z), r(Z,W), t(W) v

Page 43: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

q is bounded q(X) can be computed using

a DB-independent FO-query

Page 44: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

q is bounded q(X) can be computed using

a DB-independent FO-query

q(X) r(X,Y)

Page 45: Orsi Vldb11

Predicate-bounded Datalog

r(X,Y), t(Y) → t(X)

r(X,Y ) → r(Y,X)

r(X,Y ), t(Z) → q(X)

¡ bounded Datalog program

all IDB predicates are bounded.

¡ p-bounded Datalog program

the predicate p is bounded.

q is bounded q(X) can be computed using

a DB-independent FO-query

q(X) r(Y,X)

q(X) r(X,Y) v

Page 46: Orsi Vldb11

Output-predicate-bounded Datalog and BDDP

OPB

DATALOG

PROGRAM

Page 47: Orsi Vldb11

Output-predicate-bounded Datalog and BDDP

INPUT

RULES

OPB

DATALOG

PROGRAM

BDDP

RULES

OUTPUT

RULES

Page 48: Orsi Vldb11

Output-predicate-bounded Datalog and BDDP

OPB

DATALOG

PROGRAM

enjoy

BDDP

property

non recursive non recursive

INPUT

RULES

BDDP

RULES

OUTPUT

RULES

Page 49: Orsi Vldb11

Datalog Rewriting: Skolemization (and renaming)

r(X,Y) Z s(Y,Z)

s(X,Y) Z p(Y,Y,Z)

p(X,Y,Z) t(Z)

Page 50: Orsi Vldb11

Datalog Rewriting: Skolemization (and renaming)

r(X,Y) Z s(Y,Z)

s(X,Y) Z p(Y,Y,Z)

p(X,Y,Z) t(Z)

r(X1,Y1) s(Y1,f1(Y1))

s(X2,Y2) p(Y2,Y2,f2(Y2))

p(X3,Y3,Z3) t(Z3)

f

Page 51: Orsi Vldb11

Datalog Rewriting: Skolemization (and renaming)

r(X,Y) Z s(Y,Z)

s(X,Y) Z p(Y,Y,Z)

p(X,Y,Z) t(Z)

¡ f and are equisatisfiable (not equivalent)

¡ Introduce one Skolem function for each existential variable

r(X1,Y1) s(Y1,f1(Y1))

s(X2,Y2) p(Y2,Y2,f2(Y2))

p(X3,Y3,Z3) t(Z3)

f

Page 52: Orsi Vldb11

Datalog Rewriting: Rule Saturation

¡ Apply resolution inference rule to rules in f

at least one of the rules contains Skolem terms

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

f

Page 53: Orsi Vldb11

Datalog Rewriting: Rule Saturation

¡ Apply resolution inference rule to rules in f

at least one of the rules contains Skolem terms

f [f]

r(X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Page 54: Orsi Vldb11

Datalog Rewriting: Properties of Rule Saturation

¡ [f] mimics the chase derivations.

Page 55: Orsi Vldb11

Datalog Rewriting: Properties of Rule Saturation

¡ [f] mimics the chase derivations.

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Page 56: Orsi Vldb11

Datalog Rewriting: Properties of Rule Saturation

¡ [f] mimics the chase derivations.

¡ [f] depends only on .

¡ [f] is possibly infinite

linear TGDs have BDDP: suffices to construct it up to k steps [f]k.

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

Page 57: Orsi Vldb11

Datalog Rewriting: Query Saturation

¡ resolve [f] with the query Q.

use only rules with Skolem terms.

Page 58: Orsi Vldb11

Datalog Rewriting: Query Saturation

¡ resolve [f] with the query Q.

use only rules with Skolem terms.

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

[δ12]] : r (X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))

Q s(A,B), p(B,B,C) [f] Q

Q r(X1,Y1), p(f1(Y1), f1(Y1),C)

[Q,f]

Page 59: Orsi Vldb11

Datalog Rewriting: Query Saturation

¡ bypasses chase derivations with function symbols

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

[δ12]] : r (X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))

Q s(A,B), p(B,B,C)

[f]

Q

Page 60: Orsi Vldb11

Datalog Rewriting: Finalization

¡ keep only the function-free rules from [f] [ [Q,f]

¡ derivations producing certain answers are captured by function-

symbol-free rules.

¡ (optional) add auxiliary rules to make the rewriting a proper Datalog

query i.e., clear distinction IDB/EDB.

Page 61: Orsi Vldb11

OPB-Datalog and BDDP revisited

OPB

DATALOG

PROGRAM

INPUT

RULES

BDDP

RULES

OUTPUT

RULES

ff([f]) ff([Q,f]) aux

enjoy

BDDP

property

non recursive non recursive

Page 62: Orsi Vldb11

¡ use the predicate graph to reduce the number of rules in f

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

f

Q s(A,B), p(B,B,C)

Q

Optimizations: Pruning

Page 63: Orsi Vldb11

¡ use the predicate graph to reduce the number of rules in f

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

f

Q s(A,B), p(B,B,C)

Q

Optimizations: Pruning

r s

p t

Q

Page 64: Orsi Vldb11

Optimizations: Pruning

¡ use the predicate graph to reduce the number of rules in f

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

f

Q s(A,B), p(B,B,C)

Q

r s

p t

Q

Page 65: Orsi Vldb11

Optimizations: Pruning

¡ use the predicate graph to reduce the number of rules in f

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

δ3 : p(X3,Y3,Z3) t(Z3)

f

Q s(A,B), p(B,B,C)

Q

¡ we are no longer

independent on Q!

r s

p t

Q

Page 66: Orsi Vldb11

Optimizations: Query Elimination

¡ eliminate implied atoms during query saturation

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

f Q

Q s(A,B), p(B,B,C)

Page 67: Orsi Vldb11

Optimizations: Query Elimination

¡ eliminate implied atoms during query saturation

f Q

s(A,B) ² p(B,B,C)

Q s(A,B), p(B,B,C)

atom coverage

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

Page 68: Orsi Vldb11

Optimizations: Query Elimination

¡ eliminate implied atoms during query saturation

f

Q s(A,B), p(B,B,C) ≡ Q s(A,B)

Q

s(A,B) ² p(B,B,C)

Q s(A,B), p(B,B,C)

atom coverage

δ1 : r (X1,Y1) s(Y1,f1(Y1))

δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))

Page 69: Orsi Vldb11

Optimizations: Query Elimination

¡ unique elimination strategy (w.r.t. the number of eliminated atoms)

see paper.

¡ given m = |body(ρ)| and n = ||

worst-case size of [f] [ [Q,f] is O((n∙m)m)

worst-case size of Q = <q,π > is O(n+m)m

¡ atom coverage under linear TGDs can be checked in polynomial time

see paper.

Page 70: Orsi Vldb11

Experimental Results

Page 71: Orsi Vldb11

Discussion

¡ Datalog rewriting is substantially more compact than UCQ rewriting.

¡ Does Datalog improve the end-to-end performance?

¡ Extend the procedure to larger classes of TGDs

guarded TGDs [Cali’ et Al, PODS 09] non FO-rewritable

sticky-join TGDs [Cali’ et Al, VLDB 10]

Page 72: Orsi Vldb11

The Datalog Family

Thomas

Lukasiewicz

Georg

Gottlob

Andreas

Pieris

Andrea

Calì

Giorgio

Orsi

Thank you!