Ontologies and Large Databases Querying Algorithms for the Web of Data Jean-François Baget ([email protected]) Artificial Intelligence meets the Web of Data, Montpellier, 2013
PROJECT TEAM
GraphIK INRIA Sophia-Antipolis
LIRMM
ONTOLOGY-BASED DATA
ACCESS FOR THE WEB OF DATA
Ontology-based Query Answering
[Figure: a Query is posed to a Knowledge Base (Ontology + Data/Facts) and yields Answers.]
Data / Facts
Relational Database / RDF (Semantic Web)
[Figure: the same data shown as relational tables (parentOf, Male, Fem.) and as an RDF graph with nodes ex:a, ex:b, ex:c, ex:parent edges, and rdf:type edges to F and M.]
Abstraction in first-order logic
∃x ( parentOf(a,b) ∧ parentOf(a,c) ∧ parentOf(c,x) ∧ F(a) ∧ M(b) )
Etc.
Or in graphs / hypergraphs
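[Figure: the same fact drawn as a hypergraph, with term nodes a, b, c and one unnamed node, binary p edges (argument positions 1 and 2), and unary labels F and M.]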
A conjunction of facts ≈ a fact
A (boolean) CQ and a fact have the same form
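Because a Boolean CQ and a fact have the same form, answering the query on a fact alone amounts to looking for a homomorphism from the query to the fact. The following is a minimal Python sketch of that idea. The encoding is an assumption made for illustration and does not come from the slides: atoms are (predicate, terms) tuples, variables are strings starting with "?", and the names is_var, homomorphisms and entails are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, terms)


def is_var(term: str) -> bool:
    # Assumed convention: variables start with "?", anything else is a constant.
    return term.startswith("?")


def homomorphisms(query: List[Atom], fact: List[Atom],
                  h: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Enumerate the mappings of query variables to fact terms that send
    every query atom onto some atom of the fact (simple backtracking)."""
    h = {} if h is None else h
    if not query:
        yield dict(h)
        return
    (pred, args), rest = query[0], query[1:]
    for fpred, fargs in fact:
        if fpred != pred or len(fargs) != len(args):
            continue
        ext = dict(h)
        # A query variable must keep one image; a query constant maps to itself.
        if all((ext.setdefault(q, f) == f) if is_var(q) else (q == f)
               for q, f in zip(args, fargs)):
            yield from homomorphisms(rest, fact, ext)


def entails(fact: List[Atom], query: List[Atom]) -> bool:
    """A fact entails a Boolean CQ iff the query maps to the fact."""
    return next(homomorphisms(query, fact), None) is not None


# The fact of the slide: parentOf(a,b), parentOf(a,c), parentOf(c,x), F(a), M(b)
FACT = [("parentOf", ("a", "b")), ("parentOf", ("a", "c")),
        ("parentOf", ("c", "?x")), ("F", ("a",)), ("M", ("b",))]

# "Does a have a grandchild?" and "Is there a male parent?"
Q_GRANDCHILD = [("parentOf", ("a", "?u")), ("parentOf", ("?u", "?v"))]
Q_MALE_PARENT = [("parentOf", ("?u", "?v")), ("M", ("?u",))]

has_grandchild = entails(FACT, Q_GRANDCHILD)
has_male_parent = entails(FACT, Q_MALE_PARENT)
```

Note that a query variable may be mapped to the existential variable x of the fact, which is why the first query holds although the grandchild is unnamed.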
Ontology: Existential Rules
∀X ∀Y ( B[X, Y] → ∃Z H[X, Z] )
Body: B[X, Y]. Head: H[X, Z].
X, Y, Z: tuples of variables
B, H: any conjunction of atoms (on variables and constants)
Example: ∀x ∀y (siblingOf(x,y) → ∃z (parentOf(z,x) ∧ parentOf(z,y)))
Simplified form: siblingOf(x,y) → parentOf(z,x) ∧ parentOf(z,y)
Same as Tuple Generating Dependencies
[+ Equality Generating Dependencies, Negative Constraints]
See also Datalog+/-
Same as the logical translation of Conceptual Graph rules
Generalize light Description Logics used for OBDA (DL-Lite, EL, …)
A rule body → head is applicable to a fact F if there is a homomorphism h: body → F
Then h(head) can be « added » to F, generating fresh variables for the existential variables of the head
R = ∀x ∀y (siblingOf(x,y) → ∃z (parentOf(z,x) ∧ parentOf(z,y)))
F = siblingOf(a,b)
h = {(x,a), (y,b)}
F' = ∃z0 (siblingOf(a,b) ∧ parentOf(z0,a) ∧ parentOf(z0,b))
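The rule application above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: atoms are (predicate, terms) tuples, variables start with "?", homomorphisms are found by brute force over all assignments (fine for tiny examples only), and the names substitute and apply_rule are hypothetical.

```python
from itertools import count, product


def is_var(term):
    # Assumed convention: variables start with "?".
    return term.startswith("?")


def substitute(atom, mapping):
    pred, args = atom
    return (pred, tuple(mapping.get(t, t) for t in args))


def apply_rule(body, head, fact, fresh_ids):
    """Apply body -> head once for every homomorphism h from body to fact.
    Existential variables of the head are renamed to fresh variables."""
    terms = sorted({t for _, args in fact for t in args})
    body_vars = sorted({t for _, args in body for t in args if is_var(t)})
    head_vars = sorted({t for _, args in head for t in args if is_var(t)})
    existential = [v for v in head_vars if v not in body_vars]
    result = list(fact)
    # Brute force: try every assignment of body variables to terms of the fact.
    for image in product(terms, repeat=len(body_vars)):
        h = dict(zip(body_vars, image))
        if all(substitute(a, h) in fact for a in body):
            # One fresh variable per existential variable and per application.
            h.update({z: f"?z{next(fresh_ids)}" for z in existential})
            for atom in (substitute(b, h) for b in head):
                if atom not in result:
                    result.append(atom)
    return result


BODY = [("siblingOf", ("?x", "?y"))]
HEAD = [("parentOf", ("?z", "?x")), ("parentOf", ("?z", "?y"))]
F = [("siblingOf", ("a", "b"))]

F_PRIME = apply_rule(BODY, HEAD, F, count())
```

With h = {(x,a), (y,b)}, F_PRIME contains siblingOf(a,b), parentOf(?z0,a) and parentOf(?z0,b), where ?z0 plays the role of the fresh variable z0.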
[Figure: the rule application drawn on graphs; the head is added with renaming of its existential variables (z becomes the fresh variable z0).]
Basic Decision Problem
Given a KB K = (F, R) and a (Boolean) conjunctive query Q,
is Q entailed by K ?
Conjunctive Query Entailment
[Figure: a Knowledge Base K = (F, R), made of Fact(s) F and Existential Rules R (without equality), and a Conjunctive Query Q.]
Forward vs Backward Chaining
Forward Chaining (FC): fact saturation (« bottom-up »)
F, R |= Q iff there is a homomorphism from Q to F’, where F’ is obtained by a derivation sequence from F with R
Backward Chaining (BC): query rewriting (« top-down »)
F, R |= Q iff there is a homomorphism from Q’ to F, where Q’ is obtained by a rewriting sequence from Q with R
[Decomposition into 2 steps: DL-Lite]
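To make the two directions concrete, here is a small Python sketch restricted to a deliberately tiny fragment: rules of the form A(x) → B(x) and Boolean queries ∃x (P1(x) ∧ … ∧ Pn(x)). This restriction, the encoding (a fact maps each constant to its set of predicates, a query is a set of predicate names) and all function names are assumptions chosen for illustration; general existential rules need the homomorphism and unification machinery of the slides.

```python
def saturate(facts, rules):
    """Forward chaining: add B(c) whenever A(c) holds and A(x) -> B(x) is a rule."""
    sat = {c: set(preds) for c, preds in facts.items()}
    changed = True
    while changed:
        changed = False
        for preds in sat.values():
            for a, b in rules:
                if a in preds and b not in preds:
                    preds.add(b)
                    changed = True
    return sat


def rewritings(query, rules):
    """Backward chaining: replace a query atom B(x) by A(x) for each rule A(x) -> B(x)."""
    seen = {frozenset(query)}
    todo = [frozenset(query)]
    while todo:
        q = todo.pop()
        for a, b in rules:
            if b in q:
                q2 = (q - {b}) | {a}
                if q2 not in seen:
                    seen.add(q2)
                    todo.append(q2)
    return seen


def entails_fc(facts, rules, query):
    # Q maps to the saturated fact F'.
    return any(set(query) <= preds for preds in saturate(facts, rules).values())


def entails_bc(facts, rules, query):
    # Some rewriting Q' maps to the original fact F.
    return any(q <= set(preds)
               for q in rewritings(query, rules)
               for preds in facts.values())


RULES = [("Cat", "Mammal"), ("Mammal", "Animal")]
FACTS = {"tom": {"Cat"}, "rex": {"Dog"}}

fc_animal = entails_fc(FACTS, RULES, {"Animal"})
bc_animal = entails_bc(FACTS, RULES, {"Animal"})
fc_animal_dog = entails_fc(FACTS, RULES, {"Animal", "Dog"})
bc_animal_dog = entails_bc(FACTS, RULES, {"Animal", "Dog"})
```

Forward chaining touches the data and leaves the query alone; backward chaining does the opposite, which is what makes it attractive when F is a large database.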
Decidability Issues
Entailment is not decidable
Many decidable classes exhibited in databases and KR
Three generic kinds of properties ensuring decidability:
- Saturation by Forward Chaining halts (« finite expansion set », fes)
- Query rewriting by Backward Chaining halts (« finite unification
set », fus)
- Saturation by Forward Chaining may not halt but the generated
facts have a tree-like structure (« bounded treewidth set », bts)
These properties are not recognizable [Baget+ KR 10]
but they provide generic algorithms
(Partial) Inclusion Map of Decidable Classes
[Figure: inclusion map grouping the classes under FES, FUS and BTS: Datalog, weakly-acyclic, wa-GRD, jointly acyclic, acyclic GRD, w-sticky, w-sticky-join, inclusion dependency, atomic body, domain-r, sticky join, sticky, guarded, frontier-1, frontier-guarded, weakly-guarded, weakly fg, jointly fg, glut fg.]
FINITE EXPANSION SETS
Breadth-First FC / Chase
F0 = F
Fi (i ≥ 1) is obtained from Fi-1 by performing all possible rule applications on Fi-1
F0 contains the atoms with rank 0, F1 adds the atoms with rank 1, …, Fk adds the atoms with rank k
Any atom with rank i can be obtained by a derivation sequence of length ≤ max(|rule body|)^i (bound independent from |F|)
F,R |= Q iff there is k s.t. Q maps to Fk
Note: k is not independent from F
fes ≠ « bounded » set of rules (cf. this notion in Datalog)
R is bounded if there is k s.t. for all F, Fk+1 ≡ Fk
(ex: acyclic GRD)
[Baget+ JAIR 2002]
= core chase
Finite Saturation (fes)
Def: R is a finite expansion set (fes) if, for every fact F, there is F’ (finitely) derived from F s.t. any rule application on F’ leads to a fact equivalent to F’
R is fes iff, for every fact F, there is k s.t. Fk+1 ≡ Fk
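The breadth-first saturation and its stopping rank k can be sketched as follows for the Datalog case (no existential variables), where Fk+1 ≡ Fk is simply set equality. This is an illustrative sketch: the encoding of atoms, the transitive-closure rules and the names matches, breadth_first_chase and chain are assumptions, not material from the slides.

```python
def matches(body, fact, h):
    """Enumerate the homomorphisms from a rule body to a set of atoms."""
    if not body:
        yield h
        return
    (pred, args), rest = body[0], body[1:]
    for fpred, fargs in fact:
        if fpred == pred and len(fargs) == len(args):
            ext = dict(h)
            if all((ext.setdefault(a, f) == f) if a.startswith("?") else (a == f)
                   for a, f in zip(args, fargs)):
                yield from matches(rest, fact, ext)


def breadth_first_chase(fact, rules, max_rank=1000):
    """Return (k, Fk) for the first k such that Fk+1 = Fk (Datalog rules only)."""
    current = set(fact)
    for k in range(max_rank):
        # All possible rule applications on Fk, performed "in parallel".
        new = {(hpred, tuple(h.get(t, t) for t in hargs))
               for body, head in rules
               for h in matches(body, current, {})
               for hpred, hargs in head}
        if new <= current:
            return k, current
        current |= new
    raise RuntimeError("no fixpoint reached within max_rank")


def chain(n):
    """A fact made of n edges: edge(n0,n1), ..., edge(n(n-1),nn)."""
    return {("edge", (f"n{i}", f"n{i + 1}")) for i in range(n)}


RULES = [
    ([("edge", ("?x", "?y"))], [("path", ("?x", "?y"))]),
    ([("path", ("?x", "?y")), ("edge", ("?y", "?z"))], [("path", ("?x", "?z"))]),
]

rank_2, _ = breadth_first_chase(chain(2), RULES)
rank_5, saturated_5 = breadth_first_chase(chain(5), RULES)
```

On a chain of n edges the saturation stops at rank n, so this rule set is fes although no single k works for every F: it illustrates why fes differs from boundedness.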
No existential
variables
Main Recognizable Classes with Finite
Saturation (fes)
Datalog
Acyclic position
dependency graph Weak-
acyclicity
aGRD Acyclic
GRD
Joint-
acyclicity
Acyclic existential
dependency graph
GRD with any
kind of fes s.c.c.
fes-
GRD
[Baget KR 04]
[Baget KR 04]
[Deutsch+ ICDT 03]
[Fagin+ ICDT 03]
[Krötzsch+ IJCAI 11]
Position dependency graph: nodes are positions in predicates; edges show how existential variables are propagated
Graph of Rule Dependencies (GRD, also chase graph): nodes are rules; edges express that a rule may trigger another rule
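As an illustration of a position dependency graph, here is a Python sketch of a weak-acyclicity test. It assumes the usual formulation: for each frontier variable x occurring at a body position, a regular edge goes to every head position of x and a special edge goes to every head position holding an existential variable; the rule set is weakly acyclic when no cycle goes through a special edge. The encoding and the names position_graph and weakly_acyclic are hypothetical.

```python
from collections import defaultdict


def is_var(term):
    # Assumed convention: variables start with "?".
    return term.startswith("?")


def position_graph(rules):
    """Build the regular and special edges between positions (predicate, index)."""
    regular, special = set(), set()
    for body, head in rules:
        body_vars = {t for _, args in body for t in args if is_var(t)}
        existential_pos = {(p, i) for p, args in head
                           for i, t in enumerate(args)
                           if is_var(t) and t not in body_vars}
        for x in body_vars:
            head_pos = {(p, i) for p, args in head
                        for i, t in enumerate(args) if t == x}
            if not head_pos:
                continue  # x is not a frontier variable
            for p, args in body:
                for i, t in enumerate(args):
                    if t == x:
                        regular |= {((p, i), q) for q in head_pos}
                        special |= {((p, i), q) for q in existential_pos}
    return regular, special


def weakly_acyclic(rules):
    regular, special = position_graph(rules)
    succ = defaultdict(set)
    for u, v in regular | special:
        succ[u].add(v)

    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        return False

    # A special edge u -> v lies on a cycle iff v reaches u.
    return not any(reaches(v, u) for u, v in special)


SIBLING = ([("siblingOf", ("?x", "?y"))],
           [("parentOf", ("?z", "?x")), ("parentOf", ("?z", "?y"))])
ANCESTOR = ([("person", ("?x",))],
            [("parentOf", ("?z", "?x")), ("person", ("?z",))])

sibling_ok = weakly_acyclic([SIBLING])
ancestor_ok = weakly_acyclic([ANCESTOR])
```

The second rule (every person has a parent who is a person) creates a special edge from position person[0] to itself, so an existential variable can feed its own creation and forward chaining may not halt.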
Super-
weak
acyclicity
[Marnette PODS 09]
Stratified
chase
graph
Acyclic chase graph
[Deutsch+ PODS 08]
[Deutsch+ PODS 08]
GRD with wa
strongly
connected
components
Semantic condition [Cuenca Grau+ KR 12]
MFA
FINITE UNIFICATION SETS
Remark: If Q1 is more general than Q2 (i.e. Q1 maps to Q2) then Q2 is useless
Def: R is a finite unification set (fus) if, for any query Q,
the set of most general rewritings of Q with R is finite
[ dropping « most general » would weaken the notion]
Finite Query Rewriting (fus)
[Baget+ DL 08, IJCAI 09]
Prop: [König+ RR 2011] When R is fus, there is a unique sound and
complete set of most general rewritings of Q
[unique if each query in the set is made non redundant]
E.g. necessary properties of
concepts / relations
Main Recognizable Classes with Finite Query
Rewriting (fus)
Atomic-
body Sticky
Domain-
restricted
Sticky-
join Each head atom contains
all or none of
the body variables
E.g. concept product: elephant(x) ∧ mouse(y) → bigger-than(x,y)
[Cali+
PVLDB 2010]
[Baget+ IJCAI 09] [Baget+ IJCAI 09]
[Cali+ RR 10]
= linear Datalog+/- [Cali+ PODS 09]
Body restricted
to a single atom
Restricts multiple
occurrences
of body variables
that do not occur
in all head atoms
Each head atom contains all the
body variables
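Two of these classes are defined by purely syntactic conditions that are easy to check rule by rule. The sketch below tests the atomic-body condition (body restricted to a single atom) and the domain-restricted condition (each head atom contains all or none of the body variables). The rule encoding and the function names are assumptions made for illustration; sticky and sticky-join need a marking procedure that is not sketched here.

```python
def variables(atoms):
    # Assumed convention: variables are the terms starting with "?".
    return {t for _, args in atoms for t in args if t.startswith("?")}


def is_atomic_body(rule):
    """Body restricted to a single atom."""
    body, _ = rule
    return len(body) == 1


def is_domain_restricted(rule):
    """Each head atom contains all or none of the body variables."""
    body, head = rule
    body_vars = variables(body)
    for atom in head:
        shared = variables([atom]) & body_vars
        if shared and shared != body_vars:
            return False
    return True


# Concept product from the slide: elephant(x) ∧ mouse(y) → bigger-than(x,y)
CONCEPT_PRODUCT = ([("elephant", ("?x",)), ("mouse", ("?y",))],
                   [("bigger-than", ("?x", "?y"))])
# siblingOf(x,y) → parentOf(z,x) ∧ parentOf(z,y)
SIBLING = ([("siblingOf", ("?x", "?y"))],
           [("parentOf", ("?z", "?x")), ("parentOf", ("?z", "?y"))])

product_classes = (is_atomic_body(CONCEPT_PRODUCT), is_domain_restricted(CONCEPT_PRODUCT))
sibling_classes = (is_atomic_body(SIBLING), is_domain_restricted(SIBLING))
```

The concept product is domain-restricted but not atomic-body, while the sibling rule is atomic-body but not domain-restricted, since each of its head atoms carries only one of the two body variables.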
(GREEDY) BOUNDED
TREEWIDTH SETS
Width of a tree decomposition = max number of nodes in a bag (minus 1)
Treewidth of a graph = min width over all decomposition trees of this graph
Example fact: p(a,b), q(b,z0), r(a,b,t0), p(b,t0), q(t0,z1), r(b,t0,t1), p(t0,t1)
Decomposition tree:
1) each node (term) appears in a bag
2) each hyperedge (atom) has all its nodes in a bag
3) for each node x, the subgraph induced by the bags containing x is connected
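The three conditions can be checked mechanically. The sketch below does so for the example atoms of this slide. The four bags are a reading of the slide's figure and the tree edges between them are an assumption; the function does not verify that the edges actually form a tree, and all names are hypothetical.

```python
def is_tree_decomposition(atoms, bags, tree_edges):
    """Check conditions 1) to 3); tree_edges is assumed to describe a tree."""
    terms = {t for _, args in atoms for t in args}
    # 1) each node (term) appears in a bag
    if not all(any(t in bag for bag in bags.values()) for t in terms):
        return False
    # 2) each hyperedge (atom) has all its nodes in a bag
    if not all(any(set(args) <= bag for bag in bags.values()) for _, args in atoms):
        return False
    # 3) for each node x, the bags containing x induce a connected subgraph
    for t in terms:
        holding = {name for name, bag in bags.items() if t in bag}
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for u, v in tree_edges:
                for a, b in ((u, v), (v, u)):
                    if a == node and b in holding and b not in seen:
                        seen.add(b)
                        stack.append(b)
        if seen != holding:
            return False
    return True


def width(bags):
    """Width of a decomposition = max number of nodes in a bag, minus 1."""
    return max(len(bag) for bag in bags.values()) - 1


ATOMS = [("p", ("a", "b")), ("q", ("b", "z0")), ("r", ("a", "b", "t0")),
         ("p", ("b", "t0")), ("q", ("t0", "z1")), ("r", ("b", "t0", "t1")),
         ("p", ("t0", "t1"))]
BAGS = {"B1": {"a", "b", "t0"}, "B2": {"b", "z0"},
        "B3": {"b", "t0", "t1"}, "B4": {"t0", "z1"}}
GOOD_EDGES = [("B1", "B2"), ("B1", "B3"), ("B3", "B4")]   # assumed tree shape
BAD_EDGES = [("B1", "B2"), ("B2", "B3"), ("B3", "B4")]    # breaks condition 3 for t0

valid = is_tree_decomposition(ATOMS, BAGS, GOOD_EDGES)
invalid = is_tree_decomposition(ATOMS, BAGS, BAD_EDGES)
example_width = width(BAGS)
```

With the second set of edges, the bags containing t0 (B1, B3, B4) are no longer connected, because the path from B1 to B3 goes through B2, which does not contain t0.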
[Figure: Decomposition Tree / Treewidth. The example fact drawn as a hypergraph (nodes are terms, hyperedges are atoms) next to a decomposition tree with bags {b, z0}, {a, b, t0}, {t0, z1}, {b, t0, t1}.]
Bounded treewidth and decidability
Definition: A set R of existential rules is a bounded treewidth set (bts) if,
for any fact F, there exists a bound b such that, for any F’ derived from
F, treewidth(F’) < b
Theorem (basically [Cali+, KR 2008]): If R is bts, then entailment is decidable.
Proof: Direct consequence of [Courcelle 90] + [Thomas 88] for compactness.
Some Recognizable bts (and not fes) Classes of Rules
- guarded [Cali+ KR’08]: an atom in the body guards all the body variables
- weakly guarded [Cali+ KR’08]: guard only affected variables (i.e. possibly mapped to new existentials)
- frontier guarded [Baget+ KR’10]: guard only the frontier
- weakly frontier guarded [Baget+ KR’10]: guard only affected variables from the frontier
- frontier 1 [Baget+ IJCAI’09]: the frontier has size 1
Frontier: variables shared by the body and the head
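The guardedness conditions that do not involve affected variables are simple syntactic tests. The sketch below checks guarded, frontier-guarded and frontier-1 for a single rule. The rule encoding, the example rules and the function names are assumptions for illustration; the weakly (frontier) guarded variants are left out because they first require computing the affected positions of the whole rule set.

```python
def variables(atoms):
    # Assumed convention: variables are the terms starting with "?".
    return {t for _, args in atoms for t in args if t.startswith("?")}


def frontier(rule):
    """Variables shared by the body and the head."""
    body, head = rule
    return variables(body) & variables(head)


def is_guarded(rule):
    """An atom in the body contains all the body variables."""
    body, _ = rule
    return any(variables(body) <= set(args) for _, args in body)


def is_frontier_guarded(rule):
    """An atom in the body contains all the frontier variables."""
    body, _ = rule
    return any(frontier(rule) <= set(args) for _, args in body)


def is_frontier_one(rule):
    return len(frontier(rule)) == 1


# Illustrative rules (hypothetical examples):
TRANSITIVITY = ([("r", ("?x", "?y")), ("r", ("?y", "?z"))], [("r", ("?x", "?z"))])
SUCCESSOR = ([("r", ("?x", "?y")), ("r", ("?y", "?z"))], [("r", ("?z", "?u"))])
SIBLING = ([("siblingOf", ("?x", "?y"))],
           [("parentOf", ("?z", "?x")), ("parentOf", ("?z", "?y"))])


def profile(rule):
    return (is_guarded(rule), is_frontier_guarded(rule), is_frontier_one(rule))


transitivity_profile = profile(TRANSITIVITY)
successor_profile = profile(SUCCESSOR)
sibling_profile = profile(SIBLING)
```

Transitivity fails all three tests, the second rule is frontier-guarded and frontier-1 without being guarded, and the sibling rule is guarded (hence frontier-guarded) with a frontier of size 2.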
r(x,y) r(y,z) s(x,y,z) r(y,u) r(z,u) r(x,y) r(y,z) r(x,z) r(z,u)
r(x,y) r(y,z)
r(y,u) r(z,u)
datalog
From BTS to GBTS
Recognizability: Like fes and fus, bts is an unrecognizable class.
Algorithms: Can we, as done for fes and fus, present a generic algorithm for all bts subclasses?
• Problem 1: we have to be able to compute the bound b = fC(F, R) for each of the bts subclasses.
• Problem 2: even in that case, the enumeration of every interpretation of treewidth less than b (and of sufficient depth) is not reasonable.
Solution: the GBTS (Greedy BTS) class
The GBTS class: main ideas
Goal: Ensure that we can greedily build a decomposition tree of
bounded width along the chase.
F = B0, with T0 = Terms(F) ∪ {possible constants}
First rule application j(Bi): B1 = j(Bi), with T1 = Terms(B1) ∪ T0
Second rule application j’(Bi’): B2 = j’(Bi’), with T2 = Terms(B2) ∪ T0
Next rule application k(Bq): is there a bag Bp such that k maps the terms of frontier(Rq) to Tp?
YES: B3 = k(Bq), with T3 = Terms(B3) ∪ T0
NO: the procedure halts and FAILS
The GBTS class: main ideas
Definition: A set of existential rules R is gbts if, for any fact F,
for any R-derivation sequence from F, this greedy algorithm
does not return FAILS (in that case, it effectively builds a
decomposition tree of width |F| + max(|Ri|)).
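The greedy construction can be sketched on an explicit derivation sequence. In the sketch below, each derivation step is given directly as the image of the rule frontier together with the atoms it produces; this input format, the choice of the first suitable bag as parent, and the names terms and greedy_decomposition are assumptions made for illustration, not the authors' implementation.

```python
def terms(atoms):
    return {t for _, args in atoms for t in args}


def greedy_decomposition(fact, steps, constants=frozenset()):
    """Greedily build the bag tree along a derivation.

    steps: list of (frontier_image, new_atoms) pairs, one per rule application.
    Returns a list of bags (atoms, terms, parent_index), or None on FAIL.
    """
    t0 = terms(fact) | set(constants)          # T0 = Terms(F) plus constants
    bags = [(list(fact), t0, None)]            # B0 = F is the root
    for frontier_image, new_atoms in steps:
        # Is there a bag Bp whose terms Tp contain the image of the frontier?
        parent = next((i for i, (_, bag_terms, _) in enumerate(bags)
                       if set(frontier_image) <= bag_terms), None)
        if parent is None:
            return None                        # the procedure halts and FAILS
        # New bag: the produced atoms, with terms Terms(B) plus T0.
        bags.append((list(new_atoms), terms(new_atoms) | t0, parent))
    return bags


F = [("p", ("a", "b"))]

# Each new atom hangs below a bag that already holds its frontier image.
OK_STEPS = [({"b"}, [("q", ("b", "z0"))]),
            ({"z0"}, [("r", ("z0", "z1"))])]

# z0 and z2 are created in two different bags: no single bag holds both.
FAILING_STEPS = [({"b"}, [("q", ("b", "z0"))]),
                 ({"b"}, [("q", ("b", "z2"))]),
                 ({"z0", "z2"}, [("s", ("z0", "z2"))])]

ok_bags = greedy_decomposition(F, OK_STEPS)
ok_parents = [parent for _, _, parent in ok_bags]
failed = greedy_decomposition(F, FAILING_STEPS)
```

Since every bag keeps the terms of T0, a rule application whose frontier is mapped into F alone can always be attached to the root.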
Question 1: Is gbts an « interesting subset » of bts?
Question 2: Is there a generic algorithm for gbts?
Question 3: What is its complexity?
Question 4: Is gbts recognizable?
Which known bts classes are gbts?
[Figure: the inclusion map of decidable classes (FES, FUS, BTS) with an additional GBTS region; classes shown: Datalog, weakly-acyclic, wa-GRD, jointly acyclic, acyclic GRD, w-sticky, w-sticky-join, inclusion dependency, atomic body, domain-r, sticky join, sticky, guarded, frontier-1, frontier-guarded, weakly-guarded, weakly fg, jointly fg, glut fg.]
Overview of the gbts algorithm
Step 1: a finite encoding of the chase
Let us suppose that R is gbts
[Figure: the tree of bags built along the chase, rooted in F = B0, with bags B1 to B14, each bag Bi shown with its term set Ti.]
Take Bi and Bj created by the same rule R (resp. by i and j), s.t. there is a bijection ij between Ti and Tj with ij (t) = t’ iff t and t’ have been generated from the same term of R.
If, moreover, ij determines an isomorphism between the atoms of the subtrees respectively rooted in Bi and Bj, then Bi and Bj are equivalent (and so are their children).
[Figure: bijections 3,4, 4,10 and 3,10 relating the bags B3, B4 and B10.]
Consider an equivalence class of bags.
Choose a (high) representative.
Delete the subtrees of all others! The result is the blocked tree.
Computing the blocked tree
To each bag we associate a pattern
A pattern of a bag (at step p) encodes all ways of mapping
a subset of any rule body to the fact obtained at step p,
while using some terms from this bag.
Equivalence of patterns at some step p implies that copies
(as defined before) under equivalent bags belong to the
chase.
Since there are finitely many non-equivalent patterns, that algorithm halts, and we obtain the blocked tree.
There exists an Atom-Tree
decomposition Qt of Q
Overview of the gbts algorithm
Step 2: querying a blocked tree
Let us suppose that we have built a finite blocked tree
[Figure: the blocked tree rooted in F = B0, with bags B1 to B14 and the bag equivalences labelled 3,4 and 4,10.]
This blocked tree encodes the chase.
Suppose Q maps to the chase.
Choose a set of bags (in the chase) that supports all atoms of Q.
[Figure: the bags B0, B3, B5, B10, B12 and B14 of the chase, each supporting a part of Q (Q0, Q3, Q5, …).]
There exists an ATD mapping of Qt
This mapping is Valid
A complicated algorithm to check if Q maps to R-chase(F)
For every ATD Qt of Q
• For every ATD-mapping of Qt in chase(F)
    If this mapping is Valid in chase(F)
        Return YES
Return NO
Problem: we don’t have the chase, only the blocked tree…
For every ATD Qt of Q
• For every ATD-mapping of Qt in the blocked tree
    If this mapping is Valid in the blocked tree
        Return YES
Return NO
Overview of the gbts algorithm
Step 2: querying a blocked tree
We have to find an ATD mapping in the blocked tree…
[Figure: the blocked tree next to the bags supporting Q; the bags 14, 12 and 5 are related to the bags 8, 6 and 6 of the blocked tree (labels « 14, 8 o 14 », « 12, 6 o 12 », « 5, 6 o 5 »).]
Overview of the gbts algorithm
Step 2: querying a blocked tree
… and prove it corresponds to a Valid one in one completion.
[Figure: the same blocked tree and supporting bags, with the query parts Q0, Q3, Q5, … placed on the completion (labels « 14, 8 o 14 », « 12, 6 o 12 », « 5, 6 o 5 »).]
Overview of the gbts algorithm
Step 2: querying a blocked tree
Validation is backtrack-free
Generating the required bag is done at most in b ff steps
Overall complexity of the querying step of the algorithm
• 2q bq q b ff, where b is the size of the blocked tree
• b is doubly exponential in the size of F and R
Complexity results
On the recognizability of gbts
F Body Head B0
F’
R
R’
Rnew: in(x, B0), in(y, z) -> in(x, z)
Suppose that there is a derivation from (F, R) that fails.
• There is an R-derivation from F to G that does not fail, and a failing mapping from some body B to G
• There is an R’-derivation from F’ to G’, and a failing mapping from some body B to G’
• There is a failing validation of some body B to the blocked tree of (F’, R’)
If a derivation from (F, R) fails, then a derivation from (U, R) fails
ON FORWARD AND
BACKWARD CHAININGS
FORWARD & BACKWARD CHAININGS
Jean-François Baget – Derivation and Rewriting Lengths - 37
[Figure: a derivation sequence F1, F2, F3, …, Fn-1, Fn with the query Qn mapping to Fn, and a rewriting sequence Qn, Qn-1, …, Q3, Q2, Q1 with Q1 mapping to F1.]
Theorem [Salvat 96]: the following assertions are
equivalent.
• Q is a semantic consequence of F and R
• Q maps to some F’ finitely R-derived from F
• Some finite R-rewriting Q’ of Q maps to F
FINITE EXPANSION & UNIFICATION SETS
INITIAL MISUNDERSTANDING
Definition [Cali et al. 2010]: A set of existential rules R has the bounded derivation-depth property (BDDP) iff, for all facts F and Q, whenever Q can be deduced from F and R, Q is entailed by the R-saturation of F at rank k, where k = f(R, Q)
A formulation, expressed in Forward Chaining, that is very similar to FES / BTS
BUT known BDDP classes are FUS, not FES nor BTS!
OVERLOOKED LEMMAS [Salvat, 96]
[Figure: diagrams relating the derivation steps F1, F2, F3, …, Fn-1, Fn to the rewriting steps Q1, Q2, Q3, …, Qn-1, Qn.]
BOUNDED DERIVATION DEPTH & FUS (1)
Definition [Cali et al. 2010]: A set of existential rules R has the (Q) bounded derivation-depth property (QBDDP) iff, for all facts F and Q, whenever Q can be deduced from F and R, Q is entailed by the R-saturation of F at rank k, where k = f(R, Q)
Property: If R has the QBDDP, then R is f.u.s. Moreover, for any Q, the depth
of the R-rewriting tree of Q is N (where N is the max size of Q and rule bodies).
Q’ Q
Q’
Q’
Q Q’’ For any Q’ rewritable from Q,
there is a Q’’ in the rewriting tree
of depth N
that is more general than Q’.
BOUNDED DERIVATION DEPTH & FUS (2)
Definition [Cali et al. 2010]: A set of existential rules R has the (Q) bounded derivation-depth property (QBDDP) iff, for all facts F and Q, whenever Q can be deduced from F and R, Q is entailed by the R-saturation of F at rank k, where k = f(R, Q)
Property: If R is f.u.s., then R has the QBDDP.
Q
= f(R, Q)
Q’ Q
F
Suppose Q can be deduced from F and R
Q’
BOUNDED DERIVATION DEPTH & FES
Definition [Oxford 2012]: A set of existential rules R has the (F) bounded derivation-depth property (FBDDP) iff, for all facts F and Q, whenever Q can be deduced from F and R, Q is entailed by the R-saturation of F at rank k, where k = f(R, F)
Property: R is f.e.s. iff R has the FBDDP.
() ()
F F1 F (UM) …
= f(R, F)
Q
F F
…
= f(R, F)
Q F+1
CONCLUSION (1)
Alternate Definition: A set of existential rules R has the bounded deduction length property (BDLP) iff, for all facts F and Q, whenever Q can be deduced from F and R, there exists k s.t. one of the following equivalent assertions holds:
• Q is entailed by an R-derivation of F of length k.
• F entails an R-rewriting of Q of length k.
Moreover:
• If k = f(F, R), we say that R has the Fact BDLP (FBDLP)
• If k = f(Q, R), we say that R has the Query BDLP (QBDLP)
CONCLUSION (2)