
Default Negation for Datalog±

Andreas Pieris

Institute of Information Systems, Vienna University of Technology, Austria

GTTV, Lexington, KY, USA, September 27, 2015

Goal of the Datalog± Project

Transform Datalog from a first-class database query language

to a first-class language for knowledge representation

(and other applications)

But first, let us say a few words about good old plain Datalog

Datalog

• Recursive database query language defined in the 1980s

• A useful framework for inductive definitions

• Simple syntax and clear semantics

• Well-understood (query answering and containment, optimisations)

• Large projects and companies are “Datalog-based”

Datalog

(Figure: flight network connecting London, Vienna, Larnaca, Glasgow and Edinburgh)

Is Glasgow reachable from Vienna?

Flight(X,Y) → Reachable(X,Y)

Flight(X,Y), Reachable(Y,Z) → Reachable(X,Z)

Reachable(Vienna,Glasgow) → Yes()
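The reachability program above can be evaluated bottom-up until a fixpoint is reached. The Python sketch below uses semi-naive evaluation: each round joins Flight only with the Reachable facts derived in the previous round. The FLIGHTS set is a hypothetical list of connections, since the map's actual edges are not recoverable from the transcript, and the function name `reachable` is our own.

```python
from collections import defaultdict

# Hypothetical flight connections (the original map is not recoverable).
FLIGHTS = {
    ("Vienna", "London"),
    ("London", "Edinburgh"),
    ("Edinburgh", "Glasgow"),
    ("Larnaca", "Vienna"),
}

def reachable(flights):
    """Semi-naive evaluation of
         Flight(X,Y) -> Reachable(X,Y)
         Flight(X,Y), Reachable(Y,Z) -> Reachable(X,Z)
    """
    preds = defaultdict(set)          # Y -> {X | Flight(X,Y)}
    for x, y in flights:
        preds[y].add(x)
    reach = set(flights)              # first rule
    delta = set(flights)              # facts that are new in the last round
    while delta:
        # Second rule, joining Flight only with the newly derived facts.
        new = {(x, z) for (y, z) in delta for x in preds[y]} - reach
        reach |= new
        delta = new
    return reach

# Reachable(Vienna,Glasgow) -> Yes()
print("Yes" if ("Vienna", "Glasgow") in reachable(FLIGHTS) else "No")
```

Semi-naive evaluation derives the same fixpoint as naive evaluation; it only avoids re-deriving facts from old combinations in every round.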

Datalog

DATALOG = Select-Project-Join + Recursion

Recursion: first-order (FOL) queries, and hence non-recursive SQL queries, are not enough, e.g., to express reachability

Modeling Ontologies

DL Axiom | Rule-based Representation | Example
A ⊓ B ⊑ C | A(X), B(X) → C(X) | Parent ⊓ Male ⊑ Father
A ⊑ ∀R.B | A(X), R(X,Y) → B(Y) | MetalDevice ⊑ ∀hasPart.Metal
R ⊑ S | R(X,Y) → S(X,Y) | brotherOf ⊑ relativeOf
R inv S | R(X,Y) → S(Y,X) | parentOf inv childOf
trans(R) | R(X,Y), R(Y,Z) → R(X,Z) | trans(ancestorOf)
A ⊑ ∃R.B | A(X) → ∃Y (R(X,Y), B(Y)) | Student ⊑ ∃attends.Course
A ⊑ ∃≤1 R.B | A(X), R(X,Y), B(Y), R(X,Z), B(Z) → Y = Z | Person ⊑ ∃≤1 hasPassport.Valid
A disj B | A(X), B(X) → ⊥ | Student disj Professor

Modeling Ontologies using Datalog

Much is possible with Datalog: the first five axiom types above are plain Datalog rules

Much is not possible with Datalog: the last three need existential quantification, equality or the constant false (⊥) in the rule head

Datalog+

• Extend Datalog by allowing in the head:

o Existential quantification (∃)

o Equality atoms (=)

o Constant false (⊥)

…for query answering over databases

Datalog[∃,=,⊥]: a highly expressive KR language

Datalog+ vs. DLs

• Several Horn-DLs (no disjunction) can be expressed via Datalog+ rules

• But Datalog+ rules can express more

• Higher arity predicates allow for more flexibility

o DLs have only unary and binary predicates (concepts and roles)

Boss(X) → supervisorOf(X,X)

siblingOf(X,Y) → ∃Z (parentOf(Z,X), parentOf(Z,Y))

Datalog+: Other Applications

• Data Exchange

• Data Extraction

• Conceptual Modeling (e.g., UML)

• Querying the Semantic Web (RDF graphs)

• Automated Product Configuration


Data Exchange

Source schema S: employee(Name, Address)

Target schema T: person(ID, Name)

Σst (source-to-target rules): employee(N,A) → ∃ID person(ID,N)

Σt (target rules): person(ID,N1), person(ID,N2) → N1 = N2
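To make the data exchange setting concrete, here is a minimal Python sketch of a restricted chase hard-coded for the two rules above: the source-to-target rule invents a labelled null as ID for each employee name not yet represented in the target, and the target equality rule is then checked. The sample employee tuples, the function name `exchange` and the `_nullK` naming of labelled nulls are illustrative assumptions, not part of the original example.

```python
from itertools import combinations, count

def exchange(employees):
    """Restricted chase for
         Sigma_st: employee(N,A) -> exists ID person(ID,N)
         Sigma_t:  person(ID,N1), person(ID,N2) -> N1 = N2
    """
    fresh = count(1)
    person = set()
    for name, _address in employees:
        # Fire the source-to-target rule only if no person(_, name) exists yet.
        if not any(n == name for _, n in person):
            person.add((f"_null{next(fresh)}", name))
    # Target equality rule: two facts with the same ID must carry the same name.
    # Names are constants, so a violation cannot be repaired: the chase fails.
    for (id1, n1), (id2, n2) in combinations(sorted(person), 2):
        if id1 == id2 and n1 != n2:
            raise ValueError(f"chase failure: {n1} = {n2} required")
    return person

EMPLOYEES = [("Alice", "Vienna"), ("Alice", "Oxford"), ("Bob", "Lexington")]
print(sorted(exchange(EMPLOYEES)))
```

Because every ID is a fresh null, the equality rule is satisfied in this run; it can only fail when two facts share an ID but carry different constant names.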

Data Extraction

(Figure: a web page with two adjacent tables T1 and T2 of the same colour, listing products and their prices)

PRODUCT | PRICE
Toshiba_Protege_cx | 480
Dell_25416 | 360
Dell_23233 | 470
Acer_78987 | 390

We need object creation: the two tables T1 and T2 are grouped into a new table box T

table(T1), table(T2), sameColor(T1,T2), isNeighbourRight(T1,T2) → ∃T tablebox(T), contains(T,T1), contains(T,T2)

Conceptual Modeling

(Figure: UML class diagram with classes Company, Stock, Executive and Person, associations Issues, Owns, Member and Competes with multiplicities 0..1 and 1..1, and class Stock with attribute Index[0..1]:Str and operation getIndex():List)

Company(X) → ∃Y Issues(X,Y)

Stock(X), Issues(Y,X), Issues(Z,X) → Y = Z

Stock(X), Index(X,Y) → Str(Y)

Stock(X), getIndex(X,Y) → List(Y)

Stock(X) → ∃Y Issues(Y,X)

Main Reasoning Service in Datalog+

Input: a database D and a Datalog+ program Σ, i.e., the knowledge base ⟨D,Σ⟩

Query = ∃X (φ(X))

⟨D,Σ⟩ ⊨ Query ⇔ D ∧ Σ ⊨ Query
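One way to approximate ⟨D,Σ⟩ ⊨ Query is to run the chase for a fixed number of rounds and then search for a homomorphism from the query into the result. The sketch below does this for the single rule person(P) → ∃F hasFather(P,F), person(F); the round bound, the helper names and the toy database are our assumptions. A positive answer is sound, because the chase maps into every model of ⟨D,Σ⟩, but a negative answer after k rounds is not conclusive in general, since the chase may be infinite.

```python
from itertools import count

def is_var(term):
    # Convention (ours): uppercase-initial strings are variables.
    return term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom maps onto fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    s = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s

def homomorphisms(atoms, facts):
    substs = [{}]
    for atom in atoms:
        substs = [s2 for s in substs for f in facts
                  if (s2 := match(atom, f, s)) is not None]
    return substs

def bounded_chase(facts, rules, rounds):
    """Oblivious chase, stopped after a fixed number of rounds."""
    facts, fired, fresh = set(facts), set(), count(1)
    for _ in range(rounds):
        new = set()
        for i, (body, head, existentials) in enumerate(rules):
            for s in homomorphisms(body, facts):
                trigger = (i, tuple(sorted(s.items())))
                if trigger in fired:
                    continue
                fired.add(trigger)
                # Bind each existential variable to a fresh labelled null.
                s = {**s, **{z: f"_n{next(fresh)}" for z in existentials}}
                new |= {(p, tuple(s.get(a, a) for a in args)) for p, args in head}
        facts |= new
    return facts

def entails(facts, query):
    return bool(homomorphisms(query, facts))

# person(P) -> exists F hasFather(P,F), person(F)
RULES = [([("person", ("P",))], [("hasFather", ("P", "F")), ("person", ("F",))], {"F"})]
D = {("person", ("alice",))}
CHASE = bounded_chase(D, RULES, rounds=3)

# Query = exists X,Y hasFather(alice,X), hasFather(X,Y): alice has a grandfather
print(entails(CHASE, [("hasFather", ("alice", "X")), ("hasFather", ("X", "Y"))]))
```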

Datalog±

• Extend Datalog by allowing in the head:

o Existential quantification (∃)

o Equality atoms (=)

o Constant false (⊥)

…for query answering over databases

• But query answering is already undecidable for Datalog[∃]

• Datalog[∃,=,⊥] is syntactically restricted → Datalog±

Main Decidability Paradigms for Datalog[∃]

Finite Treewidth Sets (FTS): database expansion is tree-like; forward chaining procedures

Finite Unification Sets (FUS): backward resolution terminates; proof-theoretic procedures

…but identifying the above properties is an undecidable problem

The Main Decidable Datalog[∃] Languages

(Figure: Linear, Guarded and Sticky Datalog[∃] placed within the FTS and FUS classes, with DL-LiteR and EL as reference points)

Linear Datalog[∃]

• Linearity: there exists only one body-atom

• LOGSPACE data complexity & PSPACE-complete combined complexity

• Strictly more expressive than DL-LiteR

person(P) → ∃F hasFather(P,F), person(F)

[Calì, Gottlob & Lukasiewicz, JWS 2012]

DL-LiteR into Linear Datalog[∃]

DL-Lite: a popular family of DLs, at the basis of the OWL 2 QL profile of OWL

DL-LiteR Axioms | Linear Datalog[∃]
A ⊑ B | A(X) → B(X)
A ⊑ ∃R | A(X) → ∃Y R(X,Y)
∃R ⊑ A | R(X,Y) → A(X)
∃R ⊑ ∃P | R(X,Y) → ∃Z P(X,Z)
A ⊑ ∃R.B | A(X) → ∃Y R(X,Y), B(Y)
R ⊑ P | R(X,Y) → P(X,Y)
A ⊑ ¬B | A(X), B(X) → ⊥

Linear Datalog[∃]

• Linearity: there exists only one body-atom

• LOGSPACE data complexity & PSPACE-complete combined complexity

• Strictly more expressive than DL-LiteR

• Query answering is first-order rewritable ⇒ low data complexity

person(P) → ∃F hasFather(P,F), person(F)

[Calì, Gottlob & Lukasiewicz, JWS 2012]

First-Order Rewritability

(Figure: the query Q and the program Σ are compiled into a first-order query QFO, which is translated into an SQL query QSQL; QSQL is evaluated over the database D and optimized in the usual way)

∀D : ⟨D,Σ⟩ ⊨ Q ⇔ D ⊨ QSQL

[Calvanese, De Giacomo, Lembo, Lenzerini & Rosati, JAR 2007]
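As a small illustration of this pipeline, consider the hypothetical linear program Σ = {Student(X) → Person(X), Person(X) → ∃F hasFather(X,F)} and the BCQ Q = ∃F hasFather(alice,F). Resolving Q backwards against Σ by hand yields the union of conjunctive queries hasFather(alice,F) ∨ Person(alice) ∨ Student(alice), whose SQL translation runs directly on D without consulting Σ. The sketch uses Python's built-in sqlite3 module; the program, the table and column names and the function name `entails_sql` are our assumptions.

```python
import sqlite3

def entails_sql(db_facts):
    """Evaluate the hand-made rewriting QSQL of Q over the database only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student(x TEXT);
        CREATE TABLE person(x TEXT);
        CREATE TABLE hasFather(child TEXT, father TEXT);
    """)
    for table, row in db_facts:
        placeholders = ",".join("?" * len(row))
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
    # One SELECT per disjunct of the rewriting; Sigma itself is never used here.
    q_sql = """
        SELECT EXISTS (
            SELECT 1 FROM hasFather WHERE child = 'alice'
            UNION SELECT 1 FROM person WHERE x = 'alice'
            UNION SELECT 1 FROM student WHERE x = 'alice'
        )
    """
    (answer,) = conn.execute(q_sql).fetchone()
    conn.close()
    return bool(answer)

print(entails_sql([("student", ("alice",))]))
```

The point of the design is that all reasoning happens at compile time: the database engine sees an ordinary SQL query and can optimize it as usual.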

Guarded Datalog[∃]

• Guardedness: a single body-atom contains all the body-variables

• PTIME-c data complexity & 2EXPTIME-c combined complexity

• Strictly more expressive than EL

supervisorOf(S,E), employee(E) → employee(S)

[Calì, Gottlob & Lukasiewicz, JWS 2012] & [Calì, Gottlob & Kifer, JAIR 2013]

EL into Guarded Datalog[∃]

EL: a popular DL for biological applications, at the basis of the OWL 2 EL profile

EL Axioms | Guarded Datalog[∃]
A ⊑ B | A(X) → B(X)
A ⊓ B ⊑ C | A(X), B(X) → C(X)
A ⊑ ∃R.B | A(X) → ∃Y (R(X,Y), B(Y))
∃R.B ⊑ A | R(X,Y), B(Y) → A(X)

…several extensions of EL are captured by Guarded Datalog[∃]

Guarded Datalog[∃]

• Guardedness: a single body-atom contains all the body-variables

• PTIME-c data complexity & 2EXPTIME-c combined complexity

• Strictly more expressive than EL

• Query answering is Datalog rewritable (but cannot be first-order rewritable) ⇒ low data complexity

supervisorOf(S,E), employee(E) → employee(S)

[Calì, Gottlob & Lukasiewicz, JWS 2012] & [Calì, Gottlob & Kifer, JAIR 2013]

Datalog Rewritability

(Figure: the query Q and the program Σ are compiled into a Datalog query QDAT, which is evaluated over the database D by exploiting a Datalog engine)

∀D : ⟨D,Σ⟩ ⊨ Q ⇔ D ⊨ QDAT
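For guarded rules the rewriting target is Datalog rather than SQL. As a hand-worked example (our assumption, not taken from the slides), take the EL-style rules A(X) → ∃Y (R(X,Y), B(Y)) and R(X,Y), B(Y) → C(X) with the atomic query C(X). Every A-element has an anonymous R-successor in B, so for this query the existential rule can be replaced by the Datalog rule A(X) → C(X). The sketch evaluates the resulting program QDAT naively in Python; encoding atoms as tuples with uppercase strings as variables is our own convention, not a real Datalog engine API.

```python
def is_var(term):
    # Convention (ours): uppercase-initial strings are variables.
    return term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom maps onto fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    s = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s

def evaluate(rules, facts):
    """Naive bottom-up evaluation of a positive Datalog program."""
    facts = set(facts)
    while True:
        new = set()
        for body, (hpred, hargs) in rules:
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in facts
                          if (s2 := match(atom, f, s)) is not None]
            new |= {(hpred, tuple(s.get(a, a) for a in hargs)) for s in substs}
        if new <= facts:
            return facts
        facts |= new

# QDAT for the atomic query C(X), rewritten by hand.
Q_DAT = [
    ([("A", ("X",))], ("C", ("X",))),                     # A(X) -> C(X)
    ([("R", ("X", "Y")), ("B", ("Y",))], ("C", ("X",))),   # R(X,Y), B(Y) -> C(X)
]
D = {("A", ("a",)), ("R", ("b", "c")), ("B", ("c",))}
print(sorted(f for f in evaluate(Q_DAT, D) if f[0] == "C"))
```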


Why Beyond Tree-like Models?

elephant(X) → ∃Y hasEAncestor(X,Y), elephant(Y)

cat(X) → ∃Y hasCAncestor(X,Y), cat(Y)

elephant(X), cat(Y) → biggerThan(X,Y)

Starting from elephant(e) and cat(c), the rules generate infinitely many elephants e1, e2, e3, e4, … and cats c1, c2, c3, c4, …, and biggerThan connects every elephant to every cat

⇒ the resulting model contains an infinite complete bipartite graph, so it is not tree-like

Sticky Datalog[∃]

[Calì, Gottlob & P., AIJ 2012]

• Stickiness: join-variables stick to the inferred atoms

• LOGSPACE data complexity & EXPTIME-complete combined complexity

• Strictly more expressive than DL-LiteR

• Query answering is first-order rewritable ⇒ low data complexity

R(X,Y), P(Y,Z) → ∃W T(X,Y,W)

T(X,Y,Z) → ∃W S(Y,W)

(sticky: the join-variable Y is propagated to every inferred atom)

R(X,Y), P(Y,Z) → ∃W T(X,Y,W)

T(X,Y,Z) → ∃W S(X,W)

(not sticky: the join-variable Y is lost in S(X,W))
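Stickiness is usually checked with a variable-marking procedure. The Python sketch below follows that procedure as we understand it from the sticky Datalog[∃] literature (it is not spelled out on these slides): first mark every body variable that does not occur in the head, then propagate marks backwards through head positions whose variables are marked in some body atom, and finally reject any rule in which a marked variable occurs more than once in its body. It assumes positive rules with a single head atom and variables only (no constants); the `Rule` class and `is_sticky` are our own names. On the two programs above it reports the first as sticky and the second as not sticky.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    body: tuple   # body atoms as (predicate, (var, ...))
    head: tuple   # single head atom (predicate, (var, ...)); new vars are existential

def is_sticky(rules):
    marked = set()  # pairs (rule index, variable)
    # Initial marking: body variables that do not appear in the head.
    for i, r in enumerate(rules):
        for _, args in r.body:
            marked |= {(i, v) for v in args if v not in r.head[1]}
    # Propagation: a universally quantified head variable V of rule i is marked
    # if some body atom (of any rule j) has marked variables at all V's positions.
    changed = True
    while changed:
        changed = False
        for i, r in enumerate(rules):
            hpred, hargs = r.head
            body_vars = {v for _, args in r.body for v in args}
            for v in set(hargs) & body_vars:
                if (i, v) in marked:
                    continue
                positions = [k for k, a in enumerate(hargs) if a == v]
                if any(pred == hpred and all((j, args[k]) in marked for k in positions)
                       for j, r2 in enumerate(rules) for pred, args in r2.body):
                    marked.add((i, v))
                    changed = True
    # Sticky iff no marked variable occurs more than once in a body.
    for i, r in enumerate(rules):
        occ = [v for _, args in r.body for v in args]
        if any((i, v) in marked and occ.count(v) > 1 for v in set(occ)):
            return False
    return True

SIGMA_1 = [Rule((("R", ("X", "Y")), ("P", ("Y", "Z"))), ("T", ("X", "Y", "W"))),
           Rule((("T", ("X", "Y", "Z")),), ("S", ("Y", "W")))]
SIGMA_2 = [Rule((("R", ("X", "Y")), ("P", ("Y", "Z"))), ("T", ("X", "Y", "W"))),
           Rule((("T", ("X", "Y", "Z")),), ("S", ("X", "W")))]
print(is_sticky(SIGMA_1), is_sticky(SIGMA_2))
```

In SIGMA_2 the mark on Y in T(X,Y,Z) flows back to the first rule, where Y is a join variable, which is exactly the violation described above.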


Several Interesting Extensions

Field of intense research, e.g., Montpellier, Dresden, Calabria, Oxford, Vienna, …

Guarded family: Linear, Guarded, Weakly-Guarded, Frontier-Guarded, Weakly-Frontier-Guarded

Sticky family: Sticky, Sticky-Join, Weakly-Sticky, Weakly-Sticky-Join

Complexity of the Main Datalog[∃] Languages

Language | Data Complexity | Combined, Bounded Arity | Combined Complexity
Linear | in AC0 | NP-c | PSPACE-c
Guarded | PTIME-c | EXPTIME-c | 2EXPTIME-c
Sticky | in AC0 | NP-c | EXPTIME-c

(the AC0 upper bounds are obtained via query rewriting)

…can we go beyond positive rules, i.e., Datalog[∃,¬]?

Datalog[∃,¬]

• Rules extended with negative literals in their body

• But what is the semantics for Datalog[∃,¬]?

Number(X) → ∃Y Succ(X,Y), Number(Y)

Number(X), ¬Even(X) → Odd(X)

Number(X), ¬Odd(X) → Even(X)

Well-Founded Semantics (WFS) & Stable Model Semantics (SMS)
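To see how the two semantics differ on the Odd/Even rules above, the Python sketch below grounds them for the hypothetical numbers 0 and 1 (it drops the Succ rule, whose Skolemization would produce infinitely many terms), computes the stable models by brute force with the Gelfond-Lifschitz reduct, and computes the well-founded model with the alternating fixpoint. Each number can be either odd or even, so there are four stable models, whereas the well-founded model leaves all Odd and Even atoms undefined. The encoding and function names are our own.

```python
from itertools import combinations

def least_model(definite_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def gamma(program, interp):
    # Gelfond-Lifschitz reduct w.r.t. interp, followed by its least model.
    reduct = [(h, pos) for h, pos, neg in program if not (neg & interp)]
    return least_model(reduct)

def stable_models(program):
    atoms = sorted({h for h, _, _ in program})
    return [set(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r) if gamma(program, set(c)) == set(c)]

def well_founded(program):
    """Alternating fixpoint: true = lfp(gamma^2), possibly true = gamma(true)."""
    true = set()
    while True:
        new = gamma(program, gamma(program, true))
        if new == true:
            break
        true = new
    return true, gamma(program, true) - true   # (true atoms, undefined atoms)

def ground(numbers):
    # Ground rules (head, positive body, negative body) for the given numbers.
    prog = []
    for n in numbers:
        num = f"Number({n})"
        prog.append((num, frozenset(), frozenset()))
        prog.append((f"Odd({n})", frozenset({num}), frozenset({f"Even({n})"})))
        prog.append((f"Even({n})", frozenset({num}), frozenset({f"Odd({n})"})))
    return prog

PROGRAM = ground([0, 1])
print(len(stable_models(PROGRAM)), well_founded(PROGRAM))
```

With n grounded numbers the brute-force enumeration is exponential, which is only acceptable for a toy illustration like this one.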

Semantics of Datalog[∃,¬]

1. Convert the Datalog[∃,¬] program into a normal LP (via Skolemization)

2. Use the existing WFS and SMS for normal LPs

WFS(D,Σ) := WFS(Π_{D,Σ})

SMS(D,Σ) := SMS(Π_{D,Σ})

D = {R(a,b), P(a)}

Σ = {R(X,Y) → ∃Z R(Y,Z),

R(X,Y), P(X), ¬S(X) → P(Y),

R(X,Y), ¬P(X) → S(Y)}

Skolemization:

Π_{D,Σ} = {R(a,b), P(a),

R(X,Y) → R(Y,f(X,Y)),

R(X,Y), P(X), ¬S(X) → P(Y),

R(X,Y), ¬P(X) → S(Y)}
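The Skolemization step above can be sketched in a few lines of Python: each existentially quantified variable is replaced by a function term over the universally quantified body variables, as in R(X,Y) → R(Y,f(X,Y)). The naming scheme f<rule>_<var> is a hypothetical choice that keeps the Skolem functions of different rules and variables distinct; `body` is assumed to list only the positive atoms (by safety they contain every variable of the negative literals), and all arguments are assumed to be variables.

```python
def skolemize(body, head, existentials, rule_id):
    """Replace each existential variable Z in head by f<rule_id>_Z(X1,...,Xn),
    where X1..Xn are the body variables in order of first occurrence."""
    universal = list(dict.fromkeys(v for _, args in body for v in args))
    skolem = {z: f"f{rule_id}_{z}({','.join(universal)})" for z in existentials}
    new_head = [(p, tuple(skolem.get(a, a) for a in args)) for p, args in head]
    return body, new_head

# R(X,Y) -> exists Z R(Y,Z)   becomes   R(X,Y) -> R(Y, f1_Z(X,Y))
print(skolemize([("R", ("X", "Y"))], [("R", ("Y", "Z"))], ["Z"], 1))
```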

Query Answering and Datalog[∃,¬]

WFS Boolean Conjunctive Query Answering (WFS-BCQ):

Input: database D, Datalog[∃,¬] program Σ, BCQ Q

Question: WFS(D,Σ) ⊨ Q?

SMS Boolean Conjunctive Query Answering (SMS-BCQ):

Input: database D, Datalog[∃,¬] program Σ, BCQ Q

Question: M ⊨ Q for every M ∈ SMS(D,Σ)?

Guarded Datalog[∃,¬]

[Gottlob, Hernich, Kupke & Lukasiewicz, PODS 2013, KR 2014]

R(X,Y,Z), P(X,Y), ¬S(Z,X) → ∃W R(Y,Z,W), S(W,Z)

• Tree-likeness of the underlying models is preserved

• Query answering via a blocking technique

• Guarded Datalog[∃,¬] under SMS ≤p Guarded Datalog[∃,¬s,∨] with stratified negation

Problem | Data | Combined
WFS-BCQ | PTIME-c | 2EXPTIME-c
SMS-BCQ | coNP-c | 2EXPTIME-c

Linear Datalog[∃,¬]

[Gottlob, Hernich, Kupke & Lukasiewicz, PODS 2013, KR 2014]

R(X,Y,Z), ¬S(Z,X) → ∃W R(Y,Z,W), S(W,Z)

Problem | Data | Combined
WFS-BCQ | PTIME-c | 2EXPTIME-c
SMS-BCQ | coNP-c | 2EXPTIME-c

(compare LOGSPACE data and PSPACE-c combined complexity for Linear Datalog[∃] without negation)

Linear Datalog[∃,¬] behaves like Guarded Datalog[∃,¬]

Sticky Datalog[∃,¬]

[Alviano & P., PODS 2015]

• Stickiness: join-variables stick to the inferred atoms

• What is the right definition for Sticky Datalog[∃,¬]?

…either consider or ignore the variables in negative literals

R(X,Y), P(Y,Z) → ∃W T(X,Y,W)

T(X,Y,Z) → ∃W S(Y,W)

R(X,Y), P(Y,Z) → ∃W T(X,Y,W)

T(X,Y,Z) → ∃W S(X,W)

Problem (combined / data) | Sticky | Sticky+
WFS-BCQ | EXPTIME-c / in PTIME | Undecidable
SMS-BCQ | Undecidable | Undecidable

Sticky+: variables in negative literals do not obey the stickiness condition


Sticky Datalog[∃,¬] is Undecidable under SMS

[Alviano & P., PODS 2015]

• Each stable model encodes a possible computation of the Turing machine

• The query checks whether at least one stable model represents a valid halting computation

(Figure: a grid whose k-th horizontal row represents the k-th configuration of the Turing machine)

Sticky Datalog[∃,¬]: Sum Up

[Alviano & P., PODS 2015]

Stickiness + WFS: proof-theoretic approach

Stickiness + SMS: ∃-quantification + cartesian products + guessing ⇒ undecidability

But…

move(X,Y), ¬win(Y) → win(X)

• Even rules with exactly one positive atom may not be sticky (Y joins move(X,Y) with ¬win(Y) but is lost in the head win(X))

• Can we do better?

…the second dimension of stickiness

[Alviano & P., PODS 2015]

2D-Stickiness

1st Dimension: the positive part is sticky

2nd Dimension: negative literals that lose a variable stick to one positive atom

Examples, checked along the 1st and the 2nd dimension:

move(X,Y), ¬win(Y) → win(X)

P(X,Y), R(Y), ¬R(X) → ∃Z S(Y,Z)

T(X,Y), R(Y,Z) → P(X,Y)

P(X,Y), R(Y,Z), ¬R(X,X) → ∃Z S(Y,Z)

P(X), P(Y) → T(Y,X)

T(X,Y), ¬R(Y,X) → ∃Z S(Z)

2D-Sticky Datalog[∃,¬]

[Alviano & P., PODS 2015]

Problem (combined / data) | 2D-Sticky | 2D-Sticky+2
WFS-BCQ | 2EXPTIME-c / PTIME-c | Undecidable
SMS-BCQ | Undecidable | Undecidable

2D-Sticky+2: negative literals may stick to two positive atoms

2D-Sticky + WFS: employ a proof-theoretic approach

Datalog[∃,¬]: An Overview

(Figure: overview diagram of Linear, Guarded, Sticky and 2D-Sticky Datalog[∃,¬])

Conclusions and Future Work

Transform Datalog from a first-class database query language to a

first-class language for knowledge representation

(and other applications)

Problems under investigation:

• Stickiness + stable model semantics

• Deal with equality: Datalog[∃,¬,=]

• New semantics without applying Skolemization, following the

approach on stable models by Ferraris, Lee & Lifschitz

Thank you!
