rdf and sparql: database foundationspolleres/sparqltutorial/eswc... · 2013-09-23 · m. arenas, c....

52
RDF and SPARQL: Database Foundations Marcelo Arenas, Claudio Gutierrez, Jorge P´ erez Department of Computer Science Pontificia Universidad Cat´ olica de Chile Universidad de Chile Center for Web Research http://www.cwr.cl/ M. Arenas, C. Gutierrez, J. P´ erez RDF and SPARQL: DB Foundations 1 / 52

Upload: others

Post on 25-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

RDF and SPARQL: Database Foundations

Marcelo Arenas, Claudio Gutierrez, Jorge Perez

Department of Computer SciencePontificia Universidad Catolica de Chile

Universidad de Chile

Center for Web Researchhttp://www.cwr.cl/

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 1 / 52

Page 2: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Outline

I Part I: Querying RDF Data

I The RDF data modelI Querying: The simple and the idealI Querying: Semantics and Complexity

I Part II: Querying Data with SPARQL

I Decisions takenI Decisions to be taken

I Conclusions

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 2 / 52

Page 3: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

RDF in a nutshell

I RDF is the W3C proposal framework for representinginformation in the Web.

I Abstract syntax based on directed labeled graph.

I Schema definition language (RDFS): Define new vocabulary(typing, inheritance of classes and properties).

I Extensible URI-based vocabulary.

I Support use of XML schema datatypes.

I Formal semantics.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 3 / 52

Page 4: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

RDF formal model

Subject ObjectPredicate

LB

U

U UB

U = set of Uris

B = set of Blank nodes

L = set of Literals

(s, p, o) ∈ (U ∪ B)× U × (U ∪ B ∪ L) is called an RDF triple

A set of RDF triples is called an RDF graph

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 4 / 52

Page 5: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

RDFS: An example

works in

rdf:sp

person

sportman

soccer player

rdf:sc

rdf:sc

rdf:dom

rdf:dom rdf:range

rdf:range

Ronaldinho Barcelona

plays in

plays in

soccer team

company

rdf:typerdf:type

rdf:sc

Spainlives in

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 5 / 52

Page 6: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

RDF model

Some difficulties:

I Existential variables as datavalues

I Built-in vocabulary with fixed semantics (RDFS)

I Graph model where nodes may also be edge labels

RDF data processing can take advantage of database techniques:

I Query processing

I Storing

I Indexing

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 6 / 52

Page 7: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Entailment of RDF graphs

Entailment of RDF graphs:

I Can be defined in terms of classical notions such model,interpretation, etc

I As for the case of first order logic

I Has a graph characterization via homomorphisms.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 7 / 52

Page 8: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Homomorphism

A function h : U ∪ B ∪ L→ U ∪ B ∪ L is a homomorphism h fromG1 to G2 if:

I h(c) = c for every c ∈ U ∪ L;

I for every (a, b, c) ∈ G1, (h(a), h(b), h(c)) ∈ G2

Notation: G1 → G2

Example: h = {B 7→ b}

a

b

Bp

p

a

b

p

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 8 / 52

Page 9: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Entailment

Theorem (CM77)

G1 |= G2 if and only if there is a homomorphism G2 → G1.

a

b

p

a

b

Bp

p|=

Complexity

Entailment for RDF is NP-complete

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 9 / 52

Page 10: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Graphs with RDFS vocabulary

Previous characterization of entailment is not enough to deal withRDFS vocabulary: (Ronaldinho, rdf : type, person)

works in

rdf:sp

person

sportman

soccer player

rdf:sc

rdf:sc

rdf:dom

rdf:dom rdf:range

rdf:range

Barcelona

plays in

plays in

soccer team

company

rdf:typerdf:type

rdf:sc

Spainlives in

Ronaldinho

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 10 / 52

Page 11: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Graphs with RDFS vocabulary

Built-in predicates have pre-defined semantics:

rdf:sc: transitive

rdf:sp: transitive

More complicated interactions:(p, rdf:dom, c) (a, p, b)

(a, rdf:type, c)

RDFS-entailment can be characterized by a set of rules

I An Existential rule

I Subproperty rules

I Subclass rules

I Typing rules

I Implicit typing

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 11 / 52

Page 12: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Graphs with RDFS vocabulary: Inference rules

Inference system in [MPG07] has 14 rules:

Existential rule :G1

G2if G2 → G1

Subproperty rules :(p, rdf:sp, q) (a, p, b)

(a, q, b)

Subclass rules :(a, rdf:sc, b) (b, rdf:sc, c)

(a, rdf:sc, c)

Typing rules :(p, rdf:dom, c) (a, p, b)

(a, rdf:type, c)

Implicit typing :(q, rdf:dom, a) (p, rdf:sp, q) (b, p, c)

(b, rdf:type, a)

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 12 / 52

Page 13: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

RDFS Entailment

Theorem (H04,GHM04,MPG07)

G1 |= G2 iff there is a proof of G2 from G1 using the system of 14inference rules.

Complexity

RDFS-entailment is NP-complete.

Proof idea

Membership in NP: If G1 |= G2, then there exists a polynomial-sizeproof of this fact.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 13 / 52

Page 14: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Closure of an RDF Graph

Notation:

ground(G ) : Graph obtained by replacing every blank Bin G by a constant cB .

ground−1(G ) : Graph obtained by replacing every constantcB in G by B .

Closure of an RDF graph G (denoted by closure(G )):

G ∪ {t ∈ (U ∪ B)× U × (U ∪ B ∪ L) |

there exists a ground tuple t ′ such that

ground(G ) |= t ′ and t = ground−1(t ′)}

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 14 / 52

Page 15: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Closure of an RDF Graph: Example

rdf : sc

rdf : sc

b

a

c

rdf : sc

rdf : sc

rdf : sc

b

a

c

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 15 / 52

Page 16: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Closure of an RDF graph: complexity

Proposition (H04,GHM04,MPG07)

G1 |= G2 iff G2 → closure(G1)

Complexity

The closure of G can be computed in time O(|G |4 · log |G |).

Can the closure be used in practice?

I Can we use an alternative materialization?

I Can we materialize a small part of the closure?

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 16 / 52

Page 17: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Core of an RDF Graph

An RDF Graph G is a core if there is no homomorphism from G toa proper subgraph of it.

Theorem (HN92,FKP03,GHM04)

I Each RDF graph G has a unique core (denoted by core(G )).

I Deciding if G is a core is coNP-complete.

I Deciding if G = core(G ′) is DP-complete.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 17 / 52

Page 18: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Core and RDFS

For RDF graphs with RDFS vocabulary, the core of G may containredundant information:

d

rdf : sc

c

b

a

rdf : sc

rdf : sc

B

rdf : sc

rdf : sc

d

rdf : sc

c

b

a

rdf : sc

rdf : sc

rdf : sc

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 18 / 52

Page 19: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

A normal form for RDF graphs

To reduce the size of the materialization, we can combine bothcore and closure.

I nf(G ) = core(closure(G ))

Theorem (GHM04)

I G1 is equivalent to G2 iff nf(G1) ∼= nf(G2).

I G1 |= G2 iff G2 → nf(G1)

Complexity

The problem of deciding if G1 = nf(G2) is DP-complete.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 19 / 52

Page 20: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Desiderata

Let D be a database, Q a query, and Q(D) the answer.

I Outputs should belong to the same family of objects as inputs

I If D ≡ D ′, then Q(D) = Q(D ′)(Weaker) If D ≡ D ′, then Q(D) ∼= Q(D ′)

I Q(D) should have no (or minimal) redundancies

I The framework should be extensible to RDFS(Should the framework be extensible to OWL?)

I Incorporate to the framework the notion of entailment

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 20 / 52

Page 21: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Desiderata

Outputs should belong to the same family of objects as inputs

I Allows compositionality of queries

I Allows defining views

I Allows rewriting

In RDF, the natural objects of input/output are RDF graphs.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 21 / 52

Page 22: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Desiderata

If D ≡ D ′, then Q(D) = Q(D ′)(Weaker) If D ≡ D ′, then Q(D) ∼= Q(D ′)

I Outputs are syntactic or semantic objects?

I Need a notion of “equivalent” databases (≡)(In RDF, there is a standard notion of logical equivalence)

I One could just ask logical equivalence in the output

I In RDF there is an intermediate notion: graph isomorphism

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 22 / 52

Page 23: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Desiderata

Q(D) should have no (or minimal) redundancies

I Desirable to avoid inconsistencies

I Desirable to improve processing time and space

I Standard requirement for exchange information

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 23 / 52

Page 24: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Desiderata

The framework should be extensible to RDFS(Should the framework be extensible to OWL?)

I A basic requirement of the Semantic Web Architecture

I Extension to OWL are not trivial because of the knownmismatch

I Not necessarily related to the type of semantics given (logicalframework, graph matching, etc.)

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 24 / 52

Page 25: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Desiderata

Incorporate to the framework the notion of entailment

I RDF graphs are not purely syntactic objects

I Would like to incorporate KB framework

I Beware of the complexity issues! RDF navigates on the Web

I Find the good compromise

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 25 / 52

Page 26: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Definitions

A conjunctive query Q is a pair of RDF graphs H,B where someresources have been replaced by variables X , Y in V .

Q : H(X )← B(X , Y )

Issues:

I Free variables in B (projection)

I Treatment of blank nodes in B

I Treatment of blank nodes in H

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 26 / 52

Page 27: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Definitions (cont.)

A valuation is a function v : V → U ∪ B ∪ L

A matching of a graph B in the database D is a valuation v suchthat v(B) ⊆ D.

A pre-answer to Q over D is the set

preans(Q,D) = {v(H) : v is a matching of B in D }

A single answer is an element of preans(Q,D)

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 27 / 52

Page 28: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Two semantics

Union: answer Q(D) is the union of all single answers

ansU(Q,D) =⋃

preans(Q,D)

Merge: answer Q(D) is the merge of all single answers

ansM(Q,D) =⊎

preans(Q,D)

Proposition

1. For both semantics, if D |= D ′ then ans(Q,D ′) |= ans(Q,D)

2. For all D, ansU(Q,D) |= ansM(Q,D)

3. With merge semantics, we cannot represent the identity query

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 28 / 52

Page 29: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: refined semantics

Problem

Two non-isomorphic datasets D,D ′ give different answers to thesame query.

A slightly refined semantics:

1. Normalize D before querying

2. Then query as usual over nf (D)

Good News: if D ≡ D ′ then Q(D) ∼= Q(D ′)Bad News: computing nf (D) is hard

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 29 / 52

Page 30: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: refined semantics (cont.)

The news as formal results:

Theorem (MPG07)

Do not need to compute the normal form.

Theorem (FG06)

If a query language has the following two properties:

1. for all Q, if D ≡ D ′ then Q(D) = Q(D ′),

2. can represent the identity query,

then the complexity of evaluation is NP-hard (in data complexity).

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 30 / 52

Page 31: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Containment

A query Q contains a query Q, denoted Q v Q ′ iff ans(Q,D)comprises all the information of ans(Q ′,D).

In classical DB: ans(Q,D) ⊆ ans(Q ′,D)

In our setting we have two versions:

I ans(Q ′,D) ⊆ ans(Q,D) (Q vp Q ′)

I preans(Q,D) ⊆ preans(Q ′,D) (modulo iso) (Q vm Q ′)

For ground RDF both notions coincide.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 31 / 52

Page 32: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying RDF data: Complexity

Query complexity version: The evaluation problem is NP-complete

Data complexity version: The evaluation problem is polynomial

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 32 / 52

Page 33: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Querying with SPARQL

I SPARQL is the W3C candidate recommendation querylanguage for RDF.

I SPARQL is a graph-matching query language.

I A SPARQL query consists of three parts:I Pattern matching: optional, union, nesting, filtering.I Solution modifiers: projection, distinct, order, limit, offset.I Output part: construction of new triples, . . ..

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 33 / 52

Page 34: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Recall the formalization from Unit-2

Syntax:

I Triple patterns: RDF triple + variables (no bnodes)

I Operators between triple patterns: AND, UNION, OPT.

I Filtering of solutions: FILTER.

I A full parenthesized algebra.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 34 / 52

Page 35: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Recall the formalization from Unit-2

Semantics:

I Based on mappings, partial functions from variables to terms.

I A mapping µ is a solution of triple pattern t in G iffI µ(t) ∈ GI dom(µ) = var(t).

I [[t]]G is the evaluation of t in G , the set of solutions.

Example

G t [[t]]G

(R1, name, john)(R1, email, [email protected])(R2, name, paul)

(?X , name, ?Y )?X ?Y

µ1: R1 johnµ2: R2 paul

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 35 / 52

Page 36: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Compatible mappings

Definition

Two mappings are compatible if they agree in their sharedvariables.

Example

?X ?Y ?Z ?Vµ1 : R1 johnµ2 : R1 [email protected]µ3 : [email protected] R2

µ1 ∪ µ2 : R1 john [email protected]µ1 ∪ µ3 : R1 john [email protected] R2

I µ2 and µ3 are not compatible

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 36 / 52

Page 37: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Sets of mappings and operations

Let M1 and M2 be sets of mappings:

Definition

Join: M1 M2

I extending mappings in M1 with compatible mappings in M2

Difference: M1 r M2

I mappings in M1 that cannot be extended with mappings in M2

Union: M1 ∪M2

I mappings in M1 plus mappings in M2 (set theoretical union)

Definition

Left Outer Join: M1 M2 = (M1 M2) ∪ (M1 r M2)

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 37 / 52

Page 38: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Semantics of general graph patterns

Definition

Given a graph G the evaluation of a pattern is recursively defined

I [[(P1 AND P2)]]G = [[P1]]G [[P2]]G

I [[(P1 UNION P2)]]G = [[P1]]G ∪ [[P2]]G

I [[(P1 OPT P2)]]G = [[P1]]G [[P2]]G

I [[(P FILTER R)]]G = {µ ∈ [[P ]]G | µ satisfies R}

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 38 / 52

Page 39: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Differences with Relational Algebra / SQL

I Not a fixed output schemaI mappings instead of tablesI schema is implicit in the domain of mappings

I Too many NULLsI mappings with disjoint domains can be joinedI mappings with distinct domains in output solutions

I SPARQL-to-SQL translations experience this issuesI need of IS NULL/IS NOT NULL in join/outerjoin conditionsI need of COALESCE in constructing output schema

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 39 / 52

Page 40: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

SPARQL complexity: the evaluation problem

Input:

A mapping µ, a graph pattern P , and an RDF graph G .

Question:

Is the mapping in the evaluation of the pattern against the graph?

µ ∈ [[P ]]G?

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 40 / 52

Page 41: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Evaluation of AND-FILTER patterns is polynomial.

Theorem (PAG06)

For patterns using only AND and FILTER operators, the evaluationproblem is polynomial:

O(|P | × |G |).

Proof ideaI Check that the mapping makes every triple to match.

I Then check that the mapping satisfies the FILTERs.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 41 / 52

Page 42: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Evaluation including UNION is NP-complete.

Theorem (PAG06)

For patterns using AND, FILTER and UNION operators, theevaluation problem is NP-complete.

Proof ideaI Reduction from 3SAT.

I A pattern encodes the propositional formula.

I ¬ bound is used to encode negation.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 42 / 52

Page 43: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Evaluation including OPT is PSPACE-complete.

Theorem (PAG06)

For patterns using AND, FILTER and OPT operators, theevaluation problem is PSPACE-complete.

Proof ideaI Reduction from QBF

I A pattern encodes a quantified propositional formula:

∀x1∃y1∀x2∃y2 · · ·ψ.

I nested OPTs are used to encode quantifier alternation.

(This time, we do not need ¬ bound.)

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 43 / 52

Page 44: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

PSPACE-hardness: A closer look

Assume ϕ = ∀x1∃y1 ψ, where ψ = (x1 ∨ ¬y1) ∧ (¬x1 ∨ y1).

We generate G , Pϕ and µ0 such that µ0 belongs to the answer ofPϕ over G iff ϕ is valid:

G : {(a, tv, 0), (a, tv, 1), (a, false, 0), (a, true, 1)}

Pψ : ((a, tv, ?X1) AND (a, tv, ?Y1)) FILTER((?X1 = 1 ∨ ?Y1 = 0) ∧ (?X1 = 0 ∨ ?Y1 = 1))

Pϕ : (a, true, ?B0) OPT (P1 OPT (Q1 AND Pψ))

µ0 : {?B0 7→ 1}

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 44 / 52

Page 45: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

PSPACE-hardness: A closer look

Pϕ : (a, true, ?B0) OPT (P1 OPT (Q1 AND Pψ))P1 : (a, tv, ?X1)Q1 : (a, tv, ?X1) AND (a, tv, ?Y1) AND (a, false, ?B0)

P1

?B0 7→ 1

?X1 7→ 0 ?Y1 7→ i ?B0 7→ 0

?X1 7→ 1 ?Y1 7→ j ?B0 7→ 0

Q1

?X1 7→ 0

?X1 7→ 1

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 45 / 52

Page 46: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Data–complexity is polynomial

Theorem (PAG06)

When patterns are consider to be fixed (data complexity), theevaluation problem is in LOGSPACE.

Proof idea

From data–complexity of first–order logic.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 46 / 52

Page 47: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

SPARQL reordering/optimization: a simple normal from

I AND and UNION are commutative and associative.

I AND, OPT, and FILTER distribute over UNION.

Theorem (UNION Normal Form)

Every graph pattern is equivalent to one of the form

P1 UNION P2 UNION · · · UNION Pn

where each Pi is UNION–free.

We concentrate in UNION-free patterns.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 47 / 52

Page 48: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Well–designed patterns

Definition

A graph pattern is well–designed iff for every OPT in the pattern

( · · · · · · · · · · · · ( A OPT B ) · · · · · · · · · · · · )↑ ↑ ↑ ↑

if a variable occurs inside B and anywhere outside the OPT, thenthe variable must also occur inside A.

Example( (

(?Y , name, paul) OPT (?X , email, ?Z ))

AND (?X , name, john))

�� ↑ ↑

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 48 / 52

Page 49: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Well–designed patterns and PSPACE-hardness

In the PSPACE-hardness reduction we use this formula:

Pϕ : (a, true, ?B0) OPT (P1 OPT (Q1 AND Pψ))P1 : (a, tv, ?X1)Q1 : (a, tv, ?X1) AND (a, tv, ?Y1) AND (a, false, ?B0)

It is not well-designed: B0

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 49 / 52

Page 50: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Well–designed patterns: reordering/optimization

For well-designed patterns

I P1 AND (P2 OPT P3) ≡ (P1 AND P2) OPT P3

I (P1 OPT P2) OPT P3 ≡ (P1 OPT P3) OPT P2

Theorem (OPT Normal Form)

Every well–designed pattern is equivalent to one of the form

( · · · ( t1 AND · · · AND tk ) OPT O1 ) · · · ) OPT On )

where each ti is a triple pattern, and each Oj is a pattern of thesame form.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 50 / 52

Page 51: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

Final remarks

I RDFS can be considered a new data model.I It is the W3C’s recommendation for describing Web metadata.

I RDFS can definitely benefit from database technology.I RDFS: Formal semantics, entailment of RDFS graphs, normal

forms for RDFS graphs (closure and core).I SPARQL: Formal semantics, complexity of query evaluation,

query optimization.I UpdatingI . . .

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 51 / 52

Page 52: RDF and SPARQL: Database Foundationspolleres/sparqltutorial/ESWC... · 2013-09-23 · M. Arenas, C. Gutierrez, J. P´erez – RDF and SPARQL: DB Foundations 6 / 52. Entailment of

References

I A. Chandra, P. Merlin, Optimal Implementation ofConjunctive Queries in Relational Databases. In STOC 1977.

I R. Fagin, P. Kolaitis, L. Popa, Data Exchange: Getting to theCore. In PODS 2003.

I C. Gutierrez, C. Hurtado, A. O. Mendelzon, Foundations ofSemantic Web Databases. In PODS 2004.

I P. Hayes, RDF Semantics. W3C Recommendation 2004.

I P. Hell, J. Nesetril, The Core of a Graph. DiscreteMathematics 1992.

I S. Munoz, J. Perez, C. Gutierrez, Minimal Deductive Systemsfor RDF. In ESWC 2007.

I J. Perez, M. Arenas, C. Gutierrez, Semantics and Complexityof SPARQL. In ISWC 2006.

M. Arenas, C. Gutierrez, J. Perez – RDF and SPARQL: DB Foundations 52 / 52