tutorial "linked data query processing" part 2 "theoretical foundations" (www...

35
Linked Data Query Processing Tutorial at the 22nd International World Wide Web Conference (WWW 2013) May 14, 2013 http://db.uwaterloo.ca/LDQTut2013/ 2. Theoretical Foundations Olaf Hartig University of Waterloo

Upload: olaf-hartig

Post on 11-May-2015

1.488 views

Category:

Technology


0 download

DESCRIPTION

These are the slides from my WWW 2013 Tutorial "Linked Data Query Processing" http://db.uwaterloo.ca/LDQTut2013/

TRANSCRIPT

Page 1: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

Linked Data Query ProcessingTutorial at the 22nd International World Wide Web Conference (WWW 2013)

May 14, 2013

http://db.uwaterloo.ca/LDQTut2013/

2.Theoretical Foundations

Olaf HartigUniversity of Waterloo

Page 2: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 2

● Data retrieval approach● Data source selection● Data source ranking

(optional, for optimization)

Query-local data

GET http://.../movie2449

● Result construction approach● i.e., query-local data processing

● Combining data retrievaland result construction

http://mdb.../Paul http://geo.../Berlinhttp://mdb.../Ric http://geo.../Rome

?loc?actor

“Ingredients” for LD Query Execution

Page 3: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 3

Outline

Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics

➢ Definition➢ Theoretical Properties

Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties

Page 4: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 4

General Idea of SPARQL

● SPARQL: Declarative query language for RDF data

● Main idea: pattern matching● Query describes a pattern to be found in RDF data● Basically, query patterns consist of RDF triples with variables

( ?p , affiliated with , orgaX ) ,

( ?p , interested in , ?i )

Basic Query Pattern:

Page 5: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 5

General Idea of SPARQL

● SPARQL: Declarative query language for RDF data

● Main idea: pattern matching● Query describes a pattern to be found in RDF data● Basically, query patterns consist of RDF triples with variables● Any part of the queried data that matches the query pattern

contributes a solution to the query result

( ?p , affiliated with , orgaX ) ,

( ?p , interested in , ?i )

Basic Query Pattern:

( alice , affiliated with , orgaX ) ,

( bob , affiliated with , acme ) ,

( alice , interested in , yoga ) ,

( alice , interested in , tea ) ,

( bob , interested in , tea ) ,

Data:

?p ?i

alice yogaalice tea

Query Result:

Page 6: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 6

Some Terminology

● Any query result is represented by a set of valuations

Ω = { μ1 , … , μn }

● Valuation: mapping of query variables to RDF terms

μ = { ?p → alice , ?i → yoga }

● Any valuation in a query result is a solution

● Two valuations μ1 and μ2 are compatible if they agree on the binding for any shared variable● i.e., μ1(?v) = μ2(?v) for all ?v Є dom(μ1) ∩ dom(μ2)

?p ?i

alice yogaalice tea

Page 7: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 7

Query Patterns and their Semantics

● Triple pattern: An RDF triple that may have a variable as subject, predicate, or object, respectively

● eval( tp, G ) :═ { μ | μ[tp] Є G and dom(μ) = vars(tp) }● Group graph pattern: Conjunction of two query patterns

● eval( (P1 AND P2), G ) :═ { μ1 U μ2 | μ1 Є eval(P1,G) and μ2 Є eval(P2,G) and

μ1 and μ2 are compatible }● Associative and commutative

● Basic graph pattern (BGP): A set of triple patterns

● B = { tp1 , … , tpn } is equivalent to ( tp1 AND … AND tpn )

● Filter, union, optional, etc.

Page 8: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 8

Outline

Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics

➢ Definition➢ Theoretical Properties

Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties

Page 9: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 9

Data Model

● Static view (i.e., no changes during computation)

● Web of Linked Data: W = ( D, adoc, data )● D – set of symbols that represent Web documents

from which Linked Data can be extracted● adoc – partial mapping from URIs to D● data – total mapping from D to finite sets of RDF triples

● All data in W: AllData(W ) :═ Ud є D data(d)

D = adoc( ) = data( ) =

Page 10: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 10

Linked Data Query

Q(W )

Page 11: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 11

( ?p , affiliated with , orgaX ) ,

( ?p , interested in , ?i )

?p ?i

alice yogaalice tea

QP(W ) = { μ1 , μ2 , ... }

SPARQL-Based Linked Data Query

Page 12: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 12

Outline

Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics

➢ Definition➢ Theoretical Properties

Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties

Page 13: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 13

Full-Web Semantics

QP(W ) :═ eval(P,AllData(W ))

Page 14: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 14

Computability

● (Ordinary) Turing machinesare unsuitable:● Limited data access capabilities

not properly captured

● Web machines● Abiteboul and Vianu, 1997● Mendelzon and Milo, 1997

QP(W ) :═ eval(P,AllData(W ))

TM

Page 15: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 15

LD Machine

● Multi-tape Turing machine

➔ Input

➔ Work

➔ Output

Page 16: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 16

LD Machine

# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙

● Multi-tape Turing machine➔ Web Input

➔ Input

➔ Work

➔ Output

Page 17: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 17

LD Machine

# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙

● Multi-tape Turing machine➔ Web Input

➔ Input

➔ Work

➔ Output

● Access to Web Input is restricted● Only by performing

a particular procedurein a particular state

Page 18: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 18

# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙➔ Web Input

➔ Input

➔ Work

➔ Output

● For Q exists an LD machine MQ such that for any W holds:

● MQ halts after a finite number of computation steps, and

● MQ outputs the complete result Q(W )

Finitely Computable LD Queries

step 1 ∙ ∙ ∙ step k - 3 step k - 2 step k – 1 step k

∙ ∙ ∙

# enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #

Page 19: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 19

Eventually Computable LD Queries

stepk + 2

∙ ∙ ∙∙ ∙ ∙

stepk - 3

stepk - 2

stepk - 1

stepk

stepk + 1

# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙

# enc(μ1) # enc(μ2)

➔ Web Input

➔ Input

➔ Work

➔ Output

● For Q exists an LD machine MQ such that for any W holds:

1. Output always encodes a subset of query result Q(W ), and

2. Each μ Q(W ) eventually appears on the output

✗ No guarantee for termination

Page 20: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 20

Not (LD Machine) Computable

stepk + 2

∙ ∙ ∙∙ ∙ ∙

stepk - 3

stepk - 2

stepk - 1

stepk

stepk + 1

# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙

# enc(μ1) # enc(μ2)

➔ Web Input

➔ Input

➔ Work

➔ Output

● For Q exists an LD machine MQ such that for any W holds:

1. Output always encodes a subset of query result Q(W ), and

2. Each μ Q(W ) eventually appears on the output

✗ No guarantee for termination

Page 21: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 21

Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.

Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.

● Main reason: Set of all URIs is infinite (but countable)

Computability (Full-Web Semantics)

QP(W ) :═ eval(P,AllData(W ))

Page 22: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 22

Satisfiability and Monotonicity

Q( ) = { …, ri , … } ≠ Ø

⟹ Q( ) ⊆ Q( )

Proposition [Har12]: Let QP be a SPARQLLD query under full-Websemantics.● QP is satisfiable ⟺ its query pattern P is satisfiable● QP is monotonic ⟺ its query pattern P is monotonic

Proposition [Har12]: Let QP be a SPARQLLD query under full-Websemantics.● QP is satisfiable ⟺ its query pattern P is satisfiable● QP is monotonic ⟺ its query pattern P is monotonic

Unfortunately: Satisfiability andmonotonicity of SPARQLquery patterns is undecidable

Satisfiability:

Monotonicity:

OPTtp

AND

FILTER

UNION

Page 23: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 23

Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.

Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.

● Main reason: Set of all URIs is infinite (but countable)

Computability (Full-Web Semantics)

QP(W ) :═ eval(P,AllData(W ))

Page 24: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 24

Outline

Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics

➢ Definition➢ Theoretical Properties

Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties

Page 25: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 25

Reachability-Based Query Semantics

Page 26: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 26

● Seed URIs S ᑕ U

Reachability-Based Query Semantics

Page 27: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 27

● Seed URIs S ᑕ U● Reachability criterion c

● (Turing) computable function c: T ×U × P → {true,false}

Reachability-Based Query Semantics

Page 28: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 28

WQP,S( ) := eval(P,AllData(W

* ))c

Reachability-Based Query Semantics

Page 29: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 29

(Reachability-Based) cAll-Semantics

WQP,S( ) eval(P,AllData(W

* ))cAll

:=

Page 30: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 30

WQP,S( ) eval(P,AllData(W

* ))cNone

:=

(Reachability-Based) cNone-Semantics

Page 31: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 31

WQP,S( ) eval(P,AllData(W

* ))cMatch

:=

(Reachability-Based) cMatch-Semantics

Page 32: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 32

Satisfiability and Monotonicity

Q( ) = { …, ri , … } ≠ Ø

⟹ Q( ) ⊆ Q( )

Satisfiability:

Monotonicity:

Proposition [Har12]: Let c be a reachability criterion and QPcS be a

SPARQLLD query under c-semantics.● QP

cS is satisfiable ⟺ its SPARQL expression P is satisfiable

● QPcS is monotonic ⇒ its SPARQL expression P is monotonic

Proposition [Har12]: Let c be a reachability criterion and QPcS be a

SPARQLLD query under c-semantics.● QP

cS is satisfiable ⟺ its SPARQL expression P is satisfiable

● QPcS is monotonic ⇒ its SPARQL expression P is monotonic

Counterexample: Let QPcS be a SPARQLLD query under c-semantics.

If c = cNone and | S | = 1, then QPcS is monotonic.

Counterexample: Let QPcS be a SPARQLLD query under c-semantics.

If c = cNone and | S | = 1, then QPcS is monotonic.

Page 33: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 33

LD Machine-Based Computability

Theorem [Har12]: For any reachability criterion c, any SPARQL based Linked Data query under c-semantics is finitely computable.

Theorem [Har12]: For any reachability criterion c, any SPARQL based Linked Data query under c-semantics is finitely computable.

● If we restrict our data model such that Websof Linked Data are finite (i.e., consist of a finitenumber of documents), then:

For results w.r.t. the unrestricted data model (i.e.,Webs of Linked Data that may be infinitely large),refer to [Har12].

Page 34: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 34

Outline

Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics

➢ Definition➢ Theoretical Properties

Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties

Next part: 3. Source Selection Strategies ...

Page 35: Tutorial "Linked Data Query Processing" Part 2 "Theoretical Foundations" (WWW 2013 Ed.)

WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 35

These slides have been created byOlaf Hartig

for theWWW 2013 tutorial on

Link Data Query Processing

Tutorial Website: http://db.uwaterloo.ca/LDQTut2013/

This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License

(http://creativecommons.org/licenses/by-sa/3.0/)