tutorial "linked data query processing" part 2 "theoretical foundations" (www...
DESCRIPTION
These are the slides from my WWW 2013 Tutorial "Linked Data Query Processing" http://db.uwaterloo.ca/LDQTut2013/TRANSCRIPT
Linked Data Query ProcessingTutorial at the 22nd International World Wide Web Conference (WWW 2013)
May 14, 2013
http://db.uwaterloo.ca/LDQTut2013/
2.Theoretical Foundations
Olaf HartigUniversity of Waterloo
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 2
● Data retrieval approach● Data source selection● Data source ranking
(optional, for optimization)
Query-local data
GET http://.../movie2449
● Result construction approach● i.e., query-local data processing
● Combining data retrievaland result construction
http://mdb.../Paul http://geo.../Berlinhttp://mdb.../Ric http://geo.../Rome
?loc?actor
“Ingredients” for LD Query Execution
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 3
Outline
Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics
➢ Definition➢ Theoretical Properties
Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 4
General Idea of SPARQL
● SPARQL: Declarative query language for RDF data
● Main idea: pattern matching● Query describes a pattern to be found in RDF data● Basically, query patterns consist of RDF triples with variables
( ?p , affiliated with , orgaX ) ,
( ?p , interested in , ?i )
Basic Query Pattern:
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 5
General Idea of SPARQL
● SPARQL: Declarative query language for RDF data
● Main idea: pattern matching● Query describes a pattern to be found in RDF data● Basically, query patterns consist of RDF triples with variables● Any part of the queried data that matches the query pattern
contributes a solution to the query result
( ?p , affiliated with , orgaX ) ,
( ?p , interested in , ?i )
Basic Query Pattern:
( alice , affiliated with , orgaX ) ,
( bob , affiliated with , acme ) ,
( alice , interested in , yoga ) ,
( alice , interested in , tea ) ,
( bob , interested in , tea ) ,
Data:
?p ?i
alice yogaalice tea
Query Result:
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 6
Some Terminology
● Any query result is represented by a set of valuations
Ω = { μ1 , … , μn }
● Valuation: mapping of query variables to RDF terms
μ = { ?p → alice , ?i → yoga }
● Any valuation in a query result is a solution
● Two valuations μ1 and μ2 are compatible if they agree on the binding for any shared variable● i.e., μ1(?v) = μ2(?v) for all ?v Є dom(μ1) ∩ dom(μ2)
?p ?i
alice yogaalice tea
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 7
Query Patterns and their Semantics
● Triple pattern: An RDF triple that may have a variable as subject, predicate, or object, respectively
● eval( tp, G ) :═ { μ | μ[tp] Є G and dom(μ) = vars(tp) }● Group graph pattern: Conjunction of two query patterns
● eval( (P1 AND P2), G ) :═ { μ1 U μ2 | μ1 Є eval(P1,G) and μ2 Є eval(P2,G) and
μ1 and μ2 are compatible }● Associative and commutative
● Basic graph pattern (BGP): A set of triple patterns
● B = { tp1 , … , tpn } is equivalent to ( tp1 AND … AND tpn )
● Filter, union, optional, etc.
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 8
Outline
Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics
➢ Definition➢ Theoretical Properties
Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties
√
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 9
Data Model
● Static view (i.e., no changes during computation)
● Web of Linked Data: W = ( D, adoc, data )● D – set of symbols that represent Web documents
from which Linked Data can be extracted● adoc – partial mapping from URIs to D● data – total mapping from D to finite sets of RDF triples
● All data in W: AllData(W ) :═ Ud є D data(d)
D = adoc( ) = data( ) =
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 10
Linked Data Query
Q(W )
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 11
( ?p , affiliated with , orgaX ) ,
( ?p , interested in , ?i )
?p ?i
alice yogaalice tea
QP(W ) = { μ1 , μ2 , ... }
SPARQL-Based Linked Data Query
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 12
Outline
Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics
➢ Definition➢ Theoretical Properties
Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties
√
√
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 13
Full-Web Semantics
QP(W ) :═ eval(P,AllData(W ))
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 14
Computability
● (Ordinary) Turing machinesare unsuitable:● Limited data access capabilities
not properly captured
● Web machines● Abiteboul and Vianu, 1997● Mendelzon and Milo, 1997
QP(W ) :═ eval(P,AllData(W ))
TM
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 15
LD Machine
● Multi-tape Turing machine
➔ Input
➔ Work
➔ Output
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 16
LD Machine
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
● Multi-tape Turing machine➔ Web Input
➔ Input
➔ Work
➔ Output
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 17
LD Machine
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
● Multi-tape Turing machine➔ Web Input
➔ Input
➔ Work
➔ Output
● Access to Web Input is restricted● Only by performing
a particular procedurein a particular state
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 18
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙➔ Web Input
➔ Input
➔ Work
➔ Output
● For Q exists an LD machine MQ such that for any W holds:
● MQ halts after a finite number of computation steps, and
● MQ outputs the complete result Q(W )
Finitely Computable LD Queries
step 1 ∙ ∙ ∙ step k - 3 step k - 2 step k – 1 step k
∙ ∙ ∙
# enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 19
Eventually Computable LD Queries
stepk + 2
∙ ∙ ∙∙ ∙ ∙
stepk - 3
stepk - 2
stepk - 1
stepk
stepk + 1
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
# enc(μ1) # enc(μ2)
➔ Web Input
➔ Input
➔ Work
➔ Output
● For Q exists an LD machine MQ such that for any W holds:
1. Output always encodes a subset of query result Q(W ), and
2. Each μ Q(W ) eventually appears on the output
✗ No guarantee for termination
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 20
Not (LD Machine) Computable
stepk + 2
∙ ∙ ∙∙ ∙ ∙
stepk - 3
stepk - 2
stepk - 1
stepk
stepk + 1
# enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
# enc(μ1) # enc(μ2)
➔ Web Input
➔ Input
➔ Work
➔ Output
● For Q exists an LD machine MQ such that for any W holds:
1. Output always encodes a subset of query result Q(W ), and
2. Each μ Q(W ) eventually appears on the output
✗ No guarantee for termination
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 21
Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.
Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.
● Main reason: Set of all URIs is infinite (but countable)
Computability (Full-Web Semantics)
QP(W ) :═ eval(P,AllData(W ))
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 22
Satisfiability and Monotonicity
Q( ) = { …, ri , … } ≠ Ø
⟹ Q( ) ⊆ Q( )
Proposition [Har12]: Let QP be a SPARQLLD query under full-Websemantics.● QP is satisfiable ⟺ its query pattern P is satisfiable● QP is monotonic ⟺ its query pattern P is monotonic
Proposition [Har12]: Let QP be a SPARQLLD query under full-Websemantics.● QP is satisfiable ⟺ its query pattern P is satisfiable● QP is monotonic ⟺ its query pattern P is monotonic
Unfortunately: Satisfiability andmonotonicity of SPARQLquery patterns is undecidable
Satisfiability:
Monotonicity:
OPTtp
AND
FILTER
UNION
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 23
Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.
Theorem [Har12]: If a satisfiable SPARQLLD query QP under full-Web semantics is monotonic, then it is eventually computable (but not finitely computable); otherwise, it is not even eventually computable.
● Main reason: Set of all URIs is infinite (but countable)
Computability (Full-Web Semantics)
QP(W ) :═ eval(P,AllData(W ))
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 24
Outline
Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics
➢ Definition➢ Theoretical Properties
Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties
√
√
√
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 25
Reachability-Based Query Semantics
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 26
● Seed URIs S ᑕ U
Reachability-Based Query Semantics
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 27
● Seed URIs S ᑕ U● Reachability criterion c
● (Turing) computable function c: T ×U × P → {true,false}
Reachability-Based Query Semantics
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 28
WQP,S( ) := eval(P,AllData(W
* ))c
Reachability-Based Query Semantics
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 29
(Reachability-Based) cAll-Semantics
WQP,S( ) eval(P,AllData(W
* ))cAll
:=
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 30
WQP,S( ) eval(P,AllData(W
* ))cNone
:=
(Reachability-Based) cNone-Semantics
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 31
WQP,S( ) eval(P,AllData(W
* ))cMatch
:=
(Reachability-Based) cMatch-Semantics
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 32
Satisfiability and Monotonicity
Q( ) = { …, ri , … } ≠ Ø
⟹ Q( ) ⊆ Q( )
Satisfiability:
Monotonicity:
Proposition [Har12]: Let c be a reachability criterion and QPcS be a
SPARQLLD query under c-semantics.● QP
cS is satisfiable ⟺ its SPARQL expression P is satisfiable
● QPcS is monotonic ⇒ its SPARQL expression P is monotonic
Proposition [Har12]: Let c be a reachability criterion and QPcS be a
SPARQLLD query under c-semantics.● QP
cS is satisfiable ⟺ its SPARQL expression P is satisfiable
● QPcS is monotonic ⇒ its SPARQL expression P is monotonic
Counterexample: Let QPcS be a SPARQLLD query under c-semantics.
If c = cNone and | S | = 1, then QPcS is monotonic.
Counterexample: Let QPcS be a SPARQLLD query under c-semantics.
If c = cNone and | S | = 1, then QPcS is monotonic.
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 33
LD Machine-Based Computability
Theorem [Har12]: For any reachability criterion c, any SPARQL based Linked Data query under c-semantics is finitely computable.
Theorem [Har12]: For any reachability criterion c, any SPARQL based Linked Data query under c-semantics is finitely computable.
● If we restrict our data model such that Websof Linked Data are finite (i.e., consist of a finitenumber of documents), then:
For results w.r.t. the unrestricted data model (i.e.,Webs of Linked Data that may be infinitely large),refer to [Har12].
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 34
Outline
Basics of the SPARQL Query Language Data Model for Linked Data Queries Full-Web Query Semantics
➢ Definition➢ Theoretical Properties
Reachability-Based Query Semantics➢ Definition➢ Theoretical Properties
√
√
√
√
Next part: 3. Source Selection Strategies ...
WWW 2013 Tutorial on Linked Data Query Processing [ Theoretical Foundations ] 35
These slides have been created byOlaf Hartig
for theWWW 2013 tutorial on
Link Data Query Processing
Tutorial Website: http://db.uwaterloo.ca/LDQTut2013/
This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)