Keynote at AImWD
DESCRIPTION
Keynote on Pragmatic Semantics given at the workshop "Artificial Intelligence meets the Web of Data" (AImWD). In this keynote I argue that the Web of Data is a Complex System or Marketplace of Ideas rather than a classical Database, and that the model theory on which classical semantics is based is not appropriate in all situations. I propose an alternative, "Pragmatic Semantics", based on optimisation of possible interpretations.
TRANSCRIPT
![Page 1: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/1.jpg)
Pragmatic Semantics for the Web of Data
AImWD -- Montpellier 2013
Stefan Schlobach
(based on work of and using slides from Christophe Gueret, Kathrin Denthler and Wouter Beek)
VU Amsterdam
![Page 2: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/2.jpg)
Postulates
• The Web of Data requires semantics
• The Web of Data is not a database
• The Web of Data is a complex system
• Semantics for a database are not (always) suitable for complex systems
• We need new semantic paradigms
– Voilà: Pragmatic Semantics
![Page 3: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/3.jpg)
CLASSICAL SEMANTICS FOR THE WEB OF DATA
Part 1
![Page 4: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/4.jpg)
Linked Data
Graph/fact-based knowledge representation
Connect resources to properties / other resources
Web-based: resources have a URI
Try http://dbpedia.org/resource/Amsterdam
![Page 5: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/5.jpg)
Model theory for Semantic Web Languages: RDF, RDFS, OWL
• Ontology and Data: set of formulas S
• Model: formal structure satisfying all formulas in S
• Entailment: formula f is entailed by S iff f is true in all models of S
• If there is a contradiction, there are no models…
• If there are no models, everything is entailed.
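The definitions on this slide can be tried out on a toy propositional encoding. The sketch below is only an illustration under that assumption: real RDF, RDFS and OWL model theory is far richer, and the atoms `city` and `capital` are made up for the example. It enumerates all truth assignments, keeps those satisfying S, and shows that a contradictory S entails everything.

```python
from itertools import product

def models(formulas, atoms):
    """Yield every truth assignment over `atoms` that satisfies all formulas."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in formulas):
            yield world

def entails(formulas, atoms, query):
    """S entails f iff f is true in all models of S (vacuously so if S has none)."""
    return all(query(world) for world in models(formulas, atoms))

atoms = ["city", "capital"]
# Toy S: "it is a capital" and "every capital is a city".
S = [lambda w: w["capital"], lambda w: (not w["capital"]) or w["city"]]
print(entails(S, atoms, lambda w: w["city"]))          # True

# Adding a contradiction leaves no models, so every formula is entailed.
S_bad = S + [lambda w: not w["capital"]]
print(list(models(S_bad, atoms)))                      # []
print(entails(S_bad, atoms, lambda w: not w["city"]))  # True
```

The last line is the behaviour the keynote questions: a single inconsistency anywhere makes classical entailment uninformative.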
![Page 6: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/6.jpg)
THE WEB OF DATA AS A COMPLEX SYSTEM
Part 2
![Page 7: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/7.jpg)
![Page 8: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/8.jpg)
![Page 9: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/9.jpg)
![Page 10: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/10.jpg)
![Page 11: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/11.jpg)
![Page 12: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/12.jpg)
Since 2006, people have been creating linked data
But publication and interpretation are distributed processes.
The Web of Data is a Complex System. Not a database.
It is a Marketplace of ideas.
![Page 13: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/13.jpg)
Key observations
The Web of Data is more than the sum of its triples – it's a Complex System
Different actors
Different scales
Dynamic
![Page 14: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/14.jpg)
October 2007
![Page 15: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/15.jpg)
Evolution of the Web of Data
Now
![Page 16: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/16.jpg)
The WoD is a complex system!
• Countless extremely heterogeneous datasets
o general-purpose datasets, such as DBpedia
o domain-oriented datasets, such as Bio2RDF
o government data, music data, geological data, social network data, etc.
• Hundreds of billions of RDF triples
o Billions of links within the datasets
o More than Million links between the datasets
• Embedded rich semantics in the data
o data points are typed
o links are typed
o links are what make the statements useful
Information has impact on different scales
![Page 17: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/17.jpg)
A new way of seeing the WoD
Consider the WoD as a network
![Page 18: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/18.jpg)
Relevant (Network) Properties of WoD
• Average path length
• Degree distribution
• Strongly connected components
• Degree centrality
• Betweenness centrality
• Closeness centrality
![Page 19: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/19.jpg)
Scales of observation of the WoD
1. Graph scale
![Page 20: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/20.jpg)
Graph-scale WoD network
• Each dataset is a node
• Edges are weighted, directed connections between the datasets
o if there is at least one triple having a subject within dataset 1 and an object within dataset 2, then there is an edge between these two datasets.
o the number of such triples is the weight of the edge.
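The construction above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the namespace prefixes in `DATASETS` and the sample triples are hypothetical, and skipping triples that stay inside one dataset is an assumption read off the "dataset 1 / dataset 2" wording.

```python
from collections import Counter

# Hypothetical namespace prefixes standing in for real dataset boundaries.
DATASETS = {
    "http://a.example/": "A",
    "http://b.example/": "B",
}

def dataset_of(term):
    """Return the dataset whose namespace prefixes `term`, or None (e.g. for literals)."""
    for prefix, name in DATASETS.items():
        if term.startswith(prefix):
            return name
    return None

def graph_scale_network(triples):
    """Map (source dataset, target dataset) to the number of triples linking them."""
    weights = Counter()
    for subject, _predicate, obj in triples:
        src, dst = dataset_of(subject), dataset_of(obj)
        if src and dst and src != dst:   # assumption: only cross-dataset triples count
            weights[(src, dst)] += 1
    return dict(weights)

triples = [
    ("http://a.example/amsterdam", "sameAs", "http://b.example/1"),
    ("http://a.example/utrecht", "sameAs", "http://b.example/2"),
    ("http://b.example/1", "seeAlso", "http://a.example/amsterdam"),
    ("http://a.example/amsterdam", "label", "Amsterdam"),
    ("http://a.example/amsterdam", "partOf", "http://a.example/netherlands"),
]
print(graph_scale_network(triples))   # {('A', 'B'): 2, ('B', 'A'): 1}
```

Edges are directed, so the two links from A to B and the single link back from B to A are kept apart.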
![Page 21: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/21.jpg)
• 110 nodes with 350 edges
• Average path length is 2.16
• 50 components
![Page 22: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/22.jpg)
A degree of 7 is the critical point after which the network is no longer scale-free.
![Page 23: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/23.jpg)
Top central nodes

Betweenness centrality

| Node | Value |
| --- | --- |
| DBpedia | 0.332 |
| DBLP Berlin | 0.108 |
| DBLP (RKB) | 0.100 |
| DBLP Hannover | 0.097 |
| FOAF profiles | 0.075 |

Closeness centrality

| Node | Value |
| --- | --- |
| DBpedia | 0.762 |
| Geonames | 0.614 |
| Drug Bank | 0.576 |
| Linked MDB | 0.544 |
| Flickr wrappr | 0.526 |

Degree centrality

| Node | Value |
| --- | --- |
| DBpedia | 0.505 |
| UniProt | 0.266 |
| DBLP (RKB) | 0.266 |
| ACM (RKB) | 0.229 |
| GeneID | 0.211 |

Every centrality has a specific meaning...
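To make two of these measures concrete, here is a small sketch that computes degree and closeness centrality on a toy undirected graph. The graph is invented for the example, the normalisations are the common textbook ones and may differ from those used for the slide's values, and closeness is only meaningful here because the toy graph is connected. Betweenness is left out to keep the sketch short.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

def closeness_centrality(graph):
    """Inverse of the average shortest-path distance to all reachable nodes."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from `source`
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Toy network: one hub, three neighbours, and one node hanging off "c".
graph = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}
print(degree_centrality(graph)["hub"])      # 0.75
print(closeness_centrality(graph)["hub"])   # 0.8
```

Degree only counts direct links, while closeness rewards nodes from which everything else is a few hops away, which is why the rankings in the tables above differ.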
![Page 24: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/24.jpg)
Scales of observation of the WoD
2. Triple scale
![Page 25: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/25.jpg)
Triple-scale WoD network
• We took 10 million triples from the dataset crawled from the WoD, provided by the Billion Triple Challenge 2009
• This "BTC" network is defined as G=(V, (E, L)), where
o V is a set of nodes, and each node is a URI or a literal
o E is a set of edges
o L is a set of labels, each label characterising a relation between nodes
• We applied a few strategies to aggregate data for comparison.
![Page 26: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/26.jpg)
| Network | Nodes | Edges | Average path length | Components |
| --- | --- | --- | --- | --- |
| BTC | 605K | 860K | 2.15 | 602K |
| BTC aggregated | 14K | 31K | 2.80 | 7K |
| BTC aggregated + filter | 37 | 91 | 1.88 | 17 |

Triple-scale network and its aggregations
• BTC aggregated: triples are aggregated by the domain names
• BTC aggregated + filter: only domain names shared with the graph-scale network
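Aggregation by domain name can be sketched as collapsing every URI node to its host name and then recounting nodes and edges. The example URIs are hypothetical, and keeping literals as separate nodes is an assumption; the slides do not say how literals were treated.

```python
from urllib.parse import urlparse

def domain(term):
    """Host name for http(s) URIs; any other term (e.g. a literal) is kept as is."""
    parsed = urlparse(term)
    return parsed.netloc if parsed.scheme in ("http", "https") else term

def network_size(triples):
    """Return (number of distinct nodes, number of distinct labelled edges)."""
    nodes, edges = set(), set()
    for s, p, o in triples:
        nodes.update((s, o))
        edges.add((s, p, o))
    return len(nodes), len(edges)

def aggregate(triples):
    """Collapse subjects and objects to their domain names, merging duplicate edges."""
    return {(domain(s), p, domain(o)) for s, p, o in triples}

triples = [
    ("http://a.example/amsterdam", "sameAs", "http://b.example/1"),
    ("http://a.example/utrecht", "sameAs", "http://b.example/2"),
    ("http://a.example/amsterdam", "label", "Amsterdam"),
    ("http://a.example/utrecht", "label", "Utrecht"),
]
print(network_size(triples))             # (6, 4)
print(network_size(aggregate(triples)))  # (4, 3)
```

Even on four triples the aggregated network is smaller, which is the same effect as the drop from 605K to 14K nodes in the table.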
![Page 27: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/27.jpg)
Degree distribution
BTC BTC aggregated
Power-law distribution
![Page 28: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/28.jpg)
Monitoring and Improving the WoD
• Linked data is meant to be browsed, jumping from one resource to another
• The presence of hubs is critical for the paths
• Create alternate paths to be used in case of failure
Guéret, Groth, van Harmelen, Schlobach, "Finding the Achilles Heel of the Web of Data: using network analysis for link-recommendation"
![Page 29: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/29.jpg)
Challenges:
[Diagram: nodes Christophe, VU Amsterdam, Amsterdam and The Netherlands, connected by workIn and isLocatedIn links]
The links have explicit semantics, so the reasoning process yields further, implicit links
![Page 30: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/30.jpg)
Challenges:
• Multi-relational links
o FOAF (social networks + personal information)
o SIOC (relations characterising blogs)
o SWRC (describing research work)
o …
Different filterings produce different networks
Centrality status of nodes changes w.r.t. the networks
• Dynamics
o Data will be continuously added and linked.
![Page 31: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/31.jpg)
FORMAL INTERACTIONS WITH THE WEB OF DATA
Part 3
![Page 32: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/32.jpg)
Interacting with Linked Data
Common semantic paradigm
Common goals:
Completeness: all the answers
Soundness: only exact answers
![Page 33: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/33.jpg)
When solutions do not (quite) fit the problem ...
Copyright: sfllaw (Flickr, image 222795669)
![Page 34: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/34.jpg)
Motivation
In the context of Web data?
Issues with scale
Issues with lack of consistency
Issues with contextualised views over the World
Revise the goals
As many answers as possible (or needed)
Answers as accurate as possible (or needed)
![Page 35: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/35.jpg)
From logic to optimisation
Optimise towards the revised goals
Need methods that cope with uncertainty, context, noise, scale, ...
![Page 36: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/36.jpg)
Nature inspired methods for interacting with complex systems
• Advantageous properties
– Adaptation
– Simplicity
– Interactivity: anytime, user in the loop
– Scalability and robustness
– Good for dealing with dynamic information
• Studied for different interaction types
![Page 37: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/37.jpg)
Answering queries over the data
Copyright: jepoirrier (Flickr, image 829293711)
![Page 38: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/38.jpg)
The problem
Match a graph pattern to the data
Most common approach
Join partial results for each edge of the query
![Page 39: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/39.jpg)
Solving approaches
Logic-based
Find all the answers matching all of the query pattern
Optimisation
Find answers matching as much of the query as possible
Important implications of the optimisation
Only some of the answers will be found
Some of the answers found will be partially true
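The difference between the two approaches can be shown by scoring a candidate answer by the fraction of the query pattern it matches, rather than accepting only complete matches. The sketch below is an assumption-laden toy: the data, the `?x`/`?o` variable convention and the scoring function are illustrative, not taken from the system presented on the following slides.

```python
def instantiate(pattern, binding):
    """Replace variables (terms found in `binding`) by their bound values."""
    return tuple(binding.get(term, term) for term in pattern)

def fitness(query, binding, data):
    """Fraction of the query's triple patterns that hold in `data` under `binding`."""
    matched = sum(instantiate(pattern, binding) in data for pattern in query)
    return matched / len(query)

data = {
    ("alice", "type", "Person"),
    ("alice", "worksAt", "VU"),
    ("bob", "type", "Person"),
    ("VU", "locatedIn", "Amsterdam"),
}
query = [
    ("?x", "type", "Person"),
    ("?x", "worksAt", "?o"),
    ("?o", "locatedIn", "Amsterdam"),
]

print(fitness(query, {"?x": "alice", "?o": "VU"}, data))  # 1.0
print(fitness(query, {"?x": "bob", "?o": "VU"}, data))    # two of three patterns match
```

A logic-based engine would return only the first binding. An optimiser may also return the second, which is what "partially true" answers means here.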
![Page 40: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/40.jpg)
![Page 41: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/41.jpg)
eRDF: An evolutionary algorithm under the hood
[Diagram: query in, results out; a loop over candidate solutions and offspring (steps 1 to 4); a data layer with a cache in front of SE1, SE2 and SE3 on the Web of Data]
Input: set of property/value pairs
![Page 42: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/42.jpg)
eRDF: An evolutionary algorithm under the hood
Initial population: randomly chosen to fit the query graph
![Page 43: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/43.jpg)
eRDF: An evolutionary algorithm under the hood
Determining fitness by querying the Web of Data: single assertions are sent to SPARQL endpoints
![Page 44: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/44.jpg)
eRDF: An evolutionary algorithm under the hood
Loop:
Selection: fitness determines the best candidate, which is chosen as parent of the next generation
Create offspring
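The steps on these slides (random initial population, fitness evaluation, selection of the best candidate as parent, offspring creation) can be sketched as follows. This is a simplified stand-in, not eRDF itself: fitness is computed against an in-memory set of triples instead of SPARQL endpoints, mutation of one variable is an assumed offspring operator, and all names, parameters and data are hypothetical.

```python
import random

def instantiate(pattern, binding):
    """Replace variables (terms found in `binding`) by their bound values."""
    return tuple(binding.get(term, term) for term in pattern)

def fitness(query, binding, data):
    """Fraction of query patterns satisfied; stands in for asking endpoints."""
    return sum(instantiate(p, binding) in data for p in query) / len(query)

def evolve(query, data, variables, resources,
           generations=50, population_size=8, seed=0):
    rng = random.Random(seed)
    # Initial population: random bindings of the query variables.
    population = [{v: rng.choice(resources) for v in variables}
                  for _ in range(population_size)]
    best = max(population, key=lambda b: fitness(query, b, data))
    for _ in range(generations):
        if fitness(query, best, data) == 1.0:
            break                        # exact answer found
        # Offspring: copies of the parent with one variable re-bound at random.
        offspring = []
        for _ in range(population_size):
            child = dict(best)
            child[rng.choice(variables)] = rng.choice(resources)
            offspring.append(child)
        # Selection: the fittest candidate becomes the next parent.
        best = max(offspring + [best], key=lambda b: fitness(query, b, data))
    return best, fitness(query, best, data)

data = {
    ("alice", "type", "Person"),
    ("alice", "worksAt", "VU"),
    ("bob", "type", "Person"),
    ("VU", "locatedIn", "Amsterdam"),
}
query = [("?x", "type", "Person"), ("?x", "worksAt", "?o"),
         ("?o", "locatedIn", "Amsterdam")]
variables = ["?x", "?o"]
resources = ["alice", "bob", "VU", "Amsterdam", "Person"]

best, score = evolve(query, data, variables, resources)
print(best, score)
```

Because the parent is kept among the candidates, the best score never decreases, so the loop can be stopped at any time and still return its best answer so far.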
![Page 45: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/45.jpg)
![Page 46: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/46.jpg)
![Page 47: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/47.jpg)
Properties of eRDF
Scalable, lean, robust, anytime, approximate
Arbitrary SPARQL endpoints
Join-free, so scaling to more endpoints is comparatively pain-free
![Page 48: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/48.jpg)
Some results
Tested on queries with varied complexity
Works best with more complex queries
Finds exact answers when there are some
![Page 50: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/50.jpg)
The problem
Deduce new facts from others
Most common approach
Centralise all the facts, batch process deductions
![Page 51: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/51.jpg)
Solving approaches
Logic-based
Find all the facts that can be derived from the data
Optimisation
Find as many facts as possible while preserving consistency
Important implications of the optimisation
Only some of the facts will be found
Unstable content
![Page 52: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/52.jpg)
![Page 53: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/53.jpg)
An optimisation approach: Swarms
Swarm of micro-reasoners
Browse the graph, applying rules when possible
Deduced facts disappear after some time
Every author of a paper is a person
Every person is also an agent
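A toy version of the swarm idea, using the two rules on this slide, might look like the sketch below. It is heavily simplified and rests on assumptions: ants sample a random visible triple instead of walking along graph edges, each ant carries exactly one rule, and the time-to-live, step count and data are invented for the example.

```python
import random

def author_is_person(triple):
    """Every author of a paper is a person."""
    s, p, _o = triple
    return (s, "type", "Person") if p == "authorOf" else None

def person_is_agent(triple):
    """Every person is also an agent."""
    s, p, o = triple
    return (s, "type", "Agent") if (p, o) == ("type", "Person") else None

def swarm(facts, rules, ants_per_rule=3, steps=200, ttl=50, seed=0):
    rng = random.Random(seed)
    derived = {}                       # derived triple -> step at which it expires
    ants = [rule for rule in rules for _ in range(ants_per_rule)]
    for step in range(steps):
        # Deduced facts disappear unless an ant has re-derived them recently.
        derived = {t: expiry for t, expiry in derived.items() if expiry > step}
        visible = sorted(facts | set(derived))
        for rule in ants:
            conclusion = rule(rng.choice(visible))
            if conclusion is not None and conclusion not in facts:
                derived[conclusion] = step + ttl
    return set(derived)

facts = {("alice", "authorOf", "paper1"), ("bob", "authorOf", "paper2")}
print(swarm(facts, [author_is_person, person_is_agent]))
```

The second rule can only fire on facts the first rule has already produced and that have not yet expired, which hints at why rule precedence and several ants per rule matter.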
![Page 54: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/54.jpg)
Some results
If they stay, most of the implicit facts are derived
Ants need to follow each other to deal with precedence of rules
Several ants per rule are needed
![Page 55: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/55.jpg)
Related findings and approaches
• Storage optimisation using swarms (SwarmLinda from FU Berlin)
• Join optimisation with swarms (RCQ-ACS Erasmus Rotterdam)
• Emergent Semantics (eXascale Infolab Fribourg)
• Previous speaker (argumentation based semantics)
![Page 56: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/56.jpg)
![Page 57: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/57.jpg)
The day Semantics died…. ?
AImWD -- Montpellier 2013
Stefan Schlobach
(based on work of and using slides from Christophe Gueret, Kathrin Denthler and Wouter Beek)
VU Amsterdam
![Page 58: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/58.jpg)
PRAGMATIC SEMANTICS FOR THE WEB OF DATA
Part 4
![Page 59: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/59.jpg)
There is meaning in the structure
![Page 60: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/60.jpg)
Requirements
• Standard languages
• Standard semantics still valid (for simple data)
• Integrate structural properties
– Popularity of nodes/triples
– “Distance” between triples
– Frequency of triples
Semantics not strict, but pragmatic
Intuitively: a statement made twenty times is more true than a statement made once
![Page 61: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/61.jpg)
Approach
• Entailment defined through optimality over different (possibly competing) notions of truth
• Make as much of the information in the data explicit as possible, and turn it into first-class semantic citizens (truth orderings)
• Pragmatic entailment is defined through multi-objective optimisation.
• Interoperability is then achieved by enriching an ontology with meta-information about semantic orderings, as well as agreement on the weighting of orderings.
![Page 62: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/62.jpg)
Subset-based truth orderings
– the size of the minimal entailing sub-ontology
– ratio of sub-models in which a formula is satisfied versus the total number of sub-models
– ratio between sub-ontologies of O in which a formula holds versus the number of all sub-ontologies
Truth based on part of the given information
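Two of these orderings can be computed by brute force on a tiny ontology. The sketch assumes a deliberately simple logic, with atomic facts as strings and rules as (premise, conclusion) pairs, and it enumerates all sub-ontologies, which is only feasible for toy inputs. The ontology itself is made up from the author/person/agent example used earlier in the talk.

```python
from itertools import combinations

def closure(axioms):
    """Forward-chain: a string is a fact, a (premise, conclusion) pair is a rule."""
    known = {a for a in axioms if isinstance(a, str)}
    rules = [a for a in axioms if not isinstance(a, str)]
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def sub_ontologies(ontology):
    """All subsets of the ontology, enumerated by increasing size."""
    for size in range(len(ontology) + 1):
        yield from combinations(ontology, size)

def support_ratio(ontology, formula):
    """Share of sub-ontologies in which `formula` holds."""
    subs = list(sub_ontologies(ontology))
    return sum(formula in closure(sub) for sub in subs) / len(subs)

def minimal_entailing_size(ontology, formula):
    """Size of the smallest sub-ontology entailing `formula`, or None."""
    for sub in sub_ontologies(ontology):
        if formula in closure(sub):
            return len(sub)
    return None

ontology = ["author", ("author", "person"), ("person", "agent")]
for formula in ("author", "person", "agent"):
    print(formula, support_ratio(ontology, formula),
          minimal_entailing_size(ontology, formula))
# author 0.5 1
# person 0.25 2
# agent 0.125 3
```

Formulas that need more of the ontology to be derived get a lower ratio and a larger minimal subset, so both measures order "author" above "person" above "agent".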
![Page 63: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/63.jpg)
Graph-based truth orderings
• A shortest-path ordering (diameter of the induced sub-graphs). Such a notion is a proxy for confidence of derivation.
• Random-walk distances or edge weights induce orderings that are clustering-aware: some sub-ontologies entailing a formula have more cohesion than others.
• PageRank orderings can be used as proxies for popularity
Truth based on the structure of the given information
![Page 64: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/64.jpg)
Pragmatic Entailment
• A pragmatic closure C for an ontology O and orderings f1 to fn is a set of formulas that is Pareto-optimal w.r.t. the optimisation problem max [f1(C), …, fn(C)].
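Pareto-optimality over several orderings can be illustrated with a small filter over candidate closures. Everything concrete here is an assumption made for the example: the candidate sets, the `support` counts, and the two objectives (number of formulas made explicit, and the weakest support among them) are hypothetical stand-ins for the truth orderings f1 to fn.

```python
def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates, objectives):
    """Names of the candidates that no other candidate dominates."""
    scores = {name: tuple(f(c) for f in objectives)
              for name, c in candidates.items()}
    return {name for name, s in scores.items()
            if not any(dominates(other, s) for other in scores.values())}

# Hypothetical count of how often each formula is stated on the Web of Data.
support = {"p": 20, "q": 1, "r": 5}

objectives = [
    len,                                    # f1: how much is made explicit
    lambda c: min(support[x] for x in c),   # f2: support of the weakest formula
]
candidates = {
    "C1": {"p"},
    "C2": {"p", "r"},
    "C3": {"p", "q", "r"},
    "C4": {"q"},
}
print(sorted(pareto_front(candidates, objectives)))  # ['C1', 'C2', 'C3']
```

C4 is dropped because C1 is equally large and better supported. The remaining three trade size against support, so none of them beats the others on every ordering.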
![Page 65: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/65.jpg)
PraSem
• Project title: Pragmatic Semantics for the Web of Data
• Acronym: PraSem
• Runtime: Nov 2012-Oct 2016
• Main researcher: Wouter Beek
• People involved: Stefan Schlobach, Christophe Gueret, Kathrin Denthler, Pepijn Kroes, Frank van Harmelen, and hopefully more people soon.
![Page 66: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/66.jpg)
Deal with Open World Assumption
April 12, 2023 IS: Web of Data 66
![Page 67: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/67.jpg)
Deal with incompleteness
![Page 68: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/68.jpg)
Formalise approximations
![Page 69: Keynote at AImWD](https://reader035.vdocuments.net/reader035/viewer/2022062405/55506695b4c90574428b55ee/html5/thumbnails/69.jpg)
Take home message
• The Web of Data requires semantics
• The Web of Data is not a database
• The Web of Data is a complex system
• Semantics for a database are not (always) suitable for complex systems
• We need new semantic paradigms
– Voilà: Pragmatic Semantics