Linked Data, Ontologies and Inference
Presented at the New York SemWeb Meetup, April 2013.
Barry Norton, Solutions Architect
Ontotext (UK), London
SemWeb Meet-up, NYC, April 2013
Linked Data, Ontologies and Inference
Linked Data
• Defined in a W3C Technical Note including these core principles:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
Linked Open Data
• The Linking Open Data (LOD) project of the W3C Semantic Web Outreach and Education Task Force has developed a good deal of best practice and exposed a large number of interlinked datasets
• Many datasets from a variety of publishers
• Re-using URIs enables Linked Data
• Browse across datasets by following URIs
Linked Data Vision
FactForge and LinkedLifeData
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
• FactForge (indicated in red on the next slide)
– Some of the central LOD datasets
– General-knowledge information
– 1.2B explicit plus 0.9B inferred indexed statements; 10B retrievable statements
– http://www.factforge.net/
• Linked Life Data (indicated in yellow)
– 25 of the most popular life-science datasets
– Complemented by gluing ontologies
– 2.7B explicit and 1.4B inferred, total of 4.1B indexed statements
– http://www.linkedlifedata.com
FactForge: Contents
• Datasets: DBpedia, Freebase, Geonames, UMBEL, MusicBrainz, WordNet, CIA World Factbook, Lingvoj
• Ontologies: Dublin Core, SKOS, RSS, FOAF
• Inference: materialization with respect to OWL 2 RL
– owl:sameAs optimization in BigOWLIM allows reduction of the indices without loss of semantics and with big gains in performance
• Free public service at http://www.factforge.net:
– Incremental URI auto-suggest
– Query and explore through Forest and Tabulator
– RDF Search: retrieve ranked list of URIs by keywords
– SPARQL end-point
FactForge: Datasets
Dataset | Explicit indexed triples ('000) | Inferred indexed triples ('000) | Total stored triples ('000) | Entities ('000 nodes in the graph) | Inferred closure ratio
Schemata and ontologies | 11 | 7 | 18 | 6 | 0.6
DBpedia (categories) | 2,877 | 42,587 | 45,464 | 1,144 | 14.8
DBpedia (sameAs) | 5,544 | 566 | 6,110 | 8,464 | 0.1
UMBEL | 5,162 | 42,212 | 47,374 | 500 | 8.2
Lingvoj | 20 | 863 | 883 | 18 | 43.8
CIA Factbook | 76 | 4 | 80 | 25 | 0.1
WordNet | 2,281 | 9,296 | 11,577 | 830 | 4.1
Geonames | 91,908 | 125,025 | 216,933 | 33,382 | 1.4
DBpedia core | 560,096 | 198,043 | 758,139 | 127,931 | 0.4
Freebase | 463,689 | 40,840 | 504,529 | 94,810 | 0.1
MusicBrainz | 45,536 | 421,093 | 466,630 | 15,595 | 9.2
Total | 1,177,961 | 881,224 | 2,058,185 | 283,253 | 0.7
Querying Linked Data
Presented by: Barry Norton
Motivation: Music!
[Figure: architecture of a Linked Data music application, with components for data acquisition (metadata, streaming providers, physical wrapper, downloads, D2R transformation, LD wrappers, RDF/XML, musical content, other content), an integrated dataset (interlinking, cleansing, vocabulary mapping), LD dataset access (vocabulary, SPARQL endpoint, publishing, RDFa) and the application (visualization module, analysis & mining module).]
Extracting the Data
• The data of interest may be stored in a wide range of formats: spreadsheets or tabular data, databases, text
• Several tools support the process of mining data from different repositories, for example R2RML (for relational databases)
EUCLID - Providing Linked Data
Reasoning for Linked Data Integration
• Example: integration of the MusicBrainz data set and the DBpedia data set
EUCLID - Querying Linked Data
MusicBrainz data set:
mo:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
    foaf:name "The Beatles" ;
    mo:member mo:ba550d0e-adac-4864-b88b-407cab5e76af ;
    mo:member mo:4d5447d7-c61c-4120-ba1b-d7f471d385b9 ;
    mo:member mo:42a8f507-8412-4611-854f-926571049fa0 ;
    mo:member mo:300c4c73-33ac-4255-9d57-4e32627f5e13 .
DBpedia data set:
dbpedia:The_Beatles
    dbpedia-ont:origin dbpedia:Liverpool ;
    dbpedia-ont:genre dbpedia:Rock_music ;
    foaf:depiction … .
Link between the two data sets ("same"):
mo:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d owl:sameAs dbpedia:The_Beatles .
Query:
SELECT ?m ?g WHERE {
  dbpedia:The_Beatles dbpedia-ont:genre ?g ;
                      mo:member ?m . }
Result set:
?m | ?g
mo:ba550d0e-adac-4864-b88b-407cab5e76af | dbpedia:Rock_music
mo:4d5447d7-c61c-4120-ba1b-d7f471d385b9 | dbpedia:Rock_music
mo:42a8f507-8412-4611-854f-926571049fa0 | dbpedia:Rock_music
mo:300c4c73-33ac-4255-9d57-4e32627f5e13 | dbpedia:Rock_music
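The integration step can be sketched in a few lines of Python. This is a minimal, non-authoritative illustration under stated assumptions: triples are plain (subject, predicate, object) tuples of prefixed-name strings, the owl:sameAs link is given explicitly, and identifiers linked by owl:sameAs are collapsed to one representative ("smushing") before the query pattern is matched. All function names are hypothetical; this is not the API of any triple store.

```python
# Sketch: merge two datasets, collapse owl:sameAs-linked identifiers
# ("smushing"), then evaluate the slide's query as a basic graph pattern.
# Triples are (subject, predicate, object) tuples of prefixed-name strings.

MB_BEATLES = "mo:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"

musicbrainz = {
    (MB_BEATLES, "foaf:name", '"The Beatles"'),
    (MB_BEATLES, "mo:member", "mo:ba550d0e-adac-4864-b88b-407cab5e76af"),
    (MB_BEATLES, "mo:member", "mo:4d5447d7-c61c-4120-ba1b-d7f471d385b9"),
    (MB_BEATLES, "mo:member", "mo:42a8f507-8412-4611-854f-926571049fa0"),
    (MB_BEATLES, "mo:member", "mo:300c4c73-33ac-4255-9d57-4e32627f5e13"),
}
dbpedia = {
    ("dbpedia:The_Beatles", "dbpedia-ont:origin", "dbpedia:Liverpool"),
    ("dbpedia:The_Beatles", "dbpedia-ont:genre", "dbpedia:Rock_music"),
}
link = {(MB_BEATLES, "owl:sameAs", "dbpedia:The_Beatles")}


def same_as_resolver(triples):
    """Union-find over owl:sameAs; returns a term -> representative function."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            term = parent[term]
        return term

    for s, p, o in triples:
        if p == "owl:sameAs":
            a, b = find(s), find(o)
            if a != b:
                parent[max(a, b)] = min(a, b)  # arbitrary but deterministic choice
    return find


def smush(triples, find):
    """Rewrite every term to its representative and drop the sameAs links."""
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}


def members_and_genres(graph, find):
    """SELECT ?m ?g WHERE { dbpedia:The_Beatles dbpedia-ont:genre ?g ; mo:member ?m . }"""
    band = find("dbpedia:The_Beatles")
    genres = [o for s, p, o in graph if s == band and p == "dbpedia-ont:genre"]
    members = [o for s, p, o in graph if s == band and p == "mo:member"]
    return sorted((m, g) for m in members for g in genres)


data = musicbrainz | dbpedia | link
find = same_as_resolver(data)
rows = members_and_genres(smush(data, find), find)

unlinked = musicbrainz | dbpedia
find0 = same_as_resolver(unlinked)
rows_without_link = members_and_genres(smush(unlinked, find0), find0)
```

Without the owl:sameAs link the query returns no rows, because the mo:member triples and the dbpedia-ont:genre triple then hang off two different subjects.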
SPARQL 1.1: Entailment Regimes
• SPARQL 1.0 was defined only for simple entailment (pattern matching)
• SPARQL 1.1 is extended with entailment regimes other than simple entailment:
– RDF entailment
– RDFS entailment
– D-Entailment
– OWL RL entailment
– OWL Full entailment
– OWL 2 DL, EL, and QL entailment
– RIF entailment
RDFS
Resource Description Framework Schema
Taxonomies and inferences
[Figure: Semantic Web Stack, Berners-Lee (2006)]
RDFS Entailment Regimes
• Contains the entailment rules rdfs1 to rdfs13 (with rdfs4 split into rdfs4a and rdfs4b) for inference over RDFS definitions*:
– rdfs:Literal (rdfs1, rdfs13)
– rdfs:domain (rdfs2), rdfs:range (rdfs3)
– rdfs:Resource (rdfs4a, rdfs4b, rdfs8)
– rdfs:subPropertyOf (rdfs5, rdfs6, rdfs7, rdfs12)
– rdfs:Class (rdfs8, rdfs10)
– rdfs:subClassOf (rdfs9, rdfs10, rdfs11)
– rdfs:ContainerMembershipProperty (rdfs12)
– rdfs:Datatype (rdfs13)
* Source: http://www.w3.org/TR/rdf-mt/#RDFSRules
rdfs2 – rdfs:domain
Data:
dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney, dbpedia:John_Lennon, dbpedia:George_Harrison, dbpedia:Ringo_Starr .
Schema:
mo:member rdfs:domain mo:MusicGroup .
Query:
SELECT ?x WHERE { ?x a mo:MusicGroup . }
Result set: ?x (no results)
Result set with inference: ?x = dbpedia:The_Beatles …
rdfs3 – rdfs:range
Data:
dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney, dbpedia:John_Lennon, dbpedia:George_Harrison, dbpedia:Ringo_Starr .
Schema:
mo:member rdfs:range foaf:Agent .
Query:
SELECT ?x WHERE { ?x a foaf:Agent . }
Result set: ?x (no results)
Result set with inference: ?x = dbpedia:Paul_McCartney, dbpedia:John_Lennon, dbpedia:Ringo_Starr, dbpedia:George_Harrison …
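Rules rdfs2 and rdfs3 can be sketched as a single pass over a set of triples. This is a minimal sketch assuming triples are (subject, predicate, object) tuples of prefixed-name strings; the helper names are hypothetical, and real reasoners apply these rules as part of a larger rule set.

```python
# Sketch of RDFS rules rdfs2 (rdfs:domain) and rdfs3 (rdfs:range).
# One pass suffices for this toy schema: the rules only add rdf:type
# triples, and the schema declares no domain or range for rdf:type.

MEMBERS = ["dbpedia:Paul_McCartney", "dbpedia:John_Lennon",
           "dbpedia:George_Harrison", "dbpedia:Ringo_Starr"]

data = {("dbpedia:The_Beatles", "mo:member", m) for m in MEMBERS}
schema = {
    ("mo:member", "rdfs:domain", "mo:MusicGroup"),
    ("mo:member", "rdfs:range", "foaf:Agent"),
}


def apply_domain_range(triples):
    domains = {(s, o) for s, p, o in triples if p == "rdfs:domain"}
    ranges = {(s, o) for s, p, o in triples if p == "rdfs:range"}
    inferred = set()
    for s, p, o in triples:
        inferred |= {(s, "rdf:type", cls) for prop, cls in domains if prop == p}  # rdfs2
        inferred |= {(o, "rdf:type", cls) for prop, cls in ranges if prop == p}   # rdfs3
    return triples | inferred


def instances(graph, cls):
    """SELECT ?x WHERE { ?x a cls . }"""
    return sorted(s for s, p, o in graph if p == "rdf:type" and o == cls)


plain = data | schema
entailed = apply_domain_range(plain)
print(instances(plain, "mo:MusicGroup"))     # []
print(instances(entailed, "mo:MusicGroup"))  # ['dbpedia:The_Beatles']
print(instances(entailed, "foaf:Agent"))     # the four band members, sorted
```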
rdfs7 – rdfs:subPropertyOf
Data:
dbpedia:Yesterday mo:singer dbpedia:Paul_McCartney .
dbpedia:Yesterday mo:performer dbpedia:John_Lennon, dbpedia:George_Harrison, dbpedia:Ringo_Starr .
Schema:
mo:singer rdfs:subPropertyOf mo:performer .
Query:
SELECT ?x WHERE { dbpedia:Yesterday mo:performer ?x . }
Result set: ?x = dbpedia:John_Lennon, dbpedia:Ringo_Starr, dbpedia:George_Harrison
Result set with inference: ?x = dbpedia:John_Lennon, dbpedia:Ringo_Starr, dbpedia:George_Harrison, dbpedia:Paul_McCartney
rdfs9 – rdfs:subClassOf
Data:
dbpedia:The_Beatles rdf:type mo:MusicGroup .
Schema:
mo:MusicGroup rdfs:subClassOf mo:MusicArtist .
Query:
SELECT ?x WHERE { ?x a mo:MusicArtist . }
Result set: ?x (no results)
Result set with inference: ?x = dbpedia:The_Beatles …
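Rules rdfs7 and rdfs9 can be sketched as rules applied repeatedly until no new triple appears (a fixpoint), which is roughly what forward-chaining materialization does. The sketch assumes triples are (subject, predicate, object) tuples of prefixed-name strings; the function names are hypothetical.

```python
# Sketch of rdfs7 (rdfs:subPropertyOf) and rdfs9 (rdfs:subClassOf),
# iterated to a fixpoint so that chains of sub-properties or sub-classes
# are also followed for instance data.

graph0 = {
    ("dbpedia:Yesterday", "mo:singer", "dbpedia:Paul_McCartney"),
    ("dbpedia:Yesterday", "mo:performer", "dbpedia:John_Lennon"),
    ("dbpedia:Yesterday", "mo:performer", "dbpedia:George_Harrison"),
    ("dbpedia:Yesterday", "mo:performer", "dbpedia:Ringo_Starr"),
    ("mo:singer", "rdfs:subPropertyOf", "mo:performer"),
    ("dbpedia:The_Beatles", "rdf:type", "mo:MusicGroup"),
    ("mo:MusicGroup", "rdfs:subClassOf", "mo:MusicArtist"),
}


def rdfs7_rdfs9_closure(triples):
    graph = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in graph if p == "rdfs:subPropertyOf"}
        sub_classes = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
        new = set()
        for s, p, o in graph:
            new |= {(s, sup, o) for sub, sup in sub_props if sub == p}  # rdfs7
            if p == "rdf:type":
                new |= {(s, "rdf:type", sup) for sub, sup in sub_classes if sub == o}  # rdfs9
        if new <= graph:  # nothing new: fixpoint reached
            return graph
        graph |= new


def performers(graph):
    """SELECT ?x WHERE { dbpedia:Yesterday mo:performer ?x . }"""
    return sorted(o for s, p, o in graph
                  if s == "dbpedia:Yesterday" and p == "mo:performer")


closed = rdfs7_rdfs9_closure(graph0)
```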
Inference from Schema
• Knowledge encoded in the schema can be used to infer new facts:
Schema:
mo:MusicGroup rdfs:subClassOf mo:MusicArtist .
Inferred facts:
mo:MusicGroup a rdfs:Class .
mo:MusicArtist a rdfs:Class .
• This is also captured in the set of axiomatic triples, which provide basic meaning for all the vocabulary terms, for example:
rdfs:subClassOf rdfs:domain rdfs:Class .
rdfs:subClassOf rdfs:range rdfs:Class .
Applying rdfs2 and rdfs3 to the schema triple together with these axiomatic triples yields the inferred facts.
RDFS: Lack of Consistency Check
• It is possible to infer facts that seem incorrect, but RDFS cannot prevent this:
Schema:
mo:member rdfs:domain mo:MusicGroup ;
          rdfs:range foaf:Agent .
Existing facts:
:PaulMcCartney a mo:SoloMusicArtist ;
               mo:member :TheBeatles .
Inferred facts (rdfs2):
:PaulMcCartney a mo:MusicGroup .
No contradiction! The mis-modeling (the mo:member statement runs in the wrong direction) is not diagnosed.
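The sketch below illustrates the point of this slide, assuming triples are (subject, predicate, object) tuples of prefixed-name strings: the RDFS rules derive :PaulMcCartney a mo:MusicGroup without complaint. Detecting the problem needs an extra axiom; here we assume an OWL-style disjointness between mo:SoloMusicArtist and mo:MusicGroup (RDFS has no such construct) and a hypothetical checker for it.

```python
# Sketch: RDFS rules rdfs2/rdfs3 infer a questionable type without any error;
# a hypothetical disjointness check (an OWL feature, not RDFS) flags it.

schema = {
    ("mo:member", "rdfs:domain", "mo:MusicGroup"),
    ("mo:member", "rdfs:range", "foaf:Agent"),
}
facts = {
    (":PaulMcCartney", "rdf:type", "mo:SoloMusicArtist"),
    (":PaulMcCartney", "mo:member", ":TheBeatles"),  # direction mis-modelled
}


def apply_domain_range(triples):
    inferred = set()
    for s, p, o in triples:
        for prop, rel, cls in triples:
            if prop == p and rel == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))  # rdfs2
            elif prop == p and rel == "rdfs:range":
                inferred.add((o, "rdf:type", cls))  # rdfs3
    return triples | inferred


def disjointness_clashes(triples, disjoint_pairs):
    """Return (individual, class_a, class_b) for individuals typed with both classes."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        (ind, a, b)
        for ind, classes in types.items()
        for a, b in disjoint_pairs
        if a in classes and b in classes
    )


graph = apply_domain_range(schema | facts)  # RDFS alone: no error raised
clashes = disjointness_clashes(graph, [("mo:SoloMusicArtist", "mo:MusicGroup")])
```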
RDFS: Inference Limitations
• We might wish for further inferences, but these are beyond the entailment rules implemented by RDFS
Schema:
foaf:knows rdfs:domain foaf:Person ;
           rdfs:range foaf:Person .
foaf:made rdfs:domain foaf:Agent .
Existing facts:
:PaulMcCartney foaf:made :Yesterday ;
               foaf:knows :RingoStarr .
Inferred facts:
:PaulMcCartney a foaf:Agent ;
               a foaf:Person .
:RingoStarr a foaf:Person .
NOT inferred:
:Yesterday dc:creator :PaulMcCartney .
(Cannot model with RDFS that if ‘x makes y’, then ‘the creator of y is x’)
:RingoStarr foaf:knows :PaulMcCartney .
(Cannot model with RDFS that ‘x knows y’ implies ‘y knows x’)
These inferences require OWL!
OWL
Web Ontology Language
Ontologies and inferences
[Figure: Semantic Web Stack, Berners-Lee (2006)]
Introduction to OWL
• Provides more ontological constructs and avoids some of the potential confusion in RDFS
• OWL 2 is divided into sub-languages called profiles, each more restrictive than OWL DL:
– OWL 2 EL: limited to basic classification, but with polynomial-time reasoning
– OWL 2 QL: designed to be translatable to relational database querying
– OWL 2 RL: designed to be efficiently implementable in rule-based systems
• Most triple stores concentrate on the use of RDFS with a subset of OWL features, called OWL-Horst or RDFS++
OWL Properties
OWL distinguishes between two types of properties:
• OWL ObjectProperties: resources as values
• OWL DatatypeProperties: literals as values
:plays rdf:type owl:ObjectProperty ;
       rdfs:domain :Musician ;
       rdfs:range :Instrument .
:hasMembers rdf:type owl:DatatypeProperty ;
            rdfs:domain :MusicGroup ;
            rdfs:range xsd:int .
Property Axioms
• Property axioms include those from RDF Schema
• OWL allows for property equivalence. Example:
EquivalentObjectProperties(dbpedia-ont:bandMember mo:member)
≡ dbpedia-ont:bandMember owl:equivalentProperty mo:member .
Data:
dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney, dbpedia:John_Lennon, dbpedia:George_Harrison, dbpedia:Ringo_Starr .
Query:
SELECT ?x WHERE { dbpedia:The_Beatles dbpedia-ont:bandMember ?x . }
Result set: ?x (no results)
Result set with inference: ?x = dbpedia:Paul_McCartney, dbpedia:John_Lennon, dbpedia:Ringo_Starr, dbpedia:George_Harrison
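Property equivalence can be sketched by treating owl:equivalentProperty as rdfs:subPropertyOf in both directions and copying each triple to the equivalent property. The sketch assumes triples are (subject, predicate, object) tuples of prefixed-name strings; the helper names are hypothetical, and a single pass only covers one level of equivalence (chains would need a fixpoint).

```python
# Sketch: owl:equivalentProperty as rdfs:subPropertyOf in both directions.

MEMBERS = ["dbpedia:Paul_McCartney", "dbpedia:John_Lennon",
           "dbpedia:George_Harrison", "dbpedia:Ringo_Starr"]
data = {("dbpedia:The_Beatles", "mo:member", m) for m in MEMBERS}
axiom = {("dbpedia-ont:bandMember", "owl:equivalentProperty", "mo:member")}


def apply_equivalent_properties(triples):
    pairs = {(s, o) for s, p, o in triples if p == "owl:equivalentProperty"}
    pairs |= {(b, a) for a, b in pairs}  # equivalence holds in both directions
    inferred = {(s, b, o) for s, p, o in triples for a, b in pairs if p == a}
    return triples | inferred


def band_members(graph):
    """SELECT ?x WHERE { dbpedia:The_Beatles dbpedia-ont:bandMember ?x . }"""
    return sorted(o for s, p, o in graph
                  if s == "dbpedia:The_Beatles" and p == "dbpedia-ont:bandMember")


without_inference = band_members(data | axiom)
with_inference = band_members(apply_equivalent_properties(data | axiom))
```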
• OWL allows for property disjointness. Example:
DisjointObjectProperties(dbpedia-ont:length mo:duration)
≡ dbpedia-ont:length owl:propertyDisjointWith mo:duration .
• There is no standard for reporting inconsistencies under SPARQL
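Checking property disjointness means finding a subject/value pair asserted with both properties. Because there is no standard way to report inconsistencies under SPARQL, this sketch simply returns a list of offending pairs; that report format, the tracks :track1 and :track2 and their values are all hypothetical, and triples are assumed to be (subject, predicate, object) tuples of strings.

```python
# Sketch: report owl:propertyDisjointWith violations, i.e. the same
# subject/value pair asserted with two properties declared disjoint.

data = {
    ("dbpedia-ont:length", "owl:propertyDisjointWith", "mo:duration"),
    (":track1", "dbpedia-ont:length", '"125"'),  # hypothetical track and values
    (":track1", "mo:duration", '"125"'),
    (":track2", "dbpedia-ont:length", '"200"'),
    (":track2", "mo:duration", '"180"'),
}


def disjoint_property_violations(triples):
    disjoint = {(s, o) for s, p, o in triples if p == "owl:propertyDisjointWith"}
    pairs = {}  # property -> set of (subject, value) pairs
    for s, p, o in triples:
        pairs.setdefault(p, set()).add((s, o))
    return sorted(
        (s, o, a, b)
        for a, b in disjoint
        for s, o in pairs.get(a, set()) & pairs.get(b, set())
    )


violations = disjoint_property_violations(data)
```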
Property Axioms (2)
OWL allows the definition of property characteristics to infer new facts relating to instances and their properties:
• Symmetry
• Transitivity
• Inverse
• Functional
• Inverse Functional
Property Axioms: Symmetry
Schema:
:associatedMusicalArtist a owl:SymmetricProperty .
Data:
dbpedia:The_Beatles :associatedMusicalArtist dbpedia:Plastic_Ono_Band .
dbpedia:Billy_Preston :associatedMusicalArtist dbpedia:The_Beatles .
Query:
SELECT ?x WHERE { dbpedia:The_Beatles :associatedMusicalArtist ?x . }
Result set: ?x = dbpedia:Plastic_Ono_Band
Result set with inference: ?x = dbpedia:Plastic_Ono_Band, dbpedia:Billy_Preston
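A symmetric property can be sketched as adding the reversed copy of every triple that uses it. This assumes triples are (subject, predicate, object) tuples of prefixed-name strings and uses hypothetical helper names.

```python
# Sketch: materialize owl:SymmetricProperty by adding (y p x) for each (x p y).

data = {
    (":associatedMusicalArtist", "rdf:type", "owl:SymmetricProperty"),
    ("dbpedia:The_Beatles", ":associatedMusicalArtist", "dbpedia:Plastic_Ono_Band"),
    ("dbpedia:Billy_Preston", ":associatedMusicalArtist", "dbpedia:The_Beatles"),
}


def symmetric_closure(triples):
    symmetric = {s for s, p, o in triples
                 if p == "rdf:type" and o == "owl:SymmetricProperty"}
    # (x p y) with p symmetric gives (y p x); one pass is enough
    return triples | {(o, p, s) for s, p, o in triples if p in symmetric}


def associated_with_beatles(graph):
    """SELECT ?x WHERE { dbpedia:The_Beatles :associatedMusicalArtist ?x . }"""
    return sorted(o for s, p, o in graph
                  if s == "dbpedia:The_Beatles" and p == ":associatedMusicalArtist")
```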
Property Axioms: Transitivity
Schema:
:subgenre a owl:TransitiveProperty .
Data:
:Rock :subgenre :Heavy_metal .
:Rock :subgenre :Punk_rock .
:Heavy_metal :subgenre :Black_metal .
Query:
SELECT ?genre WHERE { :Rock :subgenre ?genre . }
Result set: ?genre = :Heavy_metal, :Punk_rock
Result set with inference: ?genre = :Heavy_metal, :Punk_rock, :Black_metal
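A transitive property can be sketched as joining its triples with themselves until no new triple appears. The sketch assumes triples are (subject, predicate, object) tuples of prefixed-name strings and uses hypothetical helper names; real stores use far more efficient closure algorithms.

```python
# Sketch: materialize owl:TransitiveProperty by joining triples to a fixpoint.

data = {
    (":subgenre", "rdf:type", "owl:TransitiveProperty"),
    (":Rock", ":subgenre", ":Heavy_metal"),
    (":Rock", ":subgenre", ":Punk_rock"),
    (":Heavy_metal", ":subgenre", ":Black_metal"),
}


def transitive_closure(triples):
    transitive = {s for s, p, o in triples
                  if p == "rdf:type" and o == "owl:TransitiveProperty"}
    graph = set(triples)
    while True:
        # (a p b) and (b p d) with p transitive give (a p d)
        new = {
            (a, p, d)
            for a, p, b in graph if p in transitive
            for c, q, d in graph if q == p and c == b
        }
        if new <= graph:
            return graph
        graph |= new


def subgenres_of_rock(graph):
    """SELECT ?genre WHERE { :Rock :subgenre ?genre . }"""
    return sorted(o for s, p, o in graph if s == ":Rock" and p == ":subgenre")


closed = transitive_closure(data)
```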
Property Axioms: Inverse
Schema:
mo:member_of owl:inverseOf mo:member .
Data:
dbpedia:John_Lennon mo:member_of dbpedia:The_Beatles .
dbpedia:George_Harrison mo:member_of dbpedia:The_Beatles .
dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney, dbpedia:Ringo_Starr .
Query:
SELECT ?x WHERE { ?x mo:member_of dbpedia:The_Beatles . }
Result set: ?x = dbpedia:John_Lennon, dbpedia:George_Harrison
Result set with inference: ?x = dbpedia:John_Lennon, dbpedia:George_Harrison, dbpedia:Paul_McCartney, dbpedia:Ringo_Starr
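An inverse property pair can be sketched as copying each triple, reversed, to the other property, in both directions. This assumes triples are (subject, predicate, object) tuples of prefixed-name strings and uses hypothetical helper names.

```python
# Sketch: materialize owl:inverseOf; (x p y) with p inverseOf q gives (y q x).

data = {
    ("mo:member_of", "owl:inverseOf", "mo:member"),
    ("dbpedia:John_Lennon", "mo:member_of", "dbpedia:The_Beatles"),
    ("dbpedia:George_Harrison", "mo:member_of", "dbpedia:The_Beatles"),
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:Paul_McCartney"),
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:Ringo_Starr"),
}


def inverse_closure(triples):
    pairs = {(s, o) for s, p, o in triples if p == "owl:inverseOf"}
    pairs |= {(b, a) for a, b in pairs}  # owl:inverseOf holds in both directions
    inferred = {(o, inv, s) for s, p, o in triples for prop, inv in pairs if p == prop}
    return triples | inferred


def members_of_beatles(graph):
    """SELECT ?x WHERE { ?x mo:member_of dbpedia:The_Beatles . }"""
    return sorted(s for s, p, o in graph
                  if p == "mo:member_of" and o == "dbpedia:The_Beatles")
```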
Property Axioms: Functional
A functional property can have only one (unique) value for each instance: if r1 and r2 are both values of the property for the same instance, then r1 and r2 are the same.
Example: every artist primarily plays only one musical instrument
Schema:
mo:primary_instrument rdf:type owl:FunctionalProperty .
Data:
dbpedia:Jimi_Hendrix mo:primary_instrument dbpedia:Electric_Guitar .
dbpedia:Jimi_Hendrix mo:primary_instrument dbpedia:E-Guitar .
Conclusion:
dbpedia:Electric_Guitar owl:sameAs dbpedia:E-Guitar .
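The functional-property rule can be sketched by grouping the values of each functional property by subject and declaring every pair of values owl:sameAs. OWL makes no unique name assumption, which is why two different URIs lead to an owl:sameAs conclusion rather than an error. The sketch assumes triples are (subject, predicate, object) tuples of prefixed-name strings; helper names are hypothetical.

```python
# Sketch: two values of a functional property for one subject are owl:sameAs.
from itertools import combinations

data = {
    ("mo:primary_instrument", "rdf:type", "owl:FunctionalProperty"),
    ("dbpedia:Jimi_Hendrix", "mo:primary_instrument", "dbpedia:Electric_Guitar"),
    ("dbpedia:Jimi_Hendrix", "mo:primary_instrument", "dbpedia:E-Guitar"),
}


def functional_same_as(triples):
    functional = {s for s, p, o in triples
                  if p == "rdf:type" and o == "owl:FunctionalProperty"}
    values = {}  # (subject, property) -> set of values
    for s, p, o in triples:
        if p in functional:
            values.setdefault((s, p), set()).add(o)
    # every two values of the same subject/property denote the same resource
    return {(a, "owl:sameAs", b)
            for objs in values.values()
            for a, b in combinations(sorted(objs), 2)}


conclusions = functional_same_as(data)
```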
Property Axioms: Inverse Functional
An inverse functional property is useful for specifying unique properties identifying an individual: if r1 and r2 have the same value for the property, then r1 and r2 are the same.
Example: every recording has a unique ISRC (International Standard Recording Code)
Schema:
mo:isrc rdf:type owl:InverseFunctionalProperty .
Data:
mo:21047249-7b3f-4651-acca-246669c081fd mo:isrc "GBAYE6300412" .
dbpedia:She_Loves_You mo:isrc "GBAYE6300412" .
Conclusion:
mo:21047249-7b3f-4651-acca-246669c081fd owl:sameAs dbpedia:She_Loves_You .
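The inverse-functional rule mirrors the functional one, grouping subjects by shared value instead. The sketch assumes triples are (subject, predicate, object) tuples of strings, with literals written in quotes, and uses hypothetical helper names.

```python
# Sketch: subjects sharing a value of an inverse functional property are owl:sameAs.
from itertools import combinations

MB_RECORDING = "mo:21047249-7b3f-4651-acca-246669c081fd"
data = {
    ("mo:isrc", "rdf:type", "owl:InverseFunctionalProperty"),
    (MB_RECORDING, "mo:isrc", '"GBAYE6300412"'),
    ("dbpedia:She_Loves_You", "mo:isrc", '"GBAYE6300412"'),
}


def inverse_functional_same_as(triples):
    ifps = {s for s, p, o in triples
            if p == "rdf:type" and o == "owl:InverseFunctionalProperty"}
    holders = {}  # (property, value) -> set of subjects
    for s, p, o in triples:
        if p in ifps:
            holders.setdefault((p, o), set()).add(s)
    return {(a, "owl:sameAs", b)
            for subjects in holders.values()
            for a, b in combinations(sorted(subjects), 2)}


conclusions = inverse_functional_same_as(data)
```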
Individual Axioms
OWL individuals represent instances of classes. They are related to their class by the rdf:type property.
• We can state that two individuals are the same:
SameIndividual(<artist/ba550d0e-adac-4864-b88b-407cab5e76af#_> dbpedia:Paul_McCartney)
≡ <artist/ba550d0e-adac-4864-b88b-407cab5e76af#_> owl:sameAs dbpedia:Paul_McCartney .
• We can state that two individuals are different:
DifferentIndividuals(:TheBeatles_band :TheBeatles_TVseries)
≡ :TheBeatles_band owl:differentFrom :TheBeatles_TVseries .
Class Axioms
Axioms declare general statements about concepts, which are used in logical inference (reasoning). Class axioms:
• Sub-class relationship (from RDF Schema)
• Equivalence: classes have the same individuals
EquivalentClasses(:Musician :MusicArtist)
≡ :Musician owl:equivalentClass :MusicArtist .
• Disjointness: classes have no shared individuals
DisjointClasses(:SoloMusicArtist :MusicGroup)
≡ :SoloMusicArtist owl:disjointWith :MusicGroup .
Class Construction
• OWL classes are defined by the OWL term owl:Class
• OWL classes can be subclassed as in RDFS:
:MusicArtist rdfs:subClassOf :Artist .
• OWL classes may be combined with class constructs to build new classes
Class Construction (2)
These class constructs are available in OWL, not in RDFS:
• The class of female music artists:
ObjectIntersectionOf(:Female :MusicArtist)
≡ [ a owl:Class ; owl:intersectionOf (:Female :MusicArtist) ]
• The class of music artists:
ObjectUnionOf(:SoloMusicArtist :MusicGroup)
≡ [ a owl:Class ; owl:unionOf (:SoloMusicArtist :MusicGroup) ]
• Everything that’s not instrumental music:
ObjectComplementOf(:InstrumentalMusic)
≡ [ a owl:Class ; owl:complementOf :InstrumentalMusic ]
NOTE: Anonymous classes!
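The three constructors can be read as set operations on class extensions. The sketch below assumes a small, closed toy domain with hypothetical individuals (:ind1 to :ind4, :track1, :track2). Under OWL's open-world semantics the complement is taken relative to all possible individuals, not a fixed list, so complement_of here is only an approximation.

```python
# Toy extensions of named classes over a closed domain (hypothetical individuals).
universe = {":ind1", ":ind2", ":ind3", ":ind4", ":track1", ":track2"}
extension = {
    ":Female": {":ind1", ":ind4"},
    ":MusicArtist": {":ind1", ":ind2", ":ind3"},
    ":SoloMusicArtist": {":ind1", ":ind2"},
    ":MusicGroup": {":ind3"},
    ":InstrumentalMusic": {":track1"},
}


def intersection_of(*classes):
    """ObjectIntersectionOf: individuals in every listed class."""
    return set.intersection(*(extension[c] for c in classes))


def union_of(*classes):
    """ObjectUnionOf: individuals in at least one listed class."""
    return set().union(*(extension[c] for c in classes))


def complement_of(cls):
    """ObjectComplementOf, relative to the closed toy universe only."""
    return universe - extension[cls]


female_music_artists = intersection_of(":Female", ":MusicArtist")
music_artists = union_of(":SoloMusicArtist", ":MusicGroup")
not_instrumental = complement_of(":InstrumentalMusic")
```

Note that the complement also contains the people in the toy domain: "everything that's not instrumental music" really means everything.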
Naming Class Constructions
• Direct naming can be achieved via owl:equivalentClass:
EquivalentClasses(:MusicArtist ObjectUnionOf(:SoloMusicArtist :MusicGroup))
≡ :MusicArtist owl:equivalentClass [ owl:unionOf (:SoloMusicArtist :MusicGroup) ] .
• This construction provides necessary and sufficient conditions for class membership
• Class naming can also be achieved using rdfs:subClassOf, which provides a necessary but not sufficient condition for class membership
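The difference between the two forms of naming can be sketched with a toy classifier over hypothetical individuals with asserted types. With owl:equivalentClass, membership of either disjunct is sufficient to classify an individual as :MusicArtist. With rdfs:subClassOf only the necessary direction holds, so no new :MusicArtist members are derived. The function and individual names are assumptions for illustration.

```python
# Asserted types of hypothetical individuals.
asserted = {
    ":ind1": {":SoloMusicArtist"},
    ":ind2": {":MusicGroup"},
    ":ind3": {":MusicArtist"},
}
DISJUNCTS = {":SoloMusicArtist", ":MusicGroup"}


def music_artists(types, axiom):
    """Individuals classified as :MusicArtist under the given naming axiom.

    axiom == "equivalentClass": MusicArtist equals the union of the disjuncts,
        so belonging to a disjunct is sufficient.
    axiom == "subClassOf": MusicArtist is only a subclass of the union, a
        necessary condition; :ind3 must be solo or group, but we cannot tell which.
    """
    members = {i for i, cls in types.items() if ":MusicArtist" in cls}
    if axiom == "equivalentClass":
        members |= {i for i, cls in types.items() if cls & DISJUNCTS}
    return members
```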
For exercises, quizzes and further material, visit our website: http://www.euclid-project.eu (eBook, course)
Other channels: @euclid_project, EUCLID project, EUCLIDproject