linked data, ontologies and inference

Barry Norton, Solutions Architect Ontotext (UK), London SemWeb Meet-up, NYC, April 2013 Linked Data, Ontologies and Inference

Upload: barry-norton

Post on 08-May-2015


DESCRIPTION

Presented at the New York SemWeb Meetup, April 2013

TRANSCRIPT

Page 1: Linked Data, Ontologies and Inference

Barry Norton, Solutions Architect

Ontotext (UK), London

SemWeb Meet-up, NYC, April 2013


Page 2: Linked Data, Ontologies and Inference

Linked Data

• Defined in a W3C Technical Note including these core principles:

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)

4. Include links to other URIs, so that they can discover more things.


Page 3: Linked Data, Ontologies and Inference

Linked Open Data

• The Linking Open Data (LOD) project of the W3C Semantic Web Outreach and Education Task Force has developed a good deal of best practice and exposed a large number of interlinked datasets

Page 4: Linked Data, Ontologies and Inference

Linked Data Vision

• Many datasets – variety of publishers

• Re-using URIs enables Linked Data

• Browse using URIs to datasets

Page 5: Linked Data, Ontologies and Inference

FactForge and LinkedLifeData

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/


Page 6: Linked Data, Ontologies and Inference

FactForge: Contents

• FactForge (indicated in red on the next slide)

– Some of the central LOD datasets

– General-knowledge information

– 1.2B explicit plus 0.9B inferred indexed, 10B retrievable statements

– http://www.factforge.net/

• Linked Life Data (indicated in yellow)

– 25 of the most popular life-science datasets

– Complemented by gluing ontologies

– 2.7B explicit and 1.4B inferred, total of 4.1B indexed statements

– http://www.linkedlifedata.com

Page 7: Linked Data, Ontologies and Inference

FactForge

• Datasets: DBpedia, Freebase, Geonames, UMBEL, MusicBrainz, Wordnet, CIA World Factbook, Lingvoj

• Ontologies: Dublin Core, SKOS, RSS, FOAF

• Inference: materialization with respect to OWL 2 RL

– owl:sameAs optimization in BigOWLIM allows reduction of the indices without loss of semantics, but with big gains in performance

• Free public service at http://www.factforge.net

– Incremental URI auto-suggest

– Query and explore through Forest and Tabulator

– RDF Search: retrieve ranked list of URIs by keywords

– SPARQL end-point

Page 8: Linked Data, Ontologies and Inference

FactForge: Datasets

Dataset | Explicit indexed triples ('000) | Inferred indexed triples ('000) | Total # of stored triples ('000) | Entities ('000 of nodes in the graph) | Inferred closure ratio
Schemata and ontologies | 11 | 7 | 18 | 6 | 0.6
DBpedia (categories) | 2,877 | 42,587 | 45,464 | 1,144 | 14.8
DBpedia (sameAs) | 5,544 | 566 | 6,110 | 8,464 | 0.1
UMBEL | 5,162 | 42,212 | 47,374 | 500 | 8.2
Lingvoj | 20 | 863 | 883 | 18 | 43.8
CIA Factbook | 76 | 4 | 80 | 25 | 0.1
Wordnet | 2,281 | 9,296 | 11,577 | 830 | 4.1
Geonames | 91,908 | 125,025 | 216,933 | 33,382 | 1.4
DBpedia core | 560,096 | 198,043 | 758,139 | 127,931 | 0.4
Freebase | 463,689 | 40,840 | 504,529 | 94,810 | 0.1
MusicBrainz | 45,536 | 421,093 | 466,630 | 15,595 | 9.2
Total | 1,177,961 | 881,224 | 2,058,185 | 283,253 | 0.7

Page 9: Linked Data, Ontologies and Inference

Querying Linked Data

Presented by:

Barry Norton

Page 10: Linked Data, Ontologies and Inference

Motivation: Music!

[Figure: architecture of a Linked Data music application. Data acquisition: streaming providers, downloads, metadata, musical content and other content, reached through physical wrappers, LD wrappers, D2R transformation and RDF/XML. Integrated dataset: interlinking, cleansing, vocabulary mapping. LD dataset access: vocabulary, SPARQL endpoint, publishing, RDFa. Application: analysis and mining module, visualization module.]

Page 11: Linked Data, Ontologies and Inference

Extracting the Data

• The data of interest may be stored in a wide range of formats: spreadsheets or tabular data, databases, text

• Several tools support the process of mining data from different repositories, for example: R2RML

EUCLID - Providing Linked Data

Page 12: Linked Data, Ontologies and Inference

Reasoning for Linked Data Integration

• Example: Integration of the MusicBrainz data set and the DBpedia data set

[Diagram: two data sets joined by an integration step]

EUCLID - Querying Linked Data

Page 13: Linked Data, Ontologies and Inference

Reasoning for Linked Data Integration

First data set:

    mo:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
        foaf:name "The Beatles" ;
        mo:member mo:ba550d0e-adac-4864-b88b-407cab5e76af ;
        mo:member mo:4d5447d7-c61c-4120-ba1b-d7f471d385b9 ;
        mo:member mo:42a8f507-8412-4611-854f-926571049fa0 ;
        mo:member mo:300c4c73-33ac-4255-9d57-4e32627f5e13 .

Second data set:

    dbpedia:The_Beatles
        dbpedia-ont:origin dbpedia:Liverpool ;
        dbpedia-ont:genre dbpedia:Rock_music ;
        foaf:depiction .

[Diagram: the two resources are joined by a link labelled "same"; integration combines the two data sets]

Page 14: Linked Data, Ontologies and Inference

Reasoning for Linked Data Integration

First data set:

    mo:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
        foaf:name "The Beatles" ;
        mo:member mo:ba550d0e-adac-4864-b88b-407cab5e76af ;
        mo:member mo:4d5447d7-c61c-4120-ba1b-d7f471d385b9 ;
        mo:member mo:42a8f507-8412-4611-854f-926571049fa0 ;
        mo:member mo:300c4c73-33ac-4255-9d57-4e32627f5e13 .

Second data set:

    dbpedia:The_Beatles
        dbpedia-ont:origin dbpedia:Liverpool ;
        dbpedia-ont:genre dbpedia:Rock_music ;
        foaf:depiction .

[Diagram: the two resources are joined by a link labelled "same"]

Query:

    SELECT ?m ?g WHERE {
      dbpedia:The_Beatles
        dbpedia-ont:genre ?g ;
        mo:member ?m . }

Result set:

    ?m                                         ?g
    mo:ba550d0e-adac-4864-b88b-407cab5e76af    dbpedia:Rock_music
    mo:4d5447d7-c61c-4120-ba1b-d7f471d385b9    dbpedia:Rock_music
    mo:42a8f507-8412-4611-854f-926571049fa0    dbpedia:Rock_music
    mo:300c4c73-33ac-4255-9d57-4e32627f5e13    dbpedia:Rock_music
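The query only returns rows once the two resources for The Beatles are treated as one. A minimal Python sketch of that step follows, with triples held as plain tuples. It assumes the "same" link is an owl:sameAs statement between the two resources; `canonicalise` and `members_and_genres` are hypothetical helper names, not part of any library or of the original slides.

```python
# Triples as (subject, predicate, object) tuples; prefixed names are plain strings.
BEATLES_MB = "mo:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"

triples = {
    (BEATLES_MB, "mo:member", "mo:ba550d0e-adac-4864-b88b-407cab5e76af"),
    (BEATLES_MB, "mo:member", "mo:4d5447d7-c61c-4120-ba1b-d7f471d385b9"),
    (BEATLES_MB, "mo:member", "mo:42a8f507-8412-4611-854f-926571049fa0"),
    (BEATLES_MB, "mo:member", "mo:300c4c73-33ac-4255-9d57-4e32627f5e13"),
    ("dbpedia:The_Beatles", "dbpedia-ont:genre", "dbpedia:Rock_music"),
    # Assumption: the "same" link on the slide is an owl:sameAs triple.
    (BEATLES_MB, "owl:sameAs", "dbpedia:The_Beatles"),
}


def canonicalise(triples):
    """Rewrite every resource to one representative of its owl:sameAs group."""
    rep = {}

    def find(x):
        # Follow representative links until a resource maps to itself.
        while rep.get(x, x) != x:
            x = rep[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            rep[find(s)] = find(o)
    merged = {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}
    return merged, find


def members_and_genres(triples, band):
    """Mimic: SELECT ?m ?g WHERE { band dbpedia-ont:genre ?g ; mo:member ?m . }"""
    merged, find = canonicalise(triples)
    band = find(band)
    genres = [o for s, p, o in merged if s == band and p == "dbpedia-ont:genre"]
    members = [o for s, p, o in merged if s == band and p == "mo:member"]
    return sorted((m, g) for m in members for g in genres)


rows = members_and_genres(triples, "dbpedia:The_Beatles")
for member, genre in rows:
    print(member, genre)
```

Dropping the owl:sameAs triple from `triples` leaves the genre and the members attached to different subjects, so the same call returns no rows.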

Page 15: Linked Data, Ontologies and Inference

SPARQL 1.1: Entailment Regimes

• SPARQL 1.0 was defined only for simple entailment (pattern matching)

• SPARQL 1.1 is extended with entailment regimes other than simple entailment:

– RDF entailment

– RDFS entailment

– D-Entailment

– OWL RL entailment

– OWL Full entailment

– OWL 2 DL, EL, and QL entailment

– RIF entailment

Source: http://www.w3.org/TR/rdf-mt/#RDFSRules

Page 16: Linked Data, Ontologies and Inference

RDFS

Resource Description Framework Schema

Taxonomies and inferences

Figure: Semantic Web Stack, Berners-Lee (2006)

Page 17: Linked Data, Ontologies and Inference

RDFS Entailment Regimes

• Contains 13 entailment rules, named rdfs1 to rdfs13, for inference over RDFS definitions*:

– rdfs:Literal (rdfs1, rdfs13)

– rdfs:domain (rdfs2), rdfs:range (rdfs3)

– rdfs:Resource (rdfs4a, rdfs4b, rdfs8)

– rdfs:subPropertyOf (rdfs5, rdfs6, rdfs7, rdfs12)

– rdfs:Class (rdfs8, rdfs10)

– rdfs:subClassOf (rdfs9, rdfs10, rdfs11)

– rdfs:ContainerMembershipProperty (rdfs12)

– rdfs:Datatype (rdfs13)

* Source: http://www.w3.org/TR/rdf-mt/#RDFSRules

Page 18: Linked Data, Ontologies and Inference

rdfs2 – rdfs:domain

Data:

    dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney .
    dbpedia:The_Beatles mo:member dbpedia:John_Lennon .
    dbpedia:The_Beatles mo:member dbpedia:George_Harrison .
    dbpedia:The_Beatles mo:member dbpedia:Ringo_Starr .

Schema:

    mo:member rdfs:domain mo:MusicGroup .

Query:

    SELECT ?x WHERE {
      ?x a mo:MusicGroup . }

Result set:

    ?x
    (empty)

Result set with inference:

    ?x
    dbpedia:The_Beatles …

Page 19: Linked Data, Ontologies and Inference

rdfs3 – rdfs:range

Data:

    dbpedia:The_Beatles dbpedia-ont:bandMember dbpedia:Paul_McCartney .
    dbpedia:The_Beatles dbpedia-ont:bandMember dbpedia:John_Lennon .
    dbpedia:The_Beatles dbpedia-ont:bandMember dbpedia:George_Harrison .
    dbpedia:The_Beatles dbpedia-ont:bandMember dbpedia:Ringo_Starr .

Schema:

    mo:member rdfs:range foaf:Agent .

Query:

    SELECT ?x WHERE {
      ?x a foaf:Agent . }

Result set:

    ?x
    (empty)

Result set with inference:

    ?x
    dbpedia:Paul_McCartney
    dbpedia:John_Lennon
    dbpedia:Ringo_Starr
    dbpedia:George_Harrison …
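The rdfs2 and rdfs3 examples can be reproduced with a few lines of Python. This is only a sketch over tuples, not a real reasoner: `apply_domain_range` is a hypothetical helper, and using mo:member for all four membership edges is an assumption made so that both schema statements apply to the same data.

```python
def apply_domain_range(data, schema):
    """Return data plus the rdf:type triples entailed by rdfs2 and rdfs3."""
    inferred = set(data)
    for prop, axiom, cls in schema:
        for s, p, o in data:
            if p != prop:
                continue
            if axiom == "rdfs:domain":
                # rdfs2: the subject of the property belongs to the domain class.
                inferred.add((s, "rdf:type", cls))
            elif axiom == "rdfs:range":
                # rdfs3: the object of the property belongs to the range class.
                inferred.add((o, "rdf:type", cls))
    return inferred


data = {
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:Paul_McCartney"),
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:John_Lennon"),
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:George_Harrison"),
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:Ringo_Starr"),
}
schema = {
    ("mo:member", "rdfs:domain", "mo:MusicGroup"),
    ("mo:member", "rdfs:range", "foaf:Agent"),
}

closure = apply_domain_range(data, schema)
# Equivalent of: SELECT ?x WHERE { ?x a mo:MusicGroup . }
groups = sorted(s for s, p, o in closure if p == "rdf:type" and o == "mo:MusicGroup")
# Equivalent of: SELECT ?x WHERE { ?x a foaf:Agent . }
agents = sorted(s for s, p, o in closure if p == "rdf:type" and o == "foaf:Agent")
print(groups)
print(agents)
```

Running the same two selections over `data` alone finds nothing, which matches the empty result sets without inference.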

Page 20: Linked Data, Ontologies and Inference

rdfs7 – rdfs:subPropertyOf

Data:

    dbpedia:Yesterday mo:singer dbpedia:Paul_McCartney .
    dbpedia:Yesterday mo:performer dbpedia:John_Lennon .
    dbpedia:Yesterday mo:performer dbpedia:George_Harrison .
    dbpedia:Yesterday mo:performer dbpedia:Ringo_Starr .

Schema:

    mo:singer rdfs:subPropertyOf mo:performer .

Query:

    SELECT ?x WHERE {
      dbpedia:Yesterday mo:performer ?x . }

Result set:

    ?x
    dbpedia:John_Lennon
    dbpedia:Ringo_Starr
    dbpedia:George_Harrison

Result set with inference:

    ?x
    dbpedia:John_Lennon
    dbpedia:Ringo_Starr
    dbpedia:George_Harrison
    dbpedia:Paul_McCartney

Page 21: Linked Data, Ontologies and Inference

rdfs9 – rdfs:subClassOf

Data:

    dbpedia:The_Beatles rdf:type mo:MusicGroup .

Schema:

    mo:MusicGroup rdfs:subClassOf mo:MusicArtist .

Query:

    SELECT ?x WHERE {
      ?x a mo:MusicArtist . }

Result set:

    ?x
    (empty)

Result set with inference:

    ?x
    dbpedia:The_Beatles …
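Rules such as rdfs7 and rdfs9 are usually applied repeatedly until nothing new appears (forward chaining). The following Python sketch does that for just these two rules, using the data from the two preceding examples; `rdfs_closure` is a hypothetical name, and a full RDFS reasoner would cover all the rules listed earlier.

```python
def rdfs_closure(triples):
    """Apply rdfs7 (subPropertyOf) and rdfs9 (subClassOf) until a fixpoint."""
    closure = set(triples)
    while True:
        new = set()
        for a, rel, b in closure:
            if rel == "rdfs:subPropertyOf":
                # rdfs7: (s a o) and a subPropertyOf b entail (s b o).
                new |= {(s, b, o) for s, p, o in closure if p == a}
            elif rel == "rdfs:subClassOf":
                # rdfs9: (s type a) and a subClassOf b entail (s type b).
                new |= {(s, "rdf:type", b)
                        for s, p, o in closure
                        if p == "rdf:type" and o == a}
        if new <= closure:
            return closure
        closure |= new


triples = {
    ("mo:singer", "rdfs:subPropertyOf", "mo:performer"),
    ("mo:MusicGroup", "rdfs:subClassOf", "mo:MusicArtist"),
    ("dbpedia:Yesterday", "mo:singer", "dbpedia:Paul_McCartney"),
    ("dbpedia:Yesterday", "mo:performer", "dbpedia:John_Lennon"),
    ("dbpedia:Yesterday", "mo:performer", "dbpedia:George_Harrison"),
    ("dbpedia:Yesterday", "mo:performer", "dbpedia:Ringo_Starr"),
    ("dbpedia:The_Beatles", "rdf:type", "mo:MusicGroup"),
}

closure = rdfs_closure(triples)
performers = sorted(o for s, p, o in closure
                    if s == "dbpedia:Yesterday" and p == "mo:performer")
artists = sorted(s for s, p, o in closure
                 if p == "rdf:type" and o == "mo:MusicArtist")
print(performers)
print(artists)
```

Materializing the closure once, as here, is the approach the FactForge slides describe; the alternative is to rewrite queries at run time.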

Page 22: Linked Data, Ontologies and Inference

Inference from Schema

• Knowledge encoded in the schema allows new facts to be inferred

Schema:

    mo:MusicGroup rdfs:subClassOf mo:MusicArtist .

Inferred facts:

    mo:MusicGroup a rdfs:Class .
    mo:MusicArtist a rdfs:Class .

• This is also captured in the set of axiomatic triples, which provide basic meaning for all the vocabulary terms

    rdfs:subClassOf rdfs:domain rdfs:Class .
    rdfs:subClassOf rdfs:range rdfs:Class .

Page 23: Linked Data, Ontologies and Inference

RDFS: Lack of Consistency Check

• It is possible to infer facts that seem incorrect, but RDFS cannot prevent this:

Schema:

    mo:member rdfs:domain mo:MusicGroup ;
        rdfs:range foaf:Agent .

Existing facts:

    :PaulMcCartney a :SoloMusicArtist ;
        :member :TheBeatles .

Inferred facts (rdfs2):

    :PaulMcCartney a :MusicGroup .

No contradiction! The mis-modeling is not diagnosed.

Page 24: Linked Data, Ontologies and Inference

RDFS: Inference Limitations

• We might wish for further inferences, but these are beyond the entailment rules implemented by RDFS

Schema:

    foaf:knows rdfs:domain foaf:Person ;
        rdfs:range foaf:Person .
    foaf:made rdfs:domain foaf:Agent .

Existing facts:

    :PaulMcCartney foaf:made :Yesterday ;
        foaf:knows :RingoStarr .

Inferred facts:

    :PaulMcCartney a foaf:Agent ;
        a foaf:Person .
    :RingoStarr a foaf:Person .

NOT inferred:

    :Yesterday dc:creator :PaulMcCartney .
    :RingoStarr foaf:knows :PaulMcCartney .

• RDFS cannot model that ‘x knows y’ implies ‘y knows x’

• RDFS cannot model that ‘x makes y’ implies ‘the creator of y is x’

These inferences require OWL!

Page 25: Linked Data, Ontologies and Inference

OWL

Web Ontology Language

Ontologies and inferences

Figure: Semantic Web Stack, Berners-Lee (2006)

Page 26: Linked Data, Ontologies and Inference

Introduction to OWL

• Provides more ontological constructs and avoids some of the potential confusion in RDFS

• OWL 2 is divided into sub-languages called profiles, each more restrictive than OWL DL:

– OWL 2 EL: Limited to basic classification, but with polynomial-time reasoning

– OWL 2 QL: Designed to be translatable to relational database querying

– OWL 2 RL: Designed to be efficiently implementable in rule-based systems

• Most triple stores concentrate on the use of RDFS with a subset of OWL features, called OWL-Horst or RDFS++

Page 27: Linked Data, Ontologies and Inference

OWL Properties

OWL distinguishes between two types of properties:

• OWL ObjectProperties: resources as values

• OWL DatatypeProperties: literals as values

    :plays rdf:type owl:ObjectProperty ;
        rdfs:domain :Musician ;
        rdfs:range :Instrument .

    :hasMembers rdf:type owl:DatatypeProperty ;
        rdfs:domain :MusicGroup ;
        rdfs:range xsd:int .

Page 28: Linked Data, Ontologies and Inference

Property Axioms

• Property axioms include those from RDF Schema

• OWL allows for property equivalence. Example:

    EquivalentObjectProperties(dbpedia-ont:bandMember mo:member)
    ≡
    dbpedia-ont:bandMember owl:equivalentProperty mo:member .

Data:

    dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney .
    dbpedia:The_Beatles mo:member dbpedia:John_Lennon .
    dbpedia:The_Beatles mo:member dbpedia:George_Harrison .
    dbpedia:The_Beatles mo:member dbpedia:Ringo_Starr .

Query:

    SELECT ?x { dbpedia:The_Beatles
      dbpedia-ont:bandMember ?x . }

Result set:

    ?x
    (empty)

Result set with inference:

    ?x
    dbpedia:Paul_McCartney
    dbpedia:John_Lennon
    dbpedia:Ringo_Starr
    dbpedia:George_Harrison

Page 29: Linked Data, Ontologies and Inference

Property Axioms

• Property axioms include those from RDF Schema

• OWL allows for property equivalence. Example:

    EquivalentObjectProperties(dbpedia-ont:bandMember mo:member)
    ≡
    dbpedia-ont:bandMember owl:equivalentProperty mo:member .

• OWL allows for property disjointness. Example:

    DisjointObjectProperties(dbpedia-ont:length mo:duration)
    ≡
    dbpedia-ont:length owl:propertyDisjointWith mo:duration .

• There is no standard for implementing inconsistency reports under SPARQL

Page 30: Linked Data, Ontologies and Inference

Property Axioms (2)

OWL allows the definition of property characteristics to infer new

facts relating to instances and their properties

• Symmetry

• Transitivity


• Inverse

• Functional

• Inverse Functional

Page 31: Linked Data, Ontologies and Inference

Property Axioms: Symmetry

Data:

    dbpedia:The_Beatles :associatedMusicalArtist dbpedia:Plastic_Ono_Band .
    dbpedia:Billy_Preston :associatedMusicalArtist dbpedia:The_Beatles .

Schema:

    :associatedMusicalArtist a owl:SymmetricProperty .

Query:

    SELECT ?x WHERE {
      dbpedia:The_Beatles
        :associatedMusicalArtist ?x . }

Result set:

    ?x
    dbpedia:Plastic_Ono_Band

Result set with inference:

    ?x
    dbpedia:Plastic_Ono_Band
    dbpedia:Billy_Preston

Page 32: Linked Data, Ontologies and Inference

Property Axioms: Transitivity

Data:

    :Rock :subgenre :Heavy_metal .
    :Rock :subgenre :Punk_rock .
    :Heavy_metal :subgenre :Black_metal .

Schema:

    :subgenre a owl:TransitiveProperty .

Query:

    SELECT ?genre WHERE {
      :Rock :subgenre ?genre . }

Result set:

    ?genre
    :Heavy_metal
    :Punk_rock

Result set with inference:

    ?genre
    :Heavy_metal
    :Punk_rock
    :Black_metal
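The effect of declaring :subgenre transitive can be sketched in Python as a repeated join of the property with itself until no new triple appears. `transitive_closure` is a hypothetical helper written for this illustration, not an API of any triple store.

```python
def transitive_closure(triples, prop):
    """Add (a, prop, c) whenever (a, prop, b) and (b, prop, c) are present."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot of the current edges for the transitive property.
        edges = [(s, o) for s, p, o in closure if p == prop]
        for a, b in edges:
            for c, d in edges:
                if b == c and (a, prop, d) not in closure:
                    closure.add((a, prop, d))
                    changed = True
    return closure


data = {
    (":Rock", ":subgenre", ":Heavy_metal"),
    (":Rock", ":subgenre", ":Punk_rock"),
    (":Heavy_metal", ":subgenre", ":Black_metal"),
}

closure = transitive_closure(data, ":subgenre")
# Equivalent of: SELECT ?genre WHERE { :Rock :subgenre ?genre . }
genres = sorted(o for s, p, o in closure if s == ":Rock" and p == ":subgenre")
print(genres)
```

The quadratic join is fine for a handful of triples; stores that materialize transitive properties at scale use indexed joins instead.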

Page 33: Linked Data, Ontologies and Inference

Property Axioms: Inverse

Data:

    dbpedia:John_Lennon mo:member_of dbpedia:The_Beatles .
    dbpedia:George_Harrison mo:member_of dbpedia:The_Beatles .
    dbpedia:The_Beatles mo:member dbpedia:Paul_McCartney .
    dbpedia:The_Beatles mo:member dbpedia:Ringo_Starr .

Schema:

    mo:member_of owl:inverseOf mo:member .

Query:

    SELECT ?x WHERE {
      ?x mo:member_of
        dbpedia:The_Beatles . }

Result set:

    ?x
    dbpedia:John_Lennon
    dbpedia:George_Harrison

Result set with inference:

    ?x
    dbpedia:John_Lennon
    dbpedia:George_Harrison
    dbpedia:Paul_McCartney
    dbpedia:Ringo_Starr
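A minimal Python sketch of owl:inverseOf reasoning over tuples follows. A symmetric property behaves as its own inverse, so the same hypothetical helper, `add_inverses`, also reproduces the earlier :associatedMusicalArtist example; the direction of the Billy Preston edge is an assumption read off the result sets.

```python
def add_inverses(triples, inverse_of):
    """Add (o, q, s) for every (s, p, o) where p and q are declared inverses."""
    inv = dict(inverse_of)
    # owl:inverseOf works in both directions.
    inv.update({q: p for p, q in inverse_of.items()})
    return set(triples) | {(o, inv[p], s) for s, p, o in triples if p in inv}


band = {
    ("dbpedia:John_Lennon", "mo:member_of", "dbpedia:The_Beatles"),
    ("dbpedia:George_Harrison", "mo:member_of", "dbpedia:The_Beatles"),
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:Paul_McCartney"),
    ("dbpedia:The_Beatles", "mo:member", "dbpedia:Ringo_Starr"),
}
closure = add_inverses(band, {"mo:member_of": "mo:member"})
# Equivalent of: SELECT ?x WHERE { ?x mo:member_of dbpedia:The_Beatles . }
members = sorted(s for s, p, o in closure
                 if p == "mo:member_of" and o == "dbpedia:The_Beatles")
print(members)

# Symmetric case: the property is declared as the inverse of itself.
assoc = {
    ("dbpedia:The_Beatles", ":associatedMusicalArtist", "dbpedia:Plastic_Ono_Band"),
    ("dbpedia:Billy_Preston", ":associatedMusicalArtist", "dbpedia:The_Beatles"),
}
sym = add_inverses(assoc, {":associatedMusicalArtist": ":associatedMusicalArtist"})
associated = sorted(o for s, p, o in sym
                    if s == "dbpedia:The_Beatles" and p == ":associatedMusicalArtist")
print(associated)
```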

Page 34: Linked Data, Ontologies and Inference

Property Axioms: Functional

• A functional property can have only one (unique) value for each instance

• Example: Every artist primarily plays only one musical instrument

    mo:primary_instrument rdf:type owl:FunctionalProperty .
    dbpedia:Jimi_Hendrix mo:primary_instrument dbpedia:Electric_Guitar .
    dbpedia:Jimi_Hendrix mo:primary_instrument dbpedia:E-Guitar .

Conclusion:

    dbpedia:Electric_Guitar owl:sameAs dbpedia:E-Guitar .

Page 35: Linked Data, Ontologies and Inference

Property Axioms: Inverse Functional

• An inverse functional property is useful for specifying unique properties that identify an individual

• Example: Every recording has a unique ISRC (International Standard Recording Code)

    mo:isrc rdf:type owl:InverseFunctionalProperty .
    mo:21047249-7b3f-4651-acca-246669c081fd mo:isrc "GBAYE6300412" .
    dbpedia:She_Loves_You mo:isrc "GBAYE6300412" .

Conclusion:

    mo:21047249-7b3f-4651-acca-246669c081fd owl:sameAs dbpedia:She_Loves_You .
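Both conclusions follow the same pattern: group triples by the part that must be unique and declare everything else in the group to be the same. The Python sketch below illustrates this under the assumption that triples are plain tuples; `infer_same_as` is a hypothetical helper, not a library function.

```python
from itertools import combinations


def infer_same_as(triples, functional=(), inverse_functional=()):
    """Derive owl:sameAs triples from (inverse) functional properties."""
    groups = {}
    for s, p, o in triples:
        if p in functional:
            # Same subject and property: all objects denote one individual.
            groups.setdefault(("F", p, s), set()).add(o)
        if p in inverse_functional:
            # Same property and value: all subjects denote one individual.
            groups.setdefault(("IF", p, o), set()).add(s)
    same = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            same.add((a, "owl:sameAs", b))
    return same


triples = {
    ("dbpedia:Jimi_Hendrix", "mo:primary_instrument", "dbpedia:Electric_Guitar"),
    ("dbpedia:Jimi_Hendrix", "mo:primary_instrument", "dbpedia:E-Guitar"),
    ("mo:21047249-7b3f-4651-acca-246669c081fd", "mo:isrc", "GBAYE6300412"),
    ("dbpedia:She_Loves_You", "mo:isrc", "GBAYE6300412"),
}

same = infer_same_as(
    triples,
    functional={"mo:primary_instrument"},
    inverse_functional={"mo:isrc"},
)
for triple in sorted(same):
    print(triple)
```

Note that neither axiom reports an error when two values differ; the reasoner simply concludes that the two names refer to one individual.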

Page 36: Linked Data, Ontologies and Inference

Individual Axioms

OWL Individuals represent instances of classes. They are related to their class by the rdf:type property

• We can state that two individuals are the same

    SameIndividual(<artist/ba550d0e-adac-4864-b88b-407cab5e76af#_> dbpedia:PaulMcCartney)
    ≡
    <artist/ba550d0e-adac-4864-b88b-407cab5e76af#_> owl:sameAs dbpedia:PaulMcCartney .

• We can state that two individuals are different

    DifferentIndividuals(:TheBeatles_band :TheBeatles_TVseries)
    ≡
    :TheBeatles_band owl:differentFrom :TheBeatles_TVseries .

Page 37: Linked Data, Ontologies and Inference

Class Axioms

Axioms declare general statements about concepts which are used in logical inference (reasoning). Class axioms:

• Sub-class relationship (from RDF Schema)

• Equivalent relationship: classes have the same individuals

    EquivalentClasses(:Musician :MusicArtist)
    ≡
    :Musician owl:equivalentClass :MusicArtist .

• Disjointness: classes have no shared individuals

    DisjointClasses(:SoloMusicArtist :MusicGroup)
    ≡
    :SoloMusicArtist owl:disjointWith :MusicGroup .
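The disjointness axiom is what allows the mis-modelling from the earlier "Lack of Consistency Check" slide to be diagnosed. A minimal Python sketch of such a check follows; `disjointness_violations` is a hypothetical helper, and the second type of :PaulMcCartney is the fact that rdfs2 inferred on that slide.

```python
def disjointness_violations(triples, disjoint_pairs):
    """Return (individual, class_a, class_b) for members of two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, a, b)
        for s, classes in types.items()
        for a, b in disjoint_pairs
        if a in classes and b in classes
    )


facts = {
    (":PaulMcCartney", "rdf:type", ":SoloMusicArtist"),
    # Inferred by rdfs2 from ":PaulMcCartney :member :TheBeatles".
    (":PaulMcCartney", "rdf:type", ":MusicGroup"),
    (":TheBeatles", "rdf:type", ":MusicGroup"),
}
disjoint_pairs = [(":SoloMusicArtist", ":MusicGroup")]

violations = disjointness_violations(facts, disjoint_pairs)
print(violations)
```

How such a violation is reported is left to the store, since there is no standard for inconsistency reports under SPARQL.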

Page 38: Linked Data, Ontologies and Inference

Class Construction

• OWL classes are defined by the OWL term owl:Class

• OWL classes can be subclassed as in RDFS:

    :MusicArtist rdfs:subClassOf :Artist .

[Diagram: Music Artist inside Artist]

• OWL classes may be combined with class constructs to build new classes

Page 39: Linked Data, Ontologies and Inference

Class Construction (2)

These class constructs are available in OWL, not in RDFS

• The class of female music artists

    ObjectIntersectionOf(:Female :MusicArtist)
    ≡
    [ a owl:Class ;
      owl:intersectionOf (:Female :MusicArtist) ]

• The class of music artists

    ObjectUnionOf(:SoloMusicArtist :MusicGroup)
    ≡
    [ a owl:Class ;
      owl:unionOf (:SoloMusicArtist :MusicGroup) ]

• Everything that’s not instrumental music

    ObjectComplementOf(:InstrumentalMusic)
    ≡
    [ a owl:Class ;
      owl:complementOf :InstrumentalMusic ]

[Diagrams: Female and Music Artist; Solo and Group; Instrumental]

NOTE: Anonymous classes!

Page 40: Linked Data, Ontologies and Inference

Naming Class Constructions

• Direct naming can be achieved via owl:equivalentClass

    EquivalentClasses(:MusicArtist
        ObjectUnionOf(:SoloMusicArtist :MusicGroup))
    ≡
    :MusicArtist owl:equivalentClass
        [ owl:unionOf (:SoloMusicArtist :MusicGroup) ] .

[Diagram: Music Artist as the union of Solo and Group]

• This construction provides necessary and sufficient conditions for class membership

• Class naming can also be achieved using rdfs:subClassOf, which provides a necessary but insufficient condition for class membership
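The difference between the two ways of naming the union can be sketched in Python. This is an illustration only: `infer_types` is a hypothetical helper that looks at one individual's asserted classes, and it covers just the "sufficient" direction that separates owl:equivalentClass from rdfs:subClassOf.

```python
def infer_types(types, named, parts, axiom):
    """Add `named` to an individual's classes when the union is sufficient."""
    inferred = set(types)
    if axiom == "owl:equivalentClass" and inferred & set(parts):
        # Membership in any part of the union is sufficient for the named class.
        inferred.add(named)
    # With rdfs:subClassOf the union is only a necessary condition,
    # so membership in a part says nothing about the named class.
    return inferred


parts = (":SoloMusicArtist", ":MusicGroup")

with_equivalence = infer_types(
    {":MusicGroup"}, ":MusicArtist", parts, "owl:equivalentClass")
with_subclass = infer_types(
    {":MusicGroup"}, ":MusicArtist", parts, "rdfs:subClassOf")

print(sorted(with_equivalence))
print(sorted(with_subclass))
```

Under the equivalence a music group is classified as a music artist; under the sub-class axiom it is not, because the axiom only says that every music artist is a solo artist or a group.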

Page 41: Linked Data, Ontologies and Inference

For exercises, quiz and further material visit our website:

http://www.euclid-project.eu

eBook, Course

Other channels: @euclid_project, EUCLID project, EUCLIDproject