capturing emerging relations between schema ontologies on the web of data

Capturing emerging relations between schema ontologies on the Web of Data Andriy Nikolov Enrico Motta

Upload: andriynikolov

Post on 11-May-2015


Page 1: Capturing emerging relations between schema ontologies on the Web of Data

Capturing emerging relations between schema ontologies on the Web of Data

Andriy Nikolov, Enrico Motta

Page 2: Capturing emerging relations between schema ontologies on the Web of Data

Public linked data

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

• “Linking” in the Linked Data cloud:
– References to instance URIs described in external sources
– Special case: identity links between equivalent resources

Page 3: Capturing emerging relations between schema ontologies on the Web of Data

Motivation

• Schema heterogeneity is an obstacle both for creating and for utilising these links
– Extracting information on the same topic from different repositories
– Discovering equivalence links between individuals
• Motivation for our work: discovering instance-level links
– How to choose the repositories to connect a new one to?
– Which subsets of repositories contain co-referring instances?

[Figure: topics (TV programs, movies, pieces of music) and candidate repositories (LinkedMDB, DBPedia, Freebase, MusicBrainz), with a question mark]

Page 4: Capturing emerging relations between schema ontologies on the Web of Data

Schema-level interlinks

Data-level

Schema-level?

Page 5: Capturing emerging relations between schema ontologies on the Web of Data

Matching approaches

• “Top-down”
– Analyzing schema ontologies and generating alignments (manually or automatically)
– UMBEL
  • Using CYC as a “backbone”
  • Mapping commonly used schema ontologies
• “Bottom-up”
– Inferring schema mappings based on instance-level information

Page 6: Capturing emerging relations between schema ontologies on the Web of Data

Our approach

• Constructing a large-scale network of schema mappings
– Applying a light-weight instance-based matcher
• Analysing the resulting network
– What does it tell us about the use of ontologies?

Page 7: Capturing emerging relations between schema ontologies on the Web of Data

Motivating factors

• Potential use case scenarios
– Discovering relevant sources for connection
– Discovering relevant subsets of comparable instances
• Tolerance to the quality of mappings
– A mapping between “strongly overlapping” classes is still useful even if there is no strict equivalence/subsumption

Page 8: Capturing emerging relations between schema ontologies on the Web of Data

Instance-based matching

• Use of instance-based matching
– Some implicit schema-level assumptions cannot be captured using only schema-level evidence
• Interpretation mismatches
– dbpedia:Actor = professional actor (film or stage)
– movie:actor = anybody who participated in a movie
• Class interpretation “as used” vs “as designed”
– FOAF: foaf:Person = any person
– DBLP: foaf:Person = computer scientist

Class          Repository  Richard Nixon  David Garrick
dbpedia:Actor  DBPedia     -              +
movie:actor    LinkedMDB   +              -

Page 9: Capturing emerging relations between schema ontologies on the Web of Data

Instance set overlaps

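[Figure: instances in different repositories linked by ==. movie:music_contributor/2490 (LinkedMDB), dbpedia:Ennio_Morricone (DBPedia), music:artist/a16…9fdf (MusicBrainz); movie:music_contributor/2490 is_a movie:music_contributor, dbpedia:Ennio_Morricone is_a dbpedia:Artist, music:artist/a16…9fdf is_a mo:MusicArtist]

[Figure: one instance with several types. In DBPedia, dbpedia:Ennio_Morricone is_a dbpedia:Artist and is_a yago:ItalianComposers]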

• Co-typing

• Declared association

Page 10: Capturing emerging relations between schema ontologies on the Web of Data

Dataset

• Billion Triple Challenge 2009
– about 1.14 billion triples
– contains
  • core LOD repositories (DBPedia, Freebase, Geonames, Musicbrainz, LinkedMDB,…)
  • smaller semantic datasets retrieved by search servers (Falcon-S, Sindice)
– ≈3.6M co-typing-based overlapping pairs of classes
– ≈1M association-based pairs

Page 11: Capturing emerging relations between schema ontologies on the Web of Data

Inferring mappings

• Classification task
– Classes A, B: is there a mapping?
– Boolean classification
  • type of mappings assigned based on comparing sizes of instance sets
• Features
– namespaces of the class URIs
– size of the overlap
– sizes of the instance sets
– ratio of the overlapping subset to the complete instance set
– direct/indirect: whether classes have instances explicitly declared to be equivalent
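
As a rough sketch of how such a feature vector could be assembled for one pair of classes, consider the Python below. Everything in it is an assumption made for illustration: the names `PairFeatures`, `pair_features` and `mapping_type` are hypothetical, instance sets are assumed to use shared identifiers already, and the 10% tolerance used to compare set sizes is arbitrary. The slides do not give the actual rule for assigning mapping types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairFeatures:
    ns_a: str        # namespace prefix of class A
    ns_b: str        # namespace prefix of class B
    overlap: int     # number of shared instances
    size_a: int      # number of instances of A
    size_b: int      # number of instances of B
    ratio_a: float   # overlap / size_a
    ratio_b: float   # overlap / size_b
    direct: bool     # True if instances are explicitly declared equivalent


def namespace(prefixed_name):
    """Return the prefix of a prefixed name such as 'dbpedia:Actor'."""
    return prefixed_name.split(":", 1)[0]


def pair_features(class_a, class_b, instances_a, instances_b, direct):
    """Build the feature vector for one overlapping pair of classes."""
    overlap = len(instances_a & instances_b)
    size_a, size_b = len(instances_a), len(instances_b)
    return PairFeatures(
        ns_a=namespace(class_a),
        ns_b=namespace(class_b),
        overlap=overlap,
        size_a=size_a,
        size_b=size_b,
        ratio_a=overlap / size_a if size_a else 0.0,
        ratio_b=overlap / size_b if size_b else 0.0,
        direct=direct,
    )


def mapping_type(size_a, size_b, tolerance=0.1):
    """Hypothetical rule: label an accepted mapping by comparing set sizes."""
    if abs(size_a - size_b) <= tolerance * max(size_a, size_b):
        return "equivalent"
    return "narrower" if size_a < size_b else "broader"


features = pair_features(
    "dbpedia:Actor", "movie:actor",
    {"i1", "i2", "i3"}, {"i2", "i3", "i4", "i5"},
    direct=True,
)
print(features)
print(mapping_type(features.size_a, features.size_b))
```

The Boolean decision itself (is there a mapping or not) would come from a trained classifier; only the inputs and the size comparison are sketched here.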

Page 12: Capturing emerging relations between schema ontologies on the Web of Data

Test

Mapping set        Algorithm  Precision  Recall  F1
Association-based  J48        0.939      0.689   0.795
Co-typing-based    J48        0.952      0.944   0.948

• Training
– Training set: 6000 overlapping pairs of classes
– Test: 10-fold cross-validation
• Applying
– 2 networks of class mappings

Property                    Association-based  Co-typing-based
Nodes                       20365              35578
Edges                       82422              67620
Max. connections/node       5301               18137
Node with max. connections  geonames:Feature   foaf:Person
Avg. connections/node       8.09               3.80
Distribution law            power              power
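
The reported figures can be cross-checked with a short calculation. The sketch below assumes that F1 is the usual harmonic mean of precision and recall, and that the class-mapping networks are undirected, so the average number of connections per node is 2 × edges / nodes. Both are assumptions of this check rather than statements from the slides, although the numbers in the two tables are consistent with them.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def average_degree(nodes, edges):
    """Average connections per node in an undirected network."""
    return 2 * edges / nodes


# Values copied from the two tables on this slide.
print(round(f1_score(0.939, 0.689), 3))       # association-based
print(round(f1_score(0.952, 0.944), 3))       # co-typing-based
print(round(average_degree(20365, 82422), 2)) # association-based
print(round(average_degree(35578, 67620), 2)) # co-typing-based
```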

Page 13: Capturing emerging relations between schema ontologies on the Web of Data

Observations: class mappings

• Association-based network: classes involved in the largest number of mappings
– High-level classes representing concepts covered in many repositories
– … and describing categories with very fine-grained class decomposition
– Usually also the most populated ones

[Figure: log-log plot of number of mappings per class against instance set size. Labelled classes: geonames:Feature, freebase:people.person, yago:PhysicalEntity, linkedmdb:film, umbel:Person; akt:Person, akt:ArticleReference, … (“under-linked” ones?)]

Page 14: Capturing emerging relations between schema ontologies on the Web of Data

Observations: class mappings

• Co-typing-based network: classes involved in the largest number of mappings
– Popular classes reused in many repositories
– … or in DBPedia
– … and describing categories with fine-grained class decomposition
– Usually also the most populated ones

[Figure: log-log plot of number of mappings per class against instance set size. Labelled classes: foaf:Person, umbel:Person, dbpedia:Person, dbpedia:FootballPlayer, wordnet:Person, dbpedia:Album; sioc:WikiArticle, geonames:Feature, …]

Page 15: Capturing emerging relations between schema ontologies on the Web of Data

Links between ontologies

Property                    Association-based  Co-typing-based
Nodes                       52                 743
Edges                       172                1352
Max. connections/node       29                 504
Node with max. connections  YAGO               FOAF
Avg. connections/node       3.96               1.85
Connected components        5                  35

• Aggregated network: connections between ontologies
– Mapping-based links between ontologies
– At least 1 mapping between corresponding classes must exist
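
The aggregation step can be sketched as follows. This is a minimal illustration, assuming that the ontology of a class is identified by its namespace prefix and that the ontology network is undirected; the function names and the toy mappings are made up for the example and are not results from this work.

```python
from collections import defaultdict


def ontology_of(class_name):
    """Assume the namespace prefix identifies the ontology."""
    return class_name.split(":", 1)[0]


def aggregate(class_mappings):
    """Link two ontologies if at least one mapping connects their classes."""
    edges = set()
    for class_a, class_b in class_mappings:
        onto_a, onto_b = ontology_of(class_a), ontology_of(class_b)
        if onto_a != onto_b:  # mappings inside one ontology add no link
            edges.add(frozenset((onto_a, onto_b)))
    return edges


def connected_components(edges):
    """Return the connected components as a list of sets of ontologies."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Made-up class mappings, for illustration only.
mappings = [
    ("dbpedia:Artist", "yago:ItalianComposers"),
    ("dbpedia:Person", "foaf:Person"),
    ("dbpedia:Person", "umbel:Person"),
    ("dbpedia:Person", "dbpedia:FootballPlayer"),
    ("movie:music_contributor", "mo:MusicArtist"),
]
ontology_edges = aggregate(mappings)
components = connected_components(ontology_edges)
print(len(ontology_edges), [sorted(c) for c in components])
```

Using a set of `frozenset` pairs means that any number of class mappings between the same two ontologies collapses into a single link.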

Page 16: Capturing emerging relations between schema ontologies on the Web of Data

Association-based network

Page 17: Capturing emerging relations between schema ontologies on the Web of Data

Association-based network

Generic:
- YAGO
- Freebase
- UMBEL
- OpenCYC
- DBPedia

Page 18: Capturing emerging relations between schema ontologies on the Web of Data

Association-based network

Domain-specific

Generic:
- YAGO
- Freebase
- UMBEL
- OpenCYC
- DBPedia

Page 19: Capturing emerging relations between schema ontologies on the Web of Data

Association-based network

• Main factor: topic coverage
• Popularity for linking is not reflected
– Data-level: DBPedia has more connections than Freebase
– Schema-level: no substantial difference
– Effect of exploiting composed links

Page 20: Capturing emerging relations between schema ontologies on the Web of Data

Co-typing-based network

• Main factor:
– Popularity for reuse
• FOAF and WordNet:
– the most popular
• DBPedia, YAGO, OpenCYC, UMBEL
– Reused for DBPedia instances

Page 21: Capturing emerging relations between schema ontologies on the Web of Data

Outcomes

• Possible usage scenarios for mappings
– Selecting suitable sources to connect
  • “LinkedMDB contains more movies than DBPedia – more likely to cover all my instances”
– Selecting an ontology to reuse to structure new instances
  • Which sources use this ontology? Do I want my data to be integrated with them?
– Other data-driven tasks
  • E.g., exploratory search
• Generic challenges
– How to take into account task requirements in ontology matching?
  • Recall vs precision, fuzzy vs exact
– How to capture changes in the data?
  • BTC 2009 is almost obsolete by now

Page 22: Capturing emerging relations between schema ontologies on the Web of Data

Limitations and future work

• Limitations
– Light-weight matcher can lead to lower quality mappings
  • OK for our scenario but not others
– Pre-existing instance-level mappings are not always available
• Future work
– Combining with schema-based ontology matching techniques
– Taking into account properties and complex correspondences

Page 23: Capturing emerging relations between schema ontologies on the Web of Data

Questions?

Thanks for your attention

Page 24: Capturing emerging relations between schema ontologies on the Web of Data

Disjoint but overlapping

• Spurious owl:sameAs link
– dbpedia:Hippocrates (Hippocrates) = bookmashup:9004095748 (Hippocratic Lives and Legends (Studies in Ancient Medicine, Vol 4))
• Spurious rdf:type assignment
– dbpedia:Celtic_Frost (band) defined as Person in DBPedia (fixed in the current version of DBPedia)
• Modelling assumptions
– dbpedia:Masada describes both the geographical place and the battle