Graph-based Ontology Analysis in the Linked Open Data Lihua Zhao, Ryutaro Ichise September 5, 2012, I-Semantics2012, Graz, Austria


Page 1: Graph-based Ontology Analysis in the Linked Open Data

Graph-based Ontology Analysis in the Linked Open Data

Lihua Zhao, Ryutaro Ichise

September 5, 2012, I-Semantics2012, Graz, Austria

Page 2: Graph-based Ontology Analysis in the Linked Open Data

Introduction Related Work Our Approach Experiments Comparison with Previous Work Conclusion and Future Work

Outline

Introduction
Related Work
Our Approach
  Graph Pattern Extraction
  <Predicate, Object> Collection
  Related Classes and Predicates Grouping
  Integration for All Graph Patterns
  Manual Revision
Experiments
  Experimental Data
  Graph Patterns of Linked Instances
  Class-level Analysis
  Predicate-level Analysis
Comparison with Previous Work
Conclusion and Future Work

Lihua Zhao, Ryutaro Ichise | Graph-based Ontology Analysis in the Linked Open Data | 2

Page 3: Graph-based Ontology Analysis in the Linked Open Data


Introduction

Linked Open Data (LOD)
  295 data sets, 31 billion RDF triples (as of Sep. 2011).
  Interlinked instances (owl:sameAs).

[Figure: LOD cloud diagram, as of September 2011. Data sets are grouped by domain: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences.]

Page 4: Graph-based Ontology Analysis in the Linked Open Data

Challenging Problems

It is infeasible to understand all the ontology schemas of the linked data sets.

Ontology heterogeneity problem
  Heterogeneous ontology classes
    DBpedia: http://dbpedia.org/ontology/Country
    Geonames: http://www.geonames.org/ontology#A.PCLI
    LinkedMDB: http://data.linkedmdb.org/resource/movie/country
  Heterogeneous ontology predicates
    http://dbpedia.org/property/populationTotal
    http://dbpedia.org/property/population

Inspecting large ontologies is time-consuming and infeasible
  Classes and predicates are misused.
  DBpedia: 320 classes and thousands of predicates.

Page 5: Graph-based Ontology Analysis in the Linked Open Data

Solution for the Problems

Automatically or semi-automatically integrate different ontologies by analyzing interlinked instances.

Semi-automatic ontology integration
  Reduce the ontology heterogeneity.
  Identify important ontology classes and predicates that link instances.
  Provide a simple integrated ontology that is easy to understand.
  Simplify queries over various data sets.

Page 6: Graph-based Ontology Analysis in the Linked Open Data

Related Work

Find useful attributes from frequent graph patterns. [Le, et al., 2010]
  Only for geographic data.

Analysis of basic predicates of the SameAs network, the Pay-Level-Domain network, and the Class-Level Similarity network. [Ding, et al., 2010]
  Only frequent types are considered when analyzing how data are connected.

A debugging method for mapping lightweight ontologies. [Meilicke, et al., 2008]
  Limited to expressive lightweight ontologies.

Construct an intermediate-layer ontology from geospatial, zoology, and genetics data resources. [Parundekar, et al., 2010]
  Only for specific domains, and considers only the class level.

Construct an integrated mid-ontology from DBpedia, Geonames, and NYTimes. [Zhao, et al., 2011]
  Needs a hub data set, and considers only the predicate level.

Page 7: Graph-based Ontology Analysis in the Linked Open Data

Our Approach

Page 8: Graph-based Ontology Analysis in the Linked Open Data

Step 1: Graph Pattern Extraction

Page 9: Graph-based Ontology Analysis in the Linked Open Data

Graph Pattern Extraction

Extract graph patterns from interlinked instances to discover related ontology classes and predicates.

SameAs Graph SG = (V, E, I), where V is a set of data set labels, E ⊆ V × V, and I is a set of URIs of the interlinked instances.

Example: SG_Austria = (V, E, I)
  V = {D, G, N, M} (D = DBpedia, G = Geonames, N = NYTimes, M = LinkedMDB)
  E = {(D,G), (D,N), (G,N), (G,M)}
  I = {db:Austria, geo:2782113, nyt:66221058161318373601, mdb-country:AT}
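A minimal Python sketch of the SameAs Graph structure, assuming the triple (V, E, I) is stored as three frozen sets. The class name `SameAsGraph` and the `pattern()` helper are hypothetical and do not come from the slides; `pattern()` reflects the assumption that a graph pattern is the (V, E) shape with the instance URIs dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SameAsGraph:
    """SG = (V, E, I): data set labels, links between them, instance URIs."""
    labels: frozenset      # V
    edges: frozenset       # E, a subset of V x V
    instances: frozenset   # I

    def __post_init__(self):
        # Enforce that every edge only uses labels from V.
        for source, target in self.edges:
            if source not in self.labels or target not in self.labels:
                raise ValueError(f"edge ({source}, {target}) uses an unknown label")

    def pattern(self):
        """Hypothetical helper: the (V, E) shape without instance URIs."""
        return (self.labels, self.edges)


# The SG_Austria example from the slide.
sg_austria = SameAsGraph(
    labels=frozenset({"D", "G", "N", "M"}),
    edges=frozenset({("D", "G"), ("D", "N"), ("G", "N"), ("G", "M")}),
    instances=frozenset({
        "db:Austria",
        "geo:2782113",
        "nyt:66221058161318373601",
        "mdb-country:AT",
    }),
)
```

Two SameAs Graphs with different instances but the same labels and edges would return equal `pattern()` values under this assumption.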

Page 10: Graph-based Ontology Analysis in the Linked Open Data

Step 2: <Predicate, Object> Collection

Page 11: Graph-based Ontology Analysis in the Linked Open Data

<Predicate, Object> Collection

An instance has a collection of <subject, predicate, object> triples.
(instance URI → subject, property → predicate, class → object)

<predicate, object> (PO) pairs serve as the content of a SameAs Graph.

Classify PO pairs into five types
  Class: rdf:type and skos:inScheme.
  Date: XMLSchema:date, gYear, gMonthDay, etc.
  Number: XMLSchema:integer, int, float, double, etc.
  URI: starts with “http://”, or XMLSchema:anyURI.
  String: XMLSchema:string and others.
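The five-way classification can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the function name `classify_po` and its `datatype` argument are hypothetical, and only the datatypes named on the slide are listed (the slide's "etc." is not expanded). The sketch follows only the stated rule, so it would not reproduce every row of the SG_Austria table, which also labels objects of geo-onto:featureClass and geo-onto:featureCode as Class.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Only the datatypes named on the slide; a real system would list more.
DATE_TYPES = {XSD + t for t in ("date", "gYear", "gMonthDay")}
NUMBER_TYPES = {XSD + t for t in ("integer", "int", "float", "double")}
CLASS_PREDICATES = {"rdf:type", "skos:inScheme"}


def classify_po(predicate, obj, datatype=None):
    """Return one of 'Class', 'Date', 'Number', 'URI', 'String'."""
    if predicate in CLASS_PREDICATES:
        return "Class"
    if datatype in DATE_TYPES:
        return "Date"
    if datatype in NUMBER_TYPES:
        return "Number"
    if datatype == XSD + "anyURI" or obj.startswith("http://"):
        return "URI"
    # xsd:string and everything else fall through to String.
    return "String"


examples = [
    classify_po("rdf:type", "db-onto:Country"),
    classify_po("nyt-prop:first_use", "2004-10-04", XSD + "date"),
    classify_po("db-prop:populationEstimate", "8356707", XSD + "integer"),
    classify_po("db-onto:wikiPageExternalLink", "http://www.austria.mu/"),
    classify_po("rdfs:label", "Austria"),
]
```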

Page 12: Graph-based Ontology Analysis in the Linked Open Data

An Example of Collected PO pairs

Table: PO pairs and types for SG_Austria

Predicate                     Object                    Type
rdf:type                      owl:Thing                 Class
rdf:type                      db-onto:Place             Class
rdf:type                      db-onto:PopulatedPlace    Class
rdf:type                      db-onto:Country           Class
rdfs:label                    “Austria”@en              String
db-onto:wikiPageExternalLink  http://www.austria.mu/    URI
db-prop:populationEstimate    8356707                   Number
...                           ...                       ...
geo-onto:name                 Austria                   String
geo-onto:alternateName        “Austria”@en              String
geo-onto:alternateName        “Republic of Austria”@en  String
geo-onto:featureClass         geo-onto:A                Class
geo-onto:featureCode          geo-onto:A.PCLI           Class
geo-onto:population           8205000                   Number
...                           ...                       ...
rdf:type                      mdb:country               Class
mdb:country_name              Austria                   String
...                           ...                       ...
skos:inScheme                 nyt:nytd_geo              Class
skos:prefLabel                “Austria”@en              String
nyt-prop:first_use            2004-10-04                Date

Page 13: Graph-based Ontology Analysis in the Linked Open Data

Step 3: Related Classes and Predicates Grouping

Page 14: Graph-based Ontology Analysis in the Linked Open Data

Related Classes Grouping

Group related classes from each SameAs Graph by tracking the subsumption relations rdfs:subClassOf and skos:inScheme.

<C1 rdfs:subClassOf C2> or <C1 skos:inScheme C2> means that the concept of class C1 is more specific than the concept of class C2.
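Tracking subsumption relations can be sketched as following parent links from each class. The sketch below assumes each class has at most one recorded parent; the `parent_of` map is illustrative, built from classes that appear in the SG_Austria table, and the function names are hypothetical.

```python
def subsumption_chain(cls, parent_of):
    """Follow parent links from cls up to the most general class."""
    chain = [cls]
    seen = {cls}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against cyclic declarations
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def most_specific(classes, parent_of):
    """Classes that are not an ancestor of any other observed class."""
    ancestors = set()
    for cls in classes:
        ancestors.update(subsumption_chain(cls, parent_of)[1:])
    return sorted(c for c in classes if c not in ancestors)


# Illustrative subsumption links (child -> parent), assumed for this example.
parent_of = {
    "db-onto:Country": "db-onto:PopulatedPlace",
    "db-onto:PopulatedPlace": "db-onto:Place",
    "db-onto:Place": "owl:Thing",
    "geo-onto:A.PCLI": "geo-onto:A",
}

observed = [
    "owl:Thing", "db-onto:Place", "db-onto:PopulatedPlace",
    "db-onto:Country", "geo-onto:A", "geo-onto:A.PCLI",
]
country_chain = subsumption_chain("db-onto:Country", parent_of)
leaves = most_specific(observed, parent_of)
```

Here `leaves` holds the most specific class from each data set, which are natural candidates for grouping.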

Page 15: Graph-based Ontology Analysis in the Linked Open Data

Related Predicates Grouping

Perform pairwise comparison of <predicate, object> (PO) pairs to find related predicates (properties).

Discover related predicates using different methods for the types Date, URI, Number, and String.
  Date, URI: exact matching.
  Number, String: exact matching + similarity matching.

Exact matching on PO pairs creates the initial sets of PO pairs.

\[ O_{PO_i} = O_{PO_j} \ \text{or} \ P_{PO_i} = P_{PO_j} \ \Rightarrow \ S_k \leftarrow PO_i, PO_j \]

  O_{PO}: the object of PO.
  P_{PO}: the predicate of PO.
  S: an initial set of PO pairs.
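The exact-matching rule can be sketched with a union-find structure: two PO pairs land in the same initial set when they share an object or a predicate. This sketch assumes the grouping is transitive, which the slide does not state explicitly, and the function name `initial_sets` is hypothetical. Language tags such as @en are dropped from the example objects for simplicity.

```python
def initial_sets(po_pairs):
    """Group (predicate, object) pairs that share a predicate or an object."""
    parent = list(range(len(po_pairs)))

    def find(i):
        # Walk up to the representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_by_pred, first_by_obj = {}, {}
    for idx, (pred, obj) in enumerate(po_pairs):
        # Link this pair to the first pair seen with the same predicate/object.
        union(first_by_pred.setdefault(pred, idx), idx)
        union(first_by_obj.setdefault(obj, idx), idx)

    groups = {}
    for idx, po in enumerate(po_pairs):
        groups.setdefault(find(idx), []).append(po)
    return list(groups.values())


pairs = [
    ("rdfs:label", "Austria"),
    ("geo-onto:name", "Austria"),
    ("geo-onto:alternateName", "Austria"),
    ("geo-onto:alternateName", "Republic of Austria"),
    ("geo-onto:population", "8205000"),
]
sets = initial_sets(pairs)
```

In this example the four name-like pairs end up in one initial set, and the population pair stays in a set of its own.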

Page 16: Graph-based Ontology Analysis in the Linked Open Data

Related Predicates Grouping

Similarity matching on PO pairs of type Number and String.

Similarity between PO_i and PO_j:

\[ Sim(PO_i, PO_j) = \frac{ObjSim(PO_i, PO_j) + PreSim(PO_i, PO_j)}{2} \]

Merge similar initial sets S_i and S_j if

\[ Sim(PO_i, PO_j) \geq \theta, \quad \text{where } PO_i \in S_i,\ PO_j \in S_j \]
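The merge step can be sketched as follows, with the two component similarities passed in as functions. The sketch assumes that two sets merge as soon as any cross pair reaches the threshold; the slide does not say whether one pair or all pairs must qualify, and it gives no value for θ. The toy similarity functions and the predicate `db-prop:population` are placeholders for illustration only.

```python
def sim(po_i, po_j, obj_sim, pre_sim):
    """Average of object similarity and predicate similarity."""
    return (obj_sim(po_i, po_j) + pre_sim(po_i, po_j)) / 2


def merge_similar_sets(sets, obj_sim, pre_sim, theta):
    """Repeatedly merge sets that contain a cross pair with Sim >= theta."""
    merged = [list(s) for s in sets]
    changed = True
    while changed:
        changed = False
        for a in range(len(merged)):
            for b in range(a + 1, len(merged)):
                if any(sim(p, q, obj_sim, pre_sim) >= theta
                       for p in merged[a] for q in merged[b]):
                    merged[a].extend(merged.pop(b))
                    changed = True
                    break
            if changed:
                break  # restart the scan after every merge
    return merged


# Toy similarity functions, stand-ins for ObjSim and PreSim.
def toy_obj_sim(p, q):
    return 1.0 if p[1] == q[1] else 0.0


def toy_pre_sim(p, q):
    def local(name):
        return name.split(":", 1)[1].lower()
    return 1.0 if local(p[0]) == local(q[0]) else 0.0


toy_sets = [
    [("db-prop:population", 8356707)],
    [("geo-onto:population", 8205000)],
    [("rdfs:label", "Austria")],
]
merged_sets = merge_similar_sets(toy_sets, toy_obj_sim, toy_pre_sim, theta=0.5)
```

With these toy functions the two population sets merge, because their predicate names match and the averaged score reaches 0.5, while the label set stays separate.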

Page 17: Graph-based Ontology Analysis in the Linked Open Data

Related Predicates Grouping

Similarity of objects between two PO pairs:

\[ ObjSim(PO_i, PO_j) = \begin{cases} 1 - \dfrac{|O_{PO_i} - O_{PO_j}|}{O_{PO_i} + O_{PO_j}} & \text{if } O_{PO} \text{ is Number} \\ StrSim(O_{PO_i}, O_{PO_j}) & \text{if } O_{PO} \text{ is String} \end{cases} \]

  O_{PO}: the object of PO.
  StrSim(O_{PO_i}, O_{PO_j}): the average of three string-based similarity values: Jaro-Winkler, Levenshtein distance, and n-gram.

Similarity of predicates between PO_i and PO_j:

\[ PreSim(PO_i, PO_j) = WNSim(T_{PO_i}, T_{PO_j}) \]

  T_{PO}: the pre-processed terms of the predicates in PO.
  WNSim(T_{PO_i}, T_{PO_j}): the average of the nine applied WordNet-based similarity values.
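A rough Python sketch of the object similarity follows. The numeric branch implements the formula above and assumes non-negative numbers. The string branch is a deliberate simplification: it uses only a normalized Levenshtein similarity as a stand-in for the average of the three string measures, and the WordNet-based predicate similarity is not sketched at all. All function names are hypothetical.

```python
def number_sim(a, b):
    """1 - |a - b| / (a + b), assuming non-negative numbers."""
    if a + b == 0:
        return 1.0 if a == b else 0.0
    return 1 - abs(a - b) / (a + b)


def levenshtein(s, t):
    """Classic edit distance using a rolling row of the DP table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]


def levenshtein_sim(s, t):
    """Edit distance normalized to [0, 1]; stand-in for StrSim."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1 - levenshtein(s, t) / longest


def obj_sim(o_i, o_j):
    """Dispatch on object type: numbers vs. everything else as strings."""
    if isinstance(o_i, (int, float)) and isinstance(o_j, (int, float)):
        return number_sim(o_i, o_j)
    return levenshtein_sim(str(o_i), str(o_j))


# Population values for Austria from the SG_Austria table.
population_sim = obj_sim(8356707, 8205000)
```

For the two Austrian population figures the numeric similarity comes out just above 0.99, so a merge would hinge mostly on the predicate similarity and the chosen threshold.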

Page 18: Graph-based Ontology Analysis in the Linked Open Data

Step 4: Integration for All Graph Patterns

Page 19: Graph-based Ontology Analysis in the Linked Open Data

Integration for All Graph Patterns

The groups of related classes and predicates are independent for each graph pattern. Hence, we integrate them across all the graph patterns to construct an integrated ontology.

Select terms for the integrated ontology.
  ex-onto:ClassTerm: select one concept from a set of classes.
  ex-prop:propTerm: select one concept from a set of predicates.

Construct relations.
  ex-prop:hasMemberClasses: link sets of classes with ex-onto:ClassTerm.
  ex-prop:hasMemberDataTypes: link sets of predicates with ex-prop:propTerm.

Construct an integrated ontology from:
  Sets of related classes and predicates.
  Selected terms: ClassTerm and propTerm.
  Constructed relations.
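The relation-construction step can be illustrated by emitting one triple per group member. This is only a sketch: the helper `member_triples` is hypothetical, the term is taken as already selected (the slides do not describe the selection procedure in detail), and the member set is the ex-onto:Country group reported later in the class-level analysis.

```python
def member_triples(term, members, relation):
    """Link an integrated term to every member of its group."""
    return [(term, relation, member) for member in sorted(members)]


country_members = {
    "db-onto:Country",
    "geo-onto:A.PCLI",
    "mdb:country",
    "nyt:nytd_geo",
}
country_triples = member_triples(
    "ex-onto:Country", country_members, "ex-prop:hasMemberClasses"
)
```

The same helper would apply to predicate groups by passing ex-prop:hasMemberDataTypes as the relation.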

Page 20: Graph-based Ontology Analysis in the Linked Open Data

Step 5: Manual Revision

Page 21: Graph-based Ontology Analysis in the Linked Open Data

Manual Revision

Minor revision process on the automatically constructed ontology.

Modify incorrect terms
  Not all the terms of classes and predicates are properly selected.

Add domain information
  About 40% of the predicate sets lack rdfs:domain information.

Modify incorrectly grouped classes and predicates
  We cannot guarantee 100% accuracy.

Page 22: Graph-based Ontology Analysis in the Linked Open Data

Experiments

Analyze the characteristics of linked instances with the integrated ontology constructed with our approach.

Experimental Data

Graph Patterns of Linked Instances

Class-level Analysis

Predicate-level Analysis

Page 23: Graph-based Ontology Analysis in the Linked Open Data

Experimental Data

DBpedia: cross-domain, 3.5 million things, 8.9 million URIs.

Geonames: geographical domain, 7 million URIs.

NYTimes: media domain, 10,467 subject headings.

LinkedMDB: media domain, 0.5 million entities.

Page 24: Graph-based Ontology Analysis in the Linked Open Data

Graph Patterns of Linked Instances

13 graph patterns

Frequent graph patterns:
  GP1, GP2, GP3
  N, G, D: GP4, GP5, GP7, GP8
  N, M, D: GP6
  M, G, D: GP9
  M, D, N, G: GP10, GP11, GP12, GP13

Page 25: Graph-based Ontology Analysis in the Linked Open Data

Class-level Analysis

Successfully integrated related classes from the extracted graph patterns.

Characteristics of graph patterns

Class Type                          Graph Pattern
Actor                               GP2, GP6
Person (Athlete, Politician, etc.)  GP3
Organization/Agent                  GP1, GP3, GP8
Film                                GP2
City/Settlement                     GP1, GP4, GP5, GP7, GP8
Country                             GP9, GP10, GP11, GP12, GP13
Place (Mountain, River, etc.)       GP1, GP3, GP7

Integrated 97 classes into 48 groups
  Example: ex-onto:Country
    db-onto:Country, geo-onto:A.PCLI, mdb:country, nyt:nytd_geo

Page 26: Graph-based Ontology Analysis in the Linked Open Data

Class-level Analysis

Discover missing class information
  Example: db:Shingo_Katori
    db:Shingo_Katori rdf:type dbpedia-owl:MusicalArtist.
    mdb-actor:27092 owl:sameAs db:Shingo_Katori.
    Therefore, db:Shingo_Katori rdf:type db-onto:Actor.
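The inference in this example can be sketched as a lookup across owl:sameAs links and integrated class groups. The sketch rests on two assumptions that the slide leaves implicit: that mdb-actor:27092 is typed with a LinkedMDB actor class (written here as `mdb:actor`, a placeholder name), and that this class shares an integrated group with db-onto:Actor. The function name is hypothetical.

```python
def suggest_missing_types(same_as, types, class_groups, target_prefix):
    """Suggest classes for linked instances based on integrated class groups."""
    suggestions = {}
    for source, target in same_as:
        for cls in types.get(source, ()):
            for members in class_groups.values():
                if cls not in members:
                    continue
                for candidate in members:
                    # Keep only classes from the target data set that are new.
                    if (candidate.startswith(target_prefix)
                            and candidate not in types.get(target, ())):
                        suggestions.setdefault(target, set()).add(candidate)
    return suggestions


same_as = [("mdb-actor:27092", "db:Shingo_Katori")]
types = {
    "mdb-actor:27092": {"mdb:actor"},  # assumed type, not shown on the slide
    "db:Shingo_Katori": {"dbpedia-owl:MusicalArtist"},
}
class_groups = {"ex-onto:Actor": {"db-onto:Actor", "mdb:actor"}}

missing = suggest_missing_types(same_as, types, class_groups, "db-onto:")
```

Under these assumptions `missing` maps db:Shingo_Katori to db-onto:Actor, matching the conclusion on the slide.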

Main classes of each data set.
  NYTimes: person, organization, and place.
  LinkedMDB: movie, actor, and country.
  Geonames: A (country, administrative region), P (city, settlement), T (mountain), S (building, school), and H (lake, river).
  DBpedia: person (artist, politician, athlete), organization (company, educational institute, sports team), work (film), and place (populated place, natural place, architectural structure).

Page 27: Graph-based Ontology Analysis in the Linked Open Data

Predicate-level Analysis

Integrated 367 predicates into 38 groups
  Example: ex-prop:birthDate

Predicate             Number of Instances
db-onto:birthDate     287,327
db-prop:datebirth     1,675
db-prop:dateofbirth   87,364
db-prop:dateOfBirth   163,876
db-prop:born          34,832
db-prop:birthdate     70,630
db-prop:birthDate     101,121

Recommend standard predicates
  <db-onto:birthDate, rdfs:domain, db-onto:Person>
  “db-onto:birthDate” has the highest frequency of usage.
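The recommendation by usage frequency can be reproduced directly from the counts in the table. This short sketch assumes that the recommended predicate is simply the group member used by the most instances; the variable names are illustrative.

```python
# Instance counts for the ex-prop:birthDate group, copied from the table.
birth_date_usage = {
    "db-onto:birthDate": 287_327,
    "db-prop:datebirth": 1_675,
    "db-prop:dateofbirth": 87_364,
    "db-prop:dateOfBirth": 163_876,
    "db-prop:born": 34_832,
    "db-prop:birthdate": 70_630,
    "db-prop:birthDate": 101_121,
}

# Pick the predicate with the highest number of instances.
recommended = max(birth_date_usage, key=birth_date_usage.get)
```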

Page 28: Graph-based Ontology Analysis in the Linked Open Data

Comparison with Previous Work

Compare our ontology integration approach with the Mid-Ontology approach [Zhao, et al., JIST2011].

Mid-Ontology approach                                       Our approach
A hub data set for data collection.                         No hub data set.
String-based similarity measures for all types of objects.  Different similarity measures for different types of objects.
105 predicates in 22 groups.                                367 predicates in 38 groups.
No classes.                                                 97 classes in 48 groups.

Page 29: Graph-based Ontology Analysis in the Linked Open Data

Conclusion and Future Work

Conclusion
  Integrate heterogeneous ontologies from various data sets.
  Identify the characteristics of graph patterns using the integrated ontology classes.
  Recommend standard predicates using the integrated ontology predicates.
  Reduce the heterogeneity of ontologies.
  Construct an integrated ontology without learning the entire ontology schema.

Future Work
  Use more data sets in the LOD cloud.
  Apply the MapReduce method to address the scalability and ontology heterogeneity problems.


Page 30: Graph-based Ontology Analysis in the Linked Open Data

Questions?

Lihua Zhao, Ryutaro Ichise
