leveraging data and structure in ontology integration

Leveraging Data and Structure in Ontology Integration

Octavian Udrea1

Lise Getoor1

Renée J. Miller2

1University of Maryland College Park2University of Toronto

Contents

Motivation and goals Short overview of OWL Lite The ILIADS method Experimental evaluation

ILIADS Goal:

Produce high-quality integration via a flexible method able to adapt to a wide variety of ontology sizes and structures

Method: Combining statistical and logical inference Use schema (structure) and data (instances)

effectively Solution:

Integrated Learning In Alignment of Data and Schema (ILIADS)

Contributions Show how to combine statistical and logical

inference effectively

Show that a small amount of inference yields high qualitative gain

Show that parameters needed to perform inference over data and structure are robust

Provide a thorough evaluation on 30 pairs of real-world ontologies (with ground truth)

Contents

Example OWL Lite ontologies

(discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty)(discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty)(resultsF rom, rdfs:subPropertyOf, associatedWith)

An entity can be a:• Class

An entity can be a:• Class• Instance

An entity can be a:• Class• Instance• Property

(discoveredBy, owl:inverseOf, discoverer)(discoveredBy, owl:type, owl:FunctionalProperty)(discoveredBy, owl:inverseOf, discoverer)(associatedWith, owl:type, owl:TransitiveProperty)(resultsF rom, rdfs:subPropertyOf, associatedWith)

Inference in OWL Lite

The integration problem

Contents

State of the art Robust statistical methods

Well-known similarity measures Used for matching data (entities) and schema May use graph structure of schema

Logical inference Not combined with statistical inference Basis for most schema mapping and ontology

integration methods Approaches integrate schema, but not data

Issues How to combine statistical inference with

logical inference Takes into account data, structure, etc. so it’s no

longer obvious In particular, how to quantify the results of logical

inference into a similarity-like form? How to do logical inference in a tractable

manner For OWL-Lite, EXPTIME-complete for the worst

case for the entire ontology

The ILIADS algorithm

repeat until no more candidates

1. Compute local similarities

2. Select promising candidates

3. For each candidatea. Select relationship

b. Perform logical inference

c. Update score with the inference similarity

4. Select the candidate with the best score

Computing local similarities

simlexical: Jaro-Winkler and Wordnet

simstructural: Jaccard for neighborhoods

simextensional: Jaccard on extensions

parameters: λx, λs, λe

different for classes, instances and properties

)e(e,sim

)e(e,sim )esim(e,

extensione

structures

lexicalx

Selecting promising candidates

1. Select candidates with sim(e,e’) > λt

2. Use a policy based on entity type to order, e.g.:

Class alignments first Instance alignments firstAlternate between classes and instances

Selecting relationship Must decide on relation type

subClassOf vs. equivalentClass subPropertyOf vs. equivalentProperty

Determination is difficult, especially under the OWL open-world semantics

Use a simple extension based technique based on a threshold λr

Selecting relationship

3. For each candidatea. Represent candidate relationship

Performing logical inference

For the candidate pair (e,e’): Select an axiom to apply The logical consequences are the pairs of

entities (e(i), e(j)) that have just become equivalent

Repeat a small number of times (5) to maintain tractability

Performing logical inference

(TheodorEscherich, owl:sameAs, T.S. Escherich) is a logical consequence of the candidate (E-ColiPoisoning, owl:sameAs, E-Coli)

3. For each candidatea. Represent candidate relationship

Updating score

For the candidate pair (e,e’): Initial local similarity sim(e,e’) Inference similarity over all consequences:

Updated similarity:

e,esim-1

e,esim

(j)(i) e,e(j)(i)

(j)(i)

Pss *e'e,ime'e,imupdated

Updating score

Consistency The constructed alignment is not guaranteed

to be consistent ILIADS can only detect inconsistencies that

appear in the few logical inference steps Pellet used to check consistency after ILIADS

Experimentally, inconsistent ontologies in less than .5% of runs

Contents

Experimental framework 30 pairs of real-world ontologies

From 194 to over 20,000 triples From a variety of domains: medical, geographical,

economical, biological

Ground truth provided by human reviewers Multiple iterations to ensure the best human-

provided alignment

Datasets available: http://www.cs.umd.edu/linqs/projects/iliads

Experimental framework Evaluation: precision, recall and F1 quality

F1 = 2 * Precision * Recall / (Precision + Recall) 7 independent runs

ILIADS Variations: ILIADS-tailored uses the best set of parameters

for each pair of ontologies ILIADS-fixed uses one set of parameters for all

pairs of ontologies Used to evaluate robustness of the parameters

Experimental framework ILIADS compared to two leading tools:

FCA-merge [Stumme and Maedche, IJCAI 2001] uses formal concept analysis and an external

document corpus

COMA++ [Aumueller et al., SIGMOD 2005] implements multiple match strategies,

including fragment and reuse-based matching

Precision/recall

Precision/recall comparison

Precision/recall for ontologies with substantial instance data

Number of inference steps

ILIADS parameters

ILIADS-fixed

.2 .4 .1 .5 .6 .4 .3 .5 .7 .2

Min ILIADS-tailored

.15 .4 0 .3 .45 .35 .2 .35 .65 .2

Max IILIADS-tailored

.25 .45 .1 .65 .7 .5 .35 .65 .7 .2

x px c

pe t r

Lexical parameters

Structuralparameters

Extensionalparameters

Choosing ILIADS parameters Despite the number of parameters, method is

quite robust Parameters are stable around the ILIADS-fixed

values if the two ontologies in a pair are not very different

Strong correlations between Structural similarity coefficients and the average

node degree Extensional coefficients and the ratio of instances

to classes

False negative analysis

Concluding remarks New ontology integration algorithm

First to combine statistical and logical inference

Evaluated feasibility of combined inference Small number of logical inference steps are sufficient

for integration decisions Inference is stable to parameter settings Parameters permit principled tuning based on

ontology characteristics

Dataset and code available at:http://www.cs.umd.edu/linqs/projects/iliads

leveraging data and structure in ontology integration

owl liteinference

tractable mannerfor

discoverer discoveredby

discoverer associatedwith

transitivepropertyresultsf

results of logical inference

highquality integration

alignment of data

Documents

ontology driven data integration in heterogeneous...

a framework uniting ontology-based geodata integration and...

8 ontology integration and interoperability (onto i op)

zack wolfson - leveraging the salesforce integration

leveraging data and structure in ontology integration · pdf...

ontology integration and interoperability (ontoiop) – part...

leveraging continuous integration for fun and profit!

use of the cim ontology - utility integration solutions

solution brief leveraging continuous integration, delivery,...

an ontology-driven integration framework for smart...

leveraging capabilities: the integration of special

debugit: ontology-mediated layered data integration...

data integration at the ontology engineering group

data integration ontology mapping

an ontology for quality standards integration

information integration using contextual knowledge and...

cross-repository data integration using ontology...

ontology integration using mappings: towards getting the...

leveraging internal systems and integration

ontology-based integration of cross-linked...