leveraging data and structure in ontology integration

Post on 11-Jan-2016

Leveraging Data and Structure in Ontology Integration

Octavian Udrea¹

Lise Getoor¹

Renée J. Miller²

¹ University of Maryland, College Park
² University of Toronto

Contents

• Motivation and goals
• Short overview of OWL Lite
• The ILIADS method
• Experimental evaluation

ILIADS

Goal: Produce high-quality integration via a flexible method able to adapt to a wide variety of ontology sizes and structures

Method:
• Combine statistical and logical inference
• Use schema (structure) and data (instances) effectively

Solution: Integrated Learning In Alignment of Data and Schema (ILIADS)

Contributions

• Show how to combine statistical and logical inference effectively
• Show that a small amount of inference yields a large gain in quality
• Show that the parameters needed to perform inference over data and structure are robust
• Provide a thorough evaluation on 30 pairs of real-world ontologies (with ground truth)

Example OWL Lite ontologies

An entity can be a:
• Class
• Instance
• Property

Example triples:
(discoveredBy, owl:inverseOf, discoverer)
(discoveredBy, rdf:type, owl:FunctionalProperty)
(associatedWith, rdf:type, owl:TransitiveProperty)
(resultsFrom, rdfs:subPropertyOf, associatedWith)
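
As a rough illustration of how such triples drive inference, the sketch below stores the schema triples as Python tuples (written with the standard rdf:type predicate) and applies two rules once: owl:inverseOf and rdfs:subPropertyOf. The two instance triples and the function name `infer_once` are hypothetical, added only to give the rules something to act on; transitivity and functional properties are not covered here.

```python
Triple = tuple[str, str, str]

ontology: set[Triple] = {
    ("discoveredBy", "owl:inverseOf", "discoverer"),
    ("discoveredBy", "rdf:type", "owl:FunctionalProperty"),
    ("associatedWith", "rdf:type", "owl:TransitiveProperty"),
    ("resultsFrom", "rdfs:subPropertyOf", "associatedWith"),
    # Hypothetical instance data, not from the slides:
    ("E-Coli", "discoveredBy", "TheodorEscherich"),
    ("E-ColiPoisoning", "resultsFrom", "E-Coli"),
}


def infer_once(triples: set[Triple]) -> set[Triple]:
    """Apply the inverseOf and subPropertyOf rules once; return only new triples."""
    inverse = {(p, q) for p, r, q in triples if r == "owl:inverseOf"}
    inverse |= {(q, p) for p, q in inverse}  # inverseOf is symmetric
    sub = {(p, q) for p, r, q in triples if r == "rdfs:subPropertyOf"}
    new: set[Triple] = set()
    for s, p, o in triples:
        # (s p o) and (p inverseOf q) give (o q s)
        new |= {(o, q, s) for p2, q in inverse if p2 == p}
        # (s p o) and (p subPropertyOf q) give (s q o)
        new |= {(s, q, o) for p2, q in sub if p2 == p}
    return new - triples


derived = infer_once(ontology)
```

Here `derived` holds one triple from the inverse rule and one from the sub-property rule.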

Inference in OWL Lite

The integration problem

State of the art

• Robust statistical methods
  • Well-known similarity measures
  • Used for matching data (entities) and schema
  • May use graph structure of schema
• Logical inference
  • Not combined with statistical inference
  • Basis for most schema mapping and ontology integration methods
  • Approaches integrate schema, but not data

Issues

• How to combine statistical inference with logical inference
  • The combination must take into account data, structure, etc., so the right approach is no longer obvious
  • In particular, how to quantify the results of logical inference in a similarity-like form?
• How to do logical inference in a tractable manner
  • For OWL Lite, inference over the entire ontology is EXPTIME-complete in the worst case

The ILIADS algorithm

repeat until no more candidates
  1. Compute local similarities
  2. Select promising candidates
  3. For each candidate
     a. Select relationship
     b. Perform logical inference
     c. Update score with the inference similarity
  4. Select the candidate with the best score
end
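
The control flow of the loop can be sketched in Python as follows. This is a minimal skeleton under stated assumptions, not the authors' implementation: all four callables (`local_sim`, `select_candidates`, `inference_sim`, `accept`) are hypothetical placeholders, and step 3 is assumed to multiply the local score by an inference similarity.

```python
from typing import Callable, Iterable

Pair = tuple[str, str]  # (entity from ontology 1, entity from ontology 2)


def iliads_loop(
    local_sim: Callable[[], dict[Pair, float]],
    select_candidates: Callable[[dict[Pair, float]], Iterable[Pair]],
    inference_sim: Callable[[Pair], float],
    accept: Callable[[Pair], None],
) -> list[Pair]:
    """Greedy loop mirroring steps 1-4; every callable is a placeholder."""
    alignment: list[Pair] = []
    while True:
        sims = local_sim()                                   # step 1
        candidates = [c for c in select_candidates(sims)     # step 2
                      if c not in alignment]
        if not candidates:
            break                                            # no more candidates
        # Step 3: relationship choice and logical inference are hidden
        # inside the hypothetical inference_sim callable.
        scores = {c: sims[c] * inference_sim(c) for c in candidates}
        best = max(scores, key=scores.get)                   # step 4
        accept(best)
        alignment.append(best)
    return alignment


toy_sims = {("a", "x"): 0.9, ("b", "y"): 0.8, ("c", "z"): 0.1}
accepted: list[Pair] = []
result = iliads_loop(
    local_sim=lambda: toy_sims,
    select_candidates=lambda sims: [p for p, s in sims.items() if s > 0.5],
    inference_sim=lambda pair: 1.0,
    accept=accepted.append,
)
```

In a real run, `accept` would merge the chosen pair into the integrated ontology, so the similarities recomputed in the next iteration would change.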

Computing local similarities

• sim_lexical: Jaro-Winkler and WordNet
• sim_structural: Jaccard for neighborhoods
• sim_extensional: Jaccard on extensions
• Parameters λx, λs, λe, different for classes, instances and properties

\[
sim(e, e') = \lambda_x \, sim_{lexical}(e, e') + \lambda_s \, sim_{structural}(e, e') + \lambda_e \, sim_{extensional}(e, e')
\]
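
A minimal sketch of this computation, assuming the local similarity is the weighted sum of the three measures with weights λx, λs and λe. The lexical measure here is a stand-in based on Python's `difflib`, not Jaro-Winkler or WordNet, and the dictionary layout of an entity (`name`, `neighbors`, `extension`) is hypothetical.

```python
from difflib import SequenceMatcher


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b|, taken as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_sim(name1: str, name2: str) -> float:
    # Stand-in only: the slides use Jaro-Winkler and WordNet, not difflib.
    return SequenceMatcher(None, name1.lower(), name2.lower()).ratio()


def local_sim(e1: dict, e2: dict, lam_x: float, lam_s: float, lam_e: float) -> float:
    """Weighted sum of lexical, structural and extensional similarity."""
    return (lam_x * lexical_sim(e1["name"], e2["name"])
            + lam_s * jaccard(e1["neighbors"], e2["neighbors"])
            + lam_e * jaccard(e1["extension"], e2["extension"]))


c1 = {"name": "Disease", "neighbors": {"a", "b"}, "extension": {1, 2}}
c2 = {"name": "disease", "neighbors": {"b", "c"}, "extension": {1, 2}}
score = local_sim(c1, c2, lam_x=0.2, lam_s=0.5, lam_e=0.3)
```

Because the weights differ by entity type, a caller would pick the triple (λx, λs, λe) according to whether the pair consists of classes, instances or properties.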

Selecting promising candidates

1. Select candidates with sim(e, e') > λt
2. Use a policy based on entity type to order them, e.g.:
   • Class alignments first
   • Instance alignments first
   • Alternate between classes and instances
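
The two selection steps can be sketched as a filter followed by a sort. This is an illustrative sketch only: the function name, the `kinds` mapping from pair to entity type, and the tie-break by descending score are assumptions, and the alternating policy is not implemented.

```python
def select_candidates(sims, kinds, lam_t, order=("class", "instance", "property")):
    """Keep pairs with sim > lam_t, ordered by entity type, then by score."""
    rank = {kind: i for i, kind in enumerate(order)}
    kept = [pair for pair, s in sims.items() if s > lam_t]
    return sorted(kept, key=lambda pair: (rank[kinds[pair]], -sims[pair]))


sims = {("Disease", "Illness"): 0.8, ("E-Coli", "EColi"): 0.9, ("x", "y"): 0.5}
kinds = {("Disease", "Illness"): "class", ("E-Coli", "EColi"): "instance",
         ("x", "y"): "class"}

classes_first = select_candidates(sims, kinds, lam_t=0.7)
instances_first = select_candidates(
    sims, kinds, lam_t=0.7, order=("instance", "class", "property"))
```

Passing a different `order` tuple switches between the class-first and instance-first policies.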

Selecting relationship

• Must decide on relation type
  • subClassOf vs. equivalentClass
  • subPropertyOf vs. equivalentProperty
• Determination is difficult, especially under the OWL open-world semantics
• Use a simple extension-based technique with a threshold λr
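
The slides do not spell out the extension-based rule, so the sketch below shows one plausible reading rather than the actual ILIADS test: compare how much of each class's instance set falls outside the other, and treat a fraction of at most λr as negligible. The function name and the returned labels are hypothetical.

```python
def select_relationship(ext1: set, ext2: set, lam_r: float) -> str:
    """Pick a relation type for two classes from their instance sets.

    Assumed rule, not taken from the slides: a class is (nearly) contained
    in the other when at most lam_r of its instances are missing there.
    """
    miss1 = len(ext1 - ext2) / len(ext1) if ext1 else 1.0
    miss2 = len(ext2 - ext1) / len(ext2) if ext2 else 1.0
    if miss1 <= lam_r and miss2 <= lam_r:
        return "equivalentClass"
    if miss1 <= lam_r:
        return "subClassOf"      # class 1 is (nearly) contained in class 2
    if miss2 <= lam_r:
        return "superClassOf"    # class 2 is (nearly) contained in class 1
    return "none"


relation = select_relationship({1, 2, 3}, {1, 2, 3, 4, 5, 6}, lam_r=0.2)
```

The same comparison could be applied to properties by using sets of (subject, object) pairs as extensions.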

Performing logical inference

For the candidate pair (e, e'):
• Select an axiom to apply
• The logical consequences are the pairs of entities (e(i), e(j)) that have just become equivalent
• Repeat a small number of times (5) to maintain tractability

(TheodorEscherich, owl:sameAs, T.S. Escherich) is a logical consequence of the candidate (E-ColiPoisoning, owl:sameAs, E-Coli)
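
A consequence like this one can be derived with a single rule for functional properties: if p is functional, then (x p y) and (x p z) imply that y and z are the same. The sketch below applies only that rule, with a bound on the number of steps; the instance triples are hypothetical (the original figure is not in the transcript), and counting one processed pair as one step is an assumption.

```python
def functional_consequences(triples, functional, candidate, max_steps=5):
    """Derive sameAs pairs implied by treating `candidate` = (a, b) as equal."""
    pending = [candidate]   # sameAs pairs whose effects are not yet explored
    consequences = []
    steps = 0
    while pending and steps < max_steps:
        a, b = pending.pop(0)
        steps += 1
        for p in sorted(functional):
            values_a = sorted(o for s, q, o in triples if s == a and q == p)
            values_b = sorted(o for s, q, o in triples if s == b and q == p)
            for y in values_a:
                for z in values_b:
                    # a = b and p functional, so the two values must coincide
                    if y != z and (y, z) not in consequences:
                        consequences.append((y, z))
                        pending.append((y, z))
    return consequences


# Hypothetical instance data for the two ontologies being integrated.
triples = {
    ("E-ColiPoisoning", "discoveredBy", "TheodorEscherich"),
    ("E-Coli", "discoveredBy", "T.S. Escherich"),
}
found = functional_consequences(
    triples, {"discoveredBy"}, ("E-ColiPoisoning", "E-Coli"))
```

Each derived pair is itself queued, so one candidate can trigger a short chain of further equivalences until the step bound is reached.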

Updating score

For the candidate pair (e, e'):
• Initial local similarity: sim(e, e')
• Inference similarity over all consequences (e(i), e(j)):

\[
P = \prod_{(e^{(i)}, e^{(j)})} \frac{sim(e^{(i)}, e^{(j)})}{1 - sim(e^{(i)}, e^{(j)})}
\]

• Updated similarity:

\[
sim_{updated}(e, e') = P \cdot sim(e, e')
\]
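
A small numeric sketch of the update, assuming the inference similarity P is the product of sim / (1 - sim) over the consequences and that the updated score is P times the local similarity. The clamp that guards against a similarity of exactly 1 is an added assumption, as are the function names.

```python
def inference_similarity(consequence_sims, eps=1e-9):
    """Product of sim / (1 - sim) over the consequences of a candidate."""
    p = 1.0
    for s in consequence_sims:
        s = min(s, 1.0 - eps)  # assumption: clamp to avoid division by zero
        p *= s / (1.0 - s)
    return p


def updated_similarity(local_sim, consequence_sims):
    """Local similarity scaled by the inference similarity P."""
    return local_sim * inference_similarity(consequence_sims)


boosted = updated_similarity(0.6, [0.75])    # consequence looks plausible
penalized = updated_similarity(0.6, [0.25])  # consequence looks implausible
```

Under this reading, a consequence with similarity above 0.5 raises the candidate's score, one below 0.5 lowers it, and the result is a ranking score that can exceed 1.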

Consistency

• The constructed alignment is not guaranteed to be consistent
• ILIADS can only detect inconsistencies that appear in the few logical inference steps
• Pellet used to check consistency after ILIADS
• Experimentally, inconsistent ontologies in fewer than 0.5% of runs

Experimental framework

• 30 pairs of real-world ontologies
  • From 194 to over 20,000 triples
  • From a variety of domains: medical, geographical, economic, biological
• Ground truth provided by human reviewers
  • Multiple iterations to ensure the best human-provided alignment
• Datasets available: http://www.cs.umd.edu/linqs/projects/iliads

Experimental framework

• Evaluation: precision, recall and F1 quality
  • F1 = 2 * Precision * Recall / (Precision + Recall)
  • 7 independent runs
• ILIADS variations:
  • ILIADS-tailored uses the best set of parameters for each pair of ontologies
  • ILIADS-fixed uses one set of parameters for all pairs of ontologies
  • Used to evaluate robustness of the parameters
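
For reference, the evaluation measures can be computed from a predicted alignment and a ground-truth alignment, both treated as sets of entity pairs. The function name and the toy pairs are illustrative only; returning 0 when a denominator is empty is a convention chosen for this sketch.

```python
def precision_recall_f1(predicted: set, truth: set) -> tuple[float, float, float]:
    """Precision, recall and F1 of a predicted alignment against ground truth."""
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
truth = {("a", "x"), ("b", "y"), ("e", "v")}
p, r, f1 = precision_recall_f1(predicted, truth)
```

With two of four predictions correct and two of three true pairs found, precision is 1/2, recall is 2/3 and F1 is 4/7.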

Experimental framework

ILIADS compared to two leading tools:
• FCA-merge [Stumme and Maedche, IJCAI 2001]
  • Uses formal concept analysis and an external document corpus
• COMA++ [Aumueller et al., SIGMOD 2005]
  • Implements multiple match strategies, including fragment and reuse-based matching

Precision/recall

Precision/recall comparison

Precision/recall for ontologies with substantial instance data

Number of inference steps

ILIADS parameters

                      Lexical           Structural        Extensional
                      λx^c  λx^i  λx^p  λs^c  λs^i  λs^p  λe^c  λe^p    λt    λr
ILIADS-fixed          .2    .4    .1    .5    .6    .4    .3    .5      .7    .2
Min ILIADS-tailored   .15   .4    0     .3    .45   .35   .2    .35     .65   .2
Max ILIADS-tailored   .25   .45   .1    .65   .7    .5    .35   .65     .7    .2

Superscripts c, i and p denote the values used for classes, instances and properties.
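
The ILIADS-fixed row can be written down as a small configuration, assuming the ten values are read in the order lexical (class, instance, property), structural (class, instance, property), extensional (class, property), then λt and λr. The dictionary layout and names are hypothetical. Under that reading, the weights for each entity type sum to 1, which the helper below checks.

```python
import math

# ILIADS-fixed values, grouped by entity type (assumed column order).
ILIADS_FIXED = {
    "class":    {"lexical": 0.2, "structural": 0.5, "extensional": 0.3},
    "instance": {"lexical": 0.4, "structural": 0.6},
    "property": {"lexical": 0.1, "structural": 0.4, "extensional": 0.5},
}
LAMBDA_T = 0.7  # candidate-selection threshold
LAMBDA_R = 0.2  # relationship-selection threshold


def weight_sums(params: dict) -> dict:
    """Sum of the similarity weights for each entity type."""
    return {kind: sum(weights.values()) for kind, weights in params.items()}


sums = weight_sums(ILIADS_FIXED)
all_normalized = all(math.isclose(total, 1.0) for total in sums.values())
```

The tailored minimum and maximum rows are per-parameter extremes across ontology pairs, so they need not sum to 1 on their own.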

Choosing ILIADS parameters

• Despite the number of parameters, the method is quite robust
• Parameters are stable around the ILIADS-fixed values if the two ontologies in a pair are not very different
• Strong correlations between
  • Structural similarity coefficients and the average node degree
  • Extensional coefficients and the ratio of instances to classes

False negative analysis

Concluding remarks

• New ontology integration algorithm
  • First to combine statistical and logical inference
• Evaluated feasibility of combined inference
  • A small number of logical inference steps is sufficient for integration decisions
  • Inference is stable to parameter settings
  • Parameters permit principled tuning based on ontology characteristics
• Dataset and code available at: http://www.cs.umd.edu/linqs/projects/iliads
