leveraging data and structure in ontology integration

Leveraging Data and Structure in Ontology Integration
Octavian Udrea (1), Lise Getoor (1), Renée J. Miller (2)
(1) University of Maryland, College Park; (2) University of Toronto


Post on 11-Jan-2016



TRANSCRIPT

Page 1: Leveraging Data and Structure in Ontology Integration

Leveraging Data and Structure in Ontology Integration

Octavian Udrea (1)

Lise Getoor (1)

Renée J. Miller (2)

(1) University of Maryland, College Park
(2) University of Toronto

Page 2: Leveraging Data and Structure in Ontology Integration

Contents

• Motivation and goals
• Short overview of OWL Lite
• The ILIADS method
• Experimental evaluation

Page 3: Leveraging Data and Structure in Ontology Integration

ILIADS

Goal: Produce high-quality integration via a flexible method able to adapt to a wide variety of ontology sizes and structures

Method:
• Combine statistical and logical inference
• Use schema (structure) and data (instances) effectively

Solution: Integrated Learning In Alignment of Data and Schema (ILIADS)

Page 4: Leveraging Data and Structure in Ontology Integration

Contributions
• Show how to combine statistical and logical inference effectively
• Show that a small amount of inference yields a large gain in quality
• Show that the parameters needed to perform inference over data and structure are robust
• Provide a thorough evaluation on 30 pairs of real-world ontologies (with ground truth)

Page 5: Leveraging Data and Structure in Ontology Integration

Contents

• Motivation and goals
• Short overview of OWL Lite
• The ILIADS method
• Experimental evaluation

Page 6: Leveraging Data and Structure in Ontology Integration

Example OWL Lite ontologies

(discoveredBy, owl:inverseOf, discoverer)
(discoveredBy, rdf:type, owl:FunctionalProperty)
(discoveredBy, owl:inverseOf, discoverer)
(associatedWith, rdf:type, owl:TransitiveProperty)
(resultsFrom, rdfs:subPropertyOf, associatedWith)

Page 7: Leveraging Data and Structure in Ontology Integration

Example OWL Lite ontologies

An entity can be a:
• Class


Page 8: Leveraging Data and Structure in Ontology Integration

Example OWL Lite ontologies

An entity can be a:
• Class
• Instance


Page 9: Leveraging Data and Structure in Ontology Integration

Example OWL Lite ontologies

An entity can be a:
• Class
• Instance
• Property


Page 10: Leveraging Data and Structure in Ontology Integration

Example OWL Lite ontologies


Page 11: Leveraging Data and Structure in Ontology Integration

Inference in OWL Lite

Page 12: Leveraging Data and Structure in Ontology Integration

Inference in OWL Lite

Page 13: Leveraging Data and Structure in Ontology Integration

Inference in OWL Lite

Page 14: Leveraging Data and Structure in Ontology Integration

The integration problem

Page 15: Leveraging Data and Structure in Ontology Integration

The integration problem

Page 16: Leveraging Data and Structure in Ontology Integration

The integration problem

Page 17: Leveraging Data and Structure in Ontology Integration

The integration problem

Page 18: Leveraging Data and Structure in Ontology Integration

Contents

• Motivation and goals
• Short overview of OWL Lite
• The ILIADS method
• Experimental evaluation

Page 19: Leveraging Data and Structure in Ontology Integration

State of the art
• Robust statistical methods
  - Well-known similarity measures
  - Used for matching data (entities) and schema
  - May use graph structure of schema
• Logical inference
  - Not combined with statistical inference
  - Basis for most schema mapping and ontology integration methods
  - Approaches integrate schema, but not data

Page 20: Leveraging Data and Structure in Ontology Integration

Issues
• How to combine statistical inference with logical inference
  - Takes into account data, structure, etc., so it is no longer obvious
  - In particular, how to quantify the results of logical inference in a similarity-like form?
• How to do logical inference in a tractable manner
  - For OWL Lite, inference over the entire ontology is EXPTIME-complete in the worst case

Page 21: Leveraging Data and Structure in Ontology Integration

The ILIADS algorithm

repeat until no more candidates
  1. Compute local similarities
  2. Select promising candidates
  3. For each candidate
     a. Select relationship
     b. Perform logical inference
     c. Update score with the inference similarity
  4. Select the candidate with the best score
end
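The control flow of the loop above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `greedy_alignment`, `local_sim` and `rescore` are hypothetical, the relationship-selection step (3a) is left out, and logical inference (3b, 3c) is hidden behind the caller-supplied `rescore` callable.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (entity from ontology 1, entity from ontology 2)

def greedy_alignment(
    pairs: Iterable[Pair],
    local_sim: Callable[[Pair, List[Pair]], float],       # hypothetical: step 1
    rescore: Callable[[Pair, float, List[Pair]], float],  # hypothetical: steps 3b-3c
    lam_t: float,                                         # candidate threshold
) -> List[Pair]:
    """Greedy loop: score, filter, rescore, accept the best pair, repeat."""
    alignment: List[Pair] = []
    remaining = set(pairs)
    while remaining:
        # Step 1: local similarities, recomputed given the current alignment.
        scores = {p: local_sim(p, alignment) for p in remaining}
        # Step 2: keep only pairs above the threshold (sorted for determinism).
        candidates = sorted(p for p, s in scores.items() if s > lam_t)
        if not candidates:
            break  # "until no more candidates"
        # Step 3: let the caller adjust each score using logical inference.
        updated = {p: rescore(p, scores[p], alignment) for p in candidates}
        # Step 4: accept the single best-scoring candidate.
        best = max(candidates, key=lambda p: updated[p])
        alignment.append(best)
        remaining.discard(best)
    return alignment
```

Accepting one pair per iteration keeps the sketch close to the slide's wording; a real system would likely cache similarities rather than recompute all of them on each pass.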

Page 22: Leveraging Data and Structure in Ontology Integration

The ILIADS algorithm

repeat until no more candidates

1. Compute local similarities

2. Select promising candidates

3. For each candidate

a. Select relationship

b. Perform logical inference

c. Update score with the inference similarity

4. Select the candidate with the best score

end

Page 23: Leveraging Data and Structure in Ontology Integration

Computing local similarities

• sim_lexical: Jaro-Winkler and WordNet
• sim_structural: Jaccard for neighborhoods
• sim_extensional: Jaccard on extensions
• Parameters: λx, λs, λe (different for classes, instances and properties)

$sim(e, e') = \lambda_x \, sim_{lexical}(e, e') + \lambda_s \, sim_{structural}(e, e') + \lambda_e \, sim_{extensional}(e, e')$
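A minimal sketch of the weighted combination of the three local measures follows. It rests on assumptions: the lexical measure uses the standard library's `difflib.SequenceMatcher` as a stand-in for Jaro-Winkler and WordNet (it is a different string measure), neighborhoods and extensions are modeled as plain sets of identifiers that are already comparable across the two ontologies, and all function and parameter names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; defined as 0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def lexical_sim(name_a: str, name_b: str) -> float:
    # Stand-in only: the slides name Jaro-Winkler and WordNet, not difflib.
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()

def local_sim(
    name_a: str, name_b: str,
    neigh_a: Set[str], neigh_b: Set[str],  # neighborhoods in the ontology graph
    ext_a: Set[str], ext_b: Set[str],      # extensions (instance sets)
    lam_x: float, lam_s: float, lam_e: float,
) -> float:
    """Weighted sum of lexical, structural and extensional similarity."""
    return (
        lam_x * lexical_sim(name_a, name_b)
        + lam_s * jaccard(neigh_a, neigh_b)
        + lam_e * jaccard(ext_a, ext_b)
    )
```

For example, with identical names, neighborhoods `{"a", "b"}` and `{"b", "c"}`, identical extensions, and weights 0.2, 0.5 and 0.3, the structural term contributes 0.5 × 1/3 and the other two terms contribute their full weights.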

Page 24: Leveraging Data and Structure in Ontology Integration

The ILIADS algorithm

repeat until no more candidates

1. Compute local similarities

2. Select promising candidates

3. For each candidate

a. Select relationship

b. Perform logical inference

c. Update score with the inference similarity

4. Select the candidate with the best score

end

Page 25: Leveraging Data and Structure in Ontology Integration

Selecting promising candidates

1. Select candidates with sim(e,e’) > λt
2. Use a policy based on entity type to order, e.g.:
   • Class alignments first
   • Instance alignments first
   • Alternate between classes and instances
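The two steps above, thresholding and then ordering by an entity-type policy, might look like the following sketch. The `Candidate` tuple layout, the `select_candidates` name and the `order` parameter are assumptions made for illustration; the "alternate" policy is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# (entity, entity', kind) where kind is "class", "instance" or "property"
Candidate = Tuple[str, str, str]

def select_candidates(
    scores: Dict[Candidate, float],
    lam_t: float,
    order: Sequence[str] = ("class", "instance", "property"),
) -> List[Candidate]:
    """Keep candidates with sim > lam_t, ordered by entity type, then by score."""
    rank = {kind: i for i, kind in enumerate(order)}
    kept = [c for c, s in scores.items() if s > lam_t]
    # Earlier kinds in `order` come first; within a kind, higher scores first.
    return sorted(kept, key=lambda c: (rank[c[2]], -scores[c], c))
```

Passing `order=("instance", "class", "property")` gives the "instance alignments first" policy with the same function.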

Page 26: Leveraging Data and Structure in Ontology Integration

The ILIADS algorithm

repeat until no more candidates

1. Compute local similarities

2. Select promising candidates

3. For each candidate

a. Select relationship

b. Perform logical inference

c. Update score with the inference similarity

4. Select the candidate with the best score

end

Page 27: Leveraging Data and Structure in Ontology Integration

Selecting relationship
• Must decide on relation type
  - subClassOf vs. equivalentClass
  - subPropertyOf vs. equivalentProperty
• Determination is difficult, especially under the OWL open-world semantics
• Use a simple extension-based technique with a threshold λr
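The slides do not spell out the extension-based rule, so the sketch below is only one plausible reading, not the authors' technique. It assumes that λr bounds the fraction of one class's instances that may be missing from the other class: if both fractions are within λr the classes are treated as equivalent, and if only one is, that class is treated as the subclass. The function name, the return labels and the fallback for empty extensions are all hypothetical.

```python
from typing import Optional, Set

def select_relationship(ext_a: Set[str], ext_b: Set[str], lam_r: float) -> Optional[str]:
    """Guess the relation between classes A and B from their instance sets."""
    if not ext_a or not ext_b:
        # Assumption: with no instance evidence, fall back to equivalence.
        return "equivalentClass"
    missing_a = len(ext_a - ext_b) / len(ext_a)  # share of A's instances not in B
    missing_b = len(ext_b - ext_a) / len(ext_b)  # share of B's instances not in A
    if missing_a <= lam_r and missing_b <= lam_r:
        return "equivalentClass"
    if missing_a <= lam_r:
        return "subClassOf"    # A is (almost) contained in B
    if missing_b <= lam_r:
        return "superClassOf"  # B is (almost) contained in A
    return None                # extensions do not support either relation
```

Under open-world semantics an instance absent from an extension is not known to be outside the class, which is why any such rule can only be a heuristic.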

Page 28: Leveraging Data and Structure in Ontology Integration

Selecting relationship

Page 29: Leveraging Data and Structure in Ontology Integration

Selecting relationship

Page 30: Leveraging Data and Structure in Ontology Integration

The ILIADS algorithm

repeat until no more candidates

1. Compute local similarities

2. Select promising candidates

3. For each candidate

a. Represent candidate relationship

b. Perform logical inference

c. Update score with the inference similarity

4. Select the candidate with the best score

end

Page 31: Leveraging Data and Structure in Ontology Integration

Performing logical inference

For the candidate pair (e,e’):
• Select an axiom to apply
• The logical consequences are the pairs of entities (e^(i), e^(j)) that have just become equivalent
• Repeat a small number of times (5) to maintain tractability

Page 32: Leveraging Data and Structure in Ontology Integration

Performing logical inference

Page 33: Leveraging Data and Structure in Ontology Integration

Performing logical inference

Page 34: Leveraging Data and Structure in Ontology Integration

Performing logical inference

Page 35: Leveraging Data and Structure in Ontology Integration

Performing logical inference

(TheodorEscherich, owl:sameAs, T.S. Escherich) is a logical consequence of the candidate (E-ColiPoisoning, owl:sameAs, E-Coli)
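A bounded inference step of this kind can be sketched for a single rule, the functional-property rule: if p is functional and two subjects are assumed equal, their p-values must be equal too. The sketch makes several assumptions: only this one rule is applied, a "step" is one round of rule application over the newest pairs, properties are already aligned by name across the two ontologies, and the instance triples in the usage note are invented to mirror the slide's example.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)
Pair = Tuple[str, str]         # (entity from ontology 1, entity from ontology 2)

def infer_consequences(
    candidate: Pair,
    triples: Set[Triple],
    functional: Set[str],  # names of functional properties
    max_steps: int = 5,    # small bound, as on the slide
) -> List[Pair]:
    """Pairs that become equivalent if `candidate` is assumed equivalent."""
    known: List[Pair] = [candidate]
    frontier: List[Pair] = [candidate]
    for _ in range(max_steps):
        new: List[Pair] = []
        for a, b in frontier:
            for s1, p, o1 in sorted(triples):
                if s1 != a or p not in functional:
                    continue
                for s2, p2, o2 in sorted(triples):
                    # Same functional property on both subjects: objects coincide.
                    if s2 == b and p2 == p and o1 != o2:
                        pair = (o1, o2)
                        if pair not in known and pair not in new:
                            new.append(pair)
        if not new:
            break
        known.extend(new)
        frontier = new
    return known[1:]  # exclude the candidate itself
```

With the hypothetical triples `("E-ColiPoisoning", "discoveredBy", "TheodorEscherich")` and `("E-Coli", "discoveredBy", "T.S. Escherich")`, and `discoveredBy` marked functional, the candidate `("E-ColiPoisoning", "E-Coli")` yields the single consequence `("TheodorEscherich", "T.S. Escherich")`.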

Page 36: Leveraging Data and Structure in Ontology Integration

The ILIADS algorithm

repeat until no more candidates

1. Compute local similarities

2. Select promising candidates

3. For each candidate

a. Represent candidate relationship

b. Perform logical inference

c. Update score with the inference similarity

4. Select the candidate with the best score

end

Page 37: Leveraging Data and Structure in Ontology Integration

Updating score

For the candidate pair (e,e’):
• Initial local similarity: sim(e,e’)
• Inference similarity over all consequences:

$P = \prod_{(e^{(i)}, e^{(j)})} \frac{sim(e^{(i)}, e^{(j)})}{1 - sim(e^{(i)}, e^{(j)})}$

• Updated similarity:

$sim_{updated}(e, e') = sim(e, e') \cdot P$
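The score update can be illustrated with a short sketch. It assumes that the inference similarity P is the product, over all logical consequences, of sim / (1 - sim), and that the updated score is the local similarity multiplied by P; the slide's formula is damaged in this transcript, so the product form is a reading of the surviving fragments. The clamping constant `eps` and the function names are additions made here to avoid division by zero.

```python
from typing import Iterable

def inference_factor(consequence_sims: Iterable[float], eps: float = 1e-6) -> float:
    """P: product of sim / (1 - sim) over the similarities of all consequences."""
    p = 1.0
    for s in consequence_sims:
        s = min(max(s, eps), 1.0 - eps)  # clamp away from 0 and 1 (assumption)
        p *= s / (1.0 - s)
    return p

def updated_sim(local: float, consequence_sims: Iterable[float]) -> float:
    """Local similarity scaled by the inference factor P."""
    return local * inference_factor(consequence_sims)
```

Under this reading, a consequence with similarity above 0.5 raises the candidate's score and one below 0.5 lowers it, while a candidate with no consequences keeps its local similarity. The result is a ranking score and can exceed 1.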

Page 38: Leveraging Data and Structure in Ontology Integration

Updating score

Page 39: Leveraging Data and Structure in Ontology Integration

Updating score

Page 40: Leveraging Data and Structure in Ontology Integration

Consistency
• The constructed alignment is not guaranteed to be consistent
• ILIADS can only detect inconsistencies that appear in the few logical inference steps
• Pellet used to check consistency after ILIADS
• Experimentally, inconsistent ontologies in fewer than 0.5% of runs

Page 41: Leveraging Data and Structure in Ontology Integration

Contents

• Motivation and goals
• Short overview of OWL Lite
• The ILIADS method
• Experimental evaluation

Page 42: Leveraging Data and Structure in Ontology Integration

Experimental framework
• 30 pairs of real-world ontologies
  - From 194 to over 20,000 triples
  - From a variety of domains: medical, geographical, economic, biological
• Ground truth provided by human reviewers
  - Multiple iterations to ensure the best human-provided alignment
• Datasets available: http://www.cs.umd.edu/linqs/projects/iliads

Page 43: Leveraging Data and Structure in Ontology Integration

Experimental framework
• Evaluation: precision, recall and F1 quality
  - F1 = 2 * Precision * Recall / (Precision + Recall)
  - 7 independent runs
• ILIADS variations:
  - ILIADS-tailored uses the best set of parameters for each pair of ontologies
  - ILIADS-fixed uses one set of parameters for all pairs of ontologies
  - Used to evaluate robustness of the parameters
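The evaluation measures above can be computed from a predicted alignment and a ground-truth alignment as in the following sketch. Representing each alignment as a set of entity pairs is an assumption made here, as is returning 0 when a denominator is empty.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

def precision_recall_f1(predicted: Set[Pair], truth: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a predicted alignment against ground truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, four predicted pairs of which two appear in a three-pair ground truth give precision 1/2, recall 2/3 and F1 = 4/7.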

Page 44: Leveraging Data and Structure in Ontology Integration

Experimental framework
• ILIADS compared to two leading tools:
  - FCA-merge [Stumme and Maedche, IJCAI 2001]: uses formal concept analysis and an external document corpus
  - COMA++ [Aumueller et al., SIGMOD 2005]: implements multiple match strategies, including fragment and reuse-based matching

Page 45: Leveraging Data and Structure in Ontology Integration

Precision/recall

Page 46: Leveraging Data and Structure in Ontology Integration

Precision/recall

Page 47: Leveraging Data and Structure in Ontology Integration

Precision/recall

Page 48: Leveraging Data and Structure in Ontology Integration

Precision/recall

Page 49: Leveraging Data and Structure in Ontology Integration

Precision/recall comparison

Page 50: Leveraging Data and Structure in Ontology Integration

Precision/recall for ontologies with substantial instance data

Page 51: Leveraging Data and Structure in Ontology Integration

Number of inference steps

Page 52: Leveraging Data and Structure in Ontology Integration

ILIADS parameters

                      λx^c    λx^i    λx^p    λs^c    λs^i    λs^p    λe^c    λe^p    λt      λr
ILIADS-fixed          .2      .4      .1      .5      .6      .4      .3      .5      .7      .2
Min ILIADS-tailored   .15     .4      0       .3      .45     .35     .2      .35     .65     .2
Max ILIADS-tailored   .25     .45     .1      .65     .7      .5      .35     .65     .7      .2

λx: lexical parameters; λs: structural parameters; λe: extensional parameters. Superscripts c, i, p: classes, instances, properties.

Page 53: Leveraging Data and Structure in Ontology Integration

Choosing ILIADS parameters
• Despite the number of parameters, the method is quite robust
  - Parameters are stable around the ILIADS-fixed values if the two ontologies in a pair are not very different
• Strong correlations between
  - Structural similarity coefficients and the average node degree
  - Extensional coefficients and the ratio of instances to classes

Page 54: Leveraging Data and Structure in Ontology Integration

False negative analysis

Page 55: Leveraging Data and Structure in Ontology Integration

Concluding remarks
• New ontology integration algorithm
  - First to combine statistical and logical inference
• Evaluated feasibility of combined inference
  - A small number of logical inference steps is sufficient for integration decisions
  - Inference is stable to parameter settings
  - Parameters permit principled tuning based on ontology characteristics
• Dataset and code available at: http://www.cs.umd.edu/linqs/projects/iliads