Semantic Enrichment of Ontology Mappings: A Linguistic-based Approach Patrick Arnold, Erhard Rahm University of Leipzig, Germany 17th East-European Conference on Advances in Databases and Information Systems



Semantic Enrichment of Ontology Mappings: A Linguistic-based Approach

Patrick Arnold, Erhard Rahm
University of Leipzig, Germany

17th East-European Conference on Advances in Databases and Information Systems


2  04/19/23  Semantic Enrichment of Ontology Mappings

1. Introduction

Ontology Matching: Detecting corresponding concepts between two Ontologies O and O'

Most matching tools do not consider the relation type that holds between corresponding concepts


1. Introduction

Importance of relation types:
  Ontology merging and ontology evolution
  More precise results
  Effectively preventing false conclusions

Related fields: Text Mining, Entity Resolution, Linked Data

Key question: Given two items, words, etc., what is the logical relation between them?


1. Introduction - Example


1. Introduction

Some existing tools consider relation types:
  S-Match: Ineffective in our evaluations
    Returned about 20,000 correspondences where around 400 were expected
  Further tools: LogMap, TaxoMap, etc.


Our Contributions

1. Introduction
2. Semantic Enrichment Architecture
3. Implemented Strategies
4. Evaluation
5. Outlook


2. Semantic Enrichment Architecture

We provide a 2-step architecture:
  Step 1: Classic ontology matching
  Step 2: Enrichment


2. Semantic Enrichment Architecture

The approach consists of 4 strategies. Each strategy returns one of the following relation types:

  equal
  is-a / inverse is-a
  part-of / has-a
  related
  undecided

Take the relation type that was returned most often.
In case of a draw: user feedback is required.
If all strategies return undecided: decide on equal by default.
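
The voting rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `combine_votes` is hypothetical, and the sketch assumes that "undecided" votes are ignored when counting, which the slide does not state explicitly. A draw is signalled by returning `None`, standing in for the required user feedback.

```python
from collections import Counter

UNDECIDED = "undecided"


def combine_votes(votes):
    """Return the winning relation type, or None when a draw needs user feedback."""
    # Assumption: "undecided" votes do not take part in the count.
    decided = [v for v in votes if v != UNDECIDED]
    if not decided:
        # All strategies returned undecided: decide on equal by default.
        return "equal"
    ranked = Counter(decided).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        # Draw between the most frequent relation types.
        return None
    return ranked[0][0]
```

For example, `combine_votes(["is-a", "is-a", "undecided", "equal"])` picks `"is-a"`, while four undecided votes fall back to `"equal"`.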


3. Implemented Strategies
3.1 Compound Strategy

Compound: Two words A, B form a new word AB.
  Examples: high-school, blackbird, database conference

A is called the modifier, B is called the head.

Compounds often express is-a relations (endocentric compounds):
  High-school is a school
  Blackbird is a bird
  ...


3. Implemented Strategies
3.1 Compound Strategy

If there is a correspondence (AB, B) or (B, AB), we derive the is-a or inverse is-a relation.
  Example: (main memory, memory)

Problem: Exocentric compounds
  butterfly, redhead, computer mouse

Exocentric matches are extremely rare in mappings.
  If AB is an exocentric compound, there usually is no head B in the opposite ontology.
  Example: sawtooth – tooth


3. Implemented Strategies
3.1 Compound Strategy

Possibilities to reduce false conclusions:
  Check modifier length: must be at least 3
    inroad – road
  Use a dictionary to check the modifier
    marriage – age
    nausea – sea
    holiday – day?
  No solution for "pseudo-compounds"
    question – ion
    justice – ice
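
A minimal sketch of the compound strategy with both safeguards (modifier length and dictionary check) might look as follows. The function name `compound_relation` and the tiny `TOY_DICTIONARY` are hypothetical stand-ins introduced for illustration; a real system would consult a full dictionary.

```python
# Hypothetical stand-in for a real dictionary of valid words.
TOY_DICTIONARY = {"main", "high", "black", "quest", "just"}


def compound_relation(source, target, dictionary, min_modifier_len=3):
    """Derive is-a / inverse is-a for a correspondence (AB, B) or (B, AB)."""

    def modifier_of(compound, head):
        c, h = compound.lower().strip(), head.lower().strip()
        if len(c) <= len(h) or not c.endswith(h):
            return None
        # Drop the head, then any separating space or hyphen.
        return c[: -len(h)].rstrip(" -")

    def is_valid(modifier):
        return (
            modifier is not None
            and len(modifier) >= min_modifier_len  # rejects inroad - road
            and modifier in dictionary             # rejects marriage - age
        )

    if is_valid(modifier_of(source, target)):
        return "is-a"
    if is_valid(modifier_of(target, source)):
        return "inverse is-a"
    return "undecided"
```

With this toy dictionary, `("main memory", "memory")` yields `"is-a"`, whereas `("inroad", "road")` and `("marriage", "age")` stay `"undecided"`. The pseudo-compound `("question", "ion")` still yields `"is-a"` because "quest" is a valid word, which is exactly the unsolved case named above.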


3. Implemented Strategies
3.2 Background Knowledge

WordNet for English-language scenarios
  Reliable, extensive thesaurus
  Excellent precision, good recall
  Limited in domain-specific areas

Problem: Compounds
  Example: Vintage Car Repair Shop
  Very simple word, but not contained in WordNet


3. Implemented Strategies
3.2 Background Knowledge

Gradual Modifier Removal
  Remove modifiers gradually from the left
  After each removal: check whether the word is contained in WordNet

Example: Vintage Car Repair Shop ↔ Company

  Step  Word                     In WordNet?
  1     Vintage Car Repair Shop  no
  2     Car Repair Shop          no
  3     Repair Shop              yes

  WordNet: Repair Shop is a Company
  Conclusion: Vintage Car Repair Shop is a Company
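
Gradual modifier removal can be sketched as below. To keep the example self-contained it does not call WordNet; `TOY_THESAURUS` is a hypothetical stand-in that maps each known term to its direct hypernyms, and the function names are assumptions made for this illustration.

```python
# Hypothetical stand-in for a thesaurus such as WordNet: term -> direct hypernyms.
TOY_THESAURUS = {
    "repair shop": {"company"},
    "company": set(),
}


def strip_modifiers(term, thesaurus):
    """Remove modifiers from the left until the remaining term is known."""
    words = term.lower().split()
    for start in range(len(words)):
        candidate = " ".join(words[start:])
        if candidate in thesaurus:
            return candidate
    return None  # not even the last word is known


def background_relation(source, target, thesaurus):
    """Return is-a if the stripped source term has the target as a hypernym."""
    known = strip_modifiers(source, thesaurus)
    if known is not None and target.lower() in thesaurus[known]:
        return "is-a"
    return "undecided"
```

Here `strip_modifiers("Vintage Car Repair Shop", TOY_THESAURUS)` stops at `"repair shop"`, so the correspondence with "Company" is classified as `"is-a"`. The sketch only follows direct hypernyms; a fuller version would also walk the hypernym chain.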


3. Implemented Strategies
3.3 Itemization

Itemization: List of items (words or phrases)
  Most frequent in product taxonomies
  Examples:
    Laptops and Computers
    Bikes, Scooters and Motorbikes

More complex: itemizations need special treatment

Itemization Strategy:
  Triggers if at least one concept is an itemization
  Exploits the previous strategies
  Approach: Remove items from the item sets
  Goal: Empty set


3. Implemented Strategies
3.3 Itemization

Example correspondence:
  books, e-books, movies, films, cds
  novels and compact discs

Step 1: Build item sets
  { books, e-books, movies, films, cds }
  { novels, compact discs }

Step 2: Intra-Synonym Removal
  In each item set, remove synonyms (A, B) by crossing off either A or B.
  { books, e-books, movies, films, cds }
  { novels, compact discs }

Step 3: Intra-Hyponym Removal
  In each item set, remove existing hyponyms.
  { books, e-books, movies, cds }
  { novels, compact discs }

Step 4: Inter-Synonym Removal
  Remove each synonym pair between the two item sets.
  { books, movies, cds }
  { novels, compact discs }

Step 5: Inter-Hyponym Removal
  Remove each word H to which a hypernym H' in the opposite item set exists.
  { books, movies }
  { novels }

Step 6: Determine the Relation Type
  Second item set more specific than first one: inverse is-a
  { books, movies }
  { }
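
The six steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the synonym and hypernym facts are hard-coded hypothetical data that a real system would obtain from the other strategies, and the outcomes for cases the slide does not show (both sets empty, or both sets non-empty) are assumptions.

```python
import re

# Hypothetical knowledge that the other strategies would normally supply.
EXAMPLE_SYNONYMS = {frozenset(("movies", "films")), frozenset(("cds", "compact discs"))}
EXAMPLE_HYPERNYMS = {("e-books", "books"), ("novels", "books")}  # (specific, general)


def itemize(label):
    """Step 1: split a label such as 'A, B and C' into its items."""
    return [p.strip().lower() for p in re.split(r",|\band\b", label) if p.strip()]


def itemization_relation(label_a, label_b, synonyms, hypernyms):
    def syn(x, y):
        return frozenset((x, y)) in synonyms

    def is_hyponym_of(specific, general):
        return (specific, general) in hypernyms

    def clean(items):
        kept = []
        for item in items:  # Step 2: keep one word per synonym group
            if not any(syn(item, k) for k in kept):
                kept.append(item)
        # Step 3: drop items whose hypernym is in the same set
        return [i for i in kept if not any(is_hyponym_of(i, k) for k in kept if k != i)]

    a, b = clean(itemize(label_a)), clean(itemize(label_b))
    for x in list(a):  # Step 4: remove synonym pairs across the two sets
        for y in list(b):
            if x == y or syn(x, y):
                a.remove(x)
                b.remove(y)
                break
    # Step 5: remove each word that has a hypernym in the opposite set
    rest_a = [x for x in a if not any(is_hyponym_of(x, y) for y in b)]
    rest_b = [y for y in b if not any(is_hyponym_of(y, x) for x in a)]
    # Step 6: decide the relation type (only the second case is on the slide)
    if not rest_a and not rest_b:
        return "equal"         # assumption
    if not rest_b:
        return "inverse is-a"  # second set is more specific than the first
    if not rest_a:
        return "is-a"          # assumption: mirror case
    return "undecided"         # assumption
```

Applied to the example correspondence, the first set ends as books and movies and the second set ends empty, so the result is `"inverse is-a"`, matching Step 6.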


3. Implemented Strategies
3.4 Structure Strategy

Focus: Structured schemas (hierarchies)

Issue: A relation between two matching concepts X, Y cannot be derived

Check the relation between X' and Y, or between X and Y'
  Prime (') denotes the parent element
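
The slide does not spell out which conclusion is drawn from the parent check. The sketch below assumes that the edges of the hierarchy express is-a, so that a child inherits the relation its parent has to the other concept. The names `structure_relation`, `parent_of` and `relate` are hypothetical; `relate` stands for whatever combination of the other strategies is used to classify a pair.

```python
def structure_relation(x, y, parent_of, relate):
    """Fall back to the parents X' and Y' when (X, Y) itself is undecided.

    Assumption: every child is-a its parent in both hierarchies.
    """
    parent_x = parent_of.get(x)
    if parent_x is not None and relate(parent_x, y) in ("equal", "is-a"):
        # X is-a X', and X' is equal to or more specific than Y.
        return "is-a"
    parent_y = parent_of.get(y)
    if parent_y is not None and relate(x, parent_y) in ("equal", "inverse is-a"):
        # Y is-a Y', and X is equal to or more general than Y'.
        return "inverse is-a"
    return "undecided"
```

For instance, with `parent_of = {"sedan": "car"}` and a `relate` function that reports "car" and "automobile" as equal, the pair ("sedan", "automobile") would be classified as `"is-a"`.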


3. Implemented Strategies
3.4 Structure Strategy

Example:


3. Implemented Strategies
3.5 Subset Verification

In some cases, is-a relations only appear to be correct


4. Evaluation

3 Benchmark Scenarios

Input: Perfect mapping without relation types

Evaluation:
  How many non-trivial relations were detected? (recall)
  How many of them were correct? (precision)

  Scenario               Domain / Traits                    Corresp.  Non-trivial corresp.
  1 Web Directories      German language, product catalog   340       62
  2 Diseases             Health, medical domain             395       41
  3 Text Mining Taxon.   TM Taxonomy (Everyday Language)    762       692
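
As a small illustration of the two measures, the sketch below computes precision and recall over non-trivial correspondences. It assumes that "non-trivial" means a relation type other than equal, and that both the detected and the expected relations are given as dictionaries from concept pairs to relation types; the function name and the data layout are hypothetical.

```python
def precision_recall(detected, gold):
    """Both arguments map a concept pair to its non-trivial relation type."""
    # A detection counts as correct only if pair and relation type both match.
    correct = sum(1 for pair, rel in detected.items() if gold.get(pair) == rel)
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

With made-up data, four expected non-trivial relations and three detected ones of which two are correct give a precision of 2/3 and a recall of 2/4.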


4. Evaluation

Evaluation (as of April 2013)

                   Recall   Precision  F-Measure
  Web Directories  46.7 %   69.0 %     57.8 %
  Health           58.5 %   80.0 %     69.2 %
  TM Taxonomies    65.4 %   97.7 %     81.1 %

Evaluation against S-Match: no reasonable evaluation feasible
  Scenario 1: Returned only 4 correspondences, all wrong
  Scenario 2: Returned 19,600 correspondences
    3 % recall, precision close to 0 %


4. Evaluation

Evaluating the Strategies

                        Recall   Precision
  Compound              18.9 %   82.2 %
  Background Knowledge  19.6 %   94.0 %
  Itemization           17.1 %   88.8 %
  Structure              1.0 %   50.0 %


5. Outlook

Relation types are needed for different mapping tasks

Two general approaches: linguistic or background knowledge
  Linguistic strategies: more generic and more error-prone
  Background knowledge: less generic and more precise

Improvements
  Exploit more background knowledge
    Example: Yago Taxonomy, DBpedia, UMLS
    Combine it with linguistic / NLP technologies
  Exploit further linguistic techniques


Thank You