An Empirical Study of Instance-Based Ontology Mapping

Antoine Isaac, Lourens van der Meij, Stefan Schlobach, Shenghui Wang

STITCH@CATCH funded by NWO

Vrije Universiteit Amsterdam, Koninklijke Bibliotheek Den Haag, Max Planck Institute Nijmegen

ISWC 2007

Metamotivation

• Ontology mapping in practice
  • Based on real problems at our host institution, the Dutch Royal Library
• Task-driven
  • Annotation support
  • Merging of thesauri
• Real thesauri (100 years of tradition)
  • Really messy
  • Conceptually difficult
  • Inexpressive
• Generic solutions to specific questions & tasks
• Using Semantic Web standards (SKOSification)

Overview

• Use-case
• Instance-based mapping
• Evaluation
• Experiments
• Results
• Conclusions

The Alignment Task: Context

• National Library of the Netherlands (KB)
• 2 main collections
  • Legal Deposit: all Dutch printed books
  • Scientific Collection: history, language…
• Each described (indexed) by its own thesaurus

[Figure: Scientific Collection (1.4M books), indexed with GTT; Depot (1M books), indexed with Brinkman]

A need for thesaurus mapping

• The KB wants to
  • (Scenario 1) possibly discontinue one of its two annotation and retrieval methods
  • (Scenario 2) possibly merge the thesauri
• We explore how mappings can help:
  • (Task 1) with a single (new or merged) retrieval system, find books annotated with the old system, using the mappings
  • (Task 2) propose candidate terms for a merged thesaurus
• We use the doubly annotated corpus to calculate instance-based mappings

Calculating mappings using Concept Extensions

How much are two concepts related, judged by the books annotated with each (their extensions)?
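A concept's extension is the set of books annotated with it. Below is a minimal sketch that assumes a hypothetical in-memory corpus in which every doubly annotated book lists its GTT and Brinkman concepts. The book IDs, concept names and the `extensions` helper are invented for illustration. It builds the extensions by inverting the annotations:

```python
from collections import defaultdict

# Hypothetical doubly annotated corpus: book ID -> concepts from each thesaurus.
corpus = {
    "book1": {"gtt": {"Cricket"}, "brinkman": {"Sport"}},
    "book2": {"gtt": {"Cricket", "History"}, "brinkman": {"Sport"}},
    "book3": {"gtt": {"History"}, "brinkman": {"History"}},
}

def extensions(corpus, thesaurus):
    """Invert the annotations: concept -> set of books annotated with it."""
    ext = defaultdict(set)
    for book, annotations in corpus.items():
        for concept in annotations[thesaurus]:
            ext[concept].add(book)
    return dict(ext)

gtt_ext = extensions(corpus, "gtt")
brinkman_ext = extensions(corpus, "brinkman")
print(sorted(gtt_ext["Cricket"]))       # ['book1', 'book2']
print(sorted(brinkman_ext["History"]))  # ['book3']
```

Any of the similarity measures in this study can then be computed by comparing one extension from each thesaurus.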

Standard approach (Jaccard)

• Use a co-occurrence measure to calculate the similarity between two concepts, e.g.:

[Figure: the set of books in the library, showing the elements of B, the elements of G, and their joint elements]

Two examples: Similarity = 5/9 ≈ 56 % and Similarity = 1/7 ≈ 14 % (the overlap, i.e. the "degree of greenness")
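The Jaccard measure divides the number of joint elements by the number of all elements of either concept. The Python sketch below uses hypothetical book-ID sets chosen to reproduce the two examples above. The function name `jaccard` is illustrative only.

```python
def jaccard(ext_b, ext_g):
    """|B ∩ G| / |B ∪ G| for two concept extensions (sets of book IDs)."""
    union = ext_b | ext_g
    if not union:
        return 0.0
    return len(ext_b & ext_g) / len(union)

# Hypothetical extensions reproducing the two examples
b1, g1 = set(range(1, 8)), set(range(3, 10))  # 5 joint books, 9 in total
b2, g2 = {1, 2, 3, 4}, {4, 5, 6, 7}           # 1 joint book, 7 in total
print(round(jaccard(b1, g1), 2))  # 0.56
print(round(jaccard(b2, g2), 2))  # 0.14
```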

Issues with this measure (sparse data)

• What is more reliable: Jacc = 18/21 ≈ 86 % or Jacc = 1/1 = 100 %?
• The second is worse: it rests on a single book b, with b_B = {MemberOfParliament} and b_G = {Cricket}
• We need either
  • more reliable measures, or
  • thresholds (at least n doubly annotated books)
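One way to implement the threshold option is sketched below. It assumes that "at least n doubly annotated books" means at least n books annotated with both concepts. `thresholded_jaccard` and the example sets are hypothetical. The default n = 10 is the threshold value used in the experiments.

```python
def jaccard(ext_b, ext_g):
    """|B ∩ G| / |B ∪ G| for two sets of book IDs."""
    union = ext_b | ext_g
    return len(ext_b & ext_g) / len(union) if union else 0.0

def thresholded_jaccard(ext_b, ext_g, min_joint=10):
    """Jaccard, but None (no mapping proposed) if fewer than min_joint books share both concepts."""
    if len(ext_b & ext_g) < min_joint:
        return None
    return jaccard(ext_b, ext_g)

# Hypothetical extensions with the counts from the slide
big_b, big_g = set(range(0, 20)), set(range(2, 21))  # 18 joint books, 21 in total
tiny_b, tiny_g = {"b"}, {"b"}                        # 1 joint book, 1 in total

print(round(thresholded_jaccard(big_b, big_g), 2))        # 0.86
print(thresholded_jaccard(tiny_b, tiny_g))                # None
print(thresholded_jaccard(tiny_b, tiny_g, min_joint=0))   # 1.0
```

With threshold 0 the MemberOfParliament/Cricket-style pair scores a perfect 1.0. With threshold 10 it is simply never proposed.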

Issue with the measure (hierarchy):

Consider a hierarchy.

[Figure: concepts B, B' and G over the set of books in the library, shown without and with hierarchical elements]

• Non-hierarchical: Jacc(B',G) = 1/2 = 50%
• Hierarchical: Jacc(B',G) = 2/6 = 33%
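With the hierarchy option, a concept's extension can be taken to also include the books of all its narrower concepts. This is our reading, stated as an assumption. The minimal sketch below assumes the hierarchy is given as a hypothetical `narrower` dictionary. The concept names and book sets are invented so that the values come out as 1/2 without and 2/6 with the hierarchy.

```python
def jaccard(ext_b, ext_g):
    union = ext_b | ext_g
    return len(ext_b & ext_g) / len(union) if union else 0.0

def hierarchical_extension(concept, ext, narrower):
    """Books annotated with `concept` or with any (transitively) narrower concept."""
    books, seen, stack = set(), set(), [concept]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles in messy thesauri
            continue
        seen.add(current)
        books |= ext.get(current, set())
        stack.extend(narrower.get(current, []))
    return books

# Hypothetical data: B' has two narrower concepts, C1 and C2
ext = {"B'": {"b1"}, "C1": {"b2", "b3"}, "C2": {"b4", "b5", "b6"}}
narrower = {"B'": ["C1", "C2"]}
G = {"b1", "b2"}

print(jaccard(ext["B'"], G))  # 0.5
print(round(jaccard(hierarchical_extension("B'", ext, narrower), G), 2))  # 0.33
```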

An empirical study of instance-based OM

• We experimented with three dimensions:
  • Similarity measure: Jaccard, Corrected Jaccard, Pointwise Mutual Information, Log Likelihood Ratio, Information Gain
  • Threshold: 0, 10
  • Hierarchy: yes, no

Why only 2 thresholds? Because of evaluation costs!
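The three dimensions span a small grid of configurations. Each configuration presumably yields its own ranked list of mappings that must be judged. The sketch below only restates the slide's lists to count them:

```python
from itertools import product

MEASURES = ["Jaccard", "Corrected Jaccard", "Pointwise Mutual Information",
            "Log Likelihood Ratio", "Information Gain"]
THRESHOLDS = [0, 10]
HIERARCHY = [True, False]

configs = list(product(MEASURES, THRESHOLDS, HIERARCHY))
print(len(configs))  # 20
```

Adding a third threshold value would raise this to 5 × 3 × 2 = 30 configurations.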

Evaluation: building a gold standard

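[Figure: a GTT concept and a Brinkman concept to be compared]

Possible thesaurus relations (~ SKOS): equivalent, broader than, narrower than, related to, no link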

User Evaluation Statistics

• 3 evaluators, 1,500 evaluations
• 90% agreement under ONLYEQ
• If one evaluator says "equivalent", 73% of the other evaluators say the same
• Comparing two evaluators, agreement is highest for "Equivalent", followed by "No link", "Narrower than" and "Broader than" (all at or above 50% agreement); "Related to" has only 35% agreement
• There are correlations between evaluators
  • For example, Ev1 and Ev2 agreed with each other on "no link" much more than either did with Ev3
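A statistic such as "if one evaluator says 'equivalent', 73% of the others say the same" can be read as a conditional agreement rate. The sketch below implements one plausible reading, which is not necessarily the study's exact procedure. Over all ordered pairs of evaluators who judged the same mapping, it counts how often the second evaluator repeats the first one's answer. The judgments and the `conditional_agreement` helper are hypothetical.

```python
from itertools import permutations

def conditional_agreement(judgments, label):
    """Share of ordered evaluator pairs (a, b) on the same mapping where a chose
    `label` and b chose it too."""
    hits = total = 0
    for by_evaluator in judgments.values():
        for a, b in permutations(by_evaluator, 2):
            if by_evaluator[a] != label:
                continue
            total += 1
            if by_evaluator[b] == label:
                hits += 1
    return hits / total if total else 0.0

# Hypothetical judgments: mapping ID -> {evaluator: answer}
judgments = {
    "m1": {"Ev1": "EQ", "Ev2": "EQ", "Ev3": "EQ"},
    "m2": {"Ev1": "EQ", "Ev2": "NT", "Ev3": "EQ"},
    "m3": {"Ev1": "NOLINK", "Ev2": "NOLINK", "Ev3": "RT"},
}

print(conditional_agreement(judgments, "EQ"))      # 0.8
print(conditional_agreement(judgments, "NOLINK"))  # 0.5
```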

Evaluation Interpretation: What is a good mapping?

• It is use-case specific. We considered:
  • ONLYEQ: only an "equivalent" answer counts as correct
  • NOTREL: EQ, BT, NT count as correct
  • ALL: EQ, BT, NT, RT count as correct

[Figure: results under ONLYEQ, NOTREL and ALL]

• The obvious question: do they produce the same results?
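Each interpretation is just a set of evaluator answers that count as a correct mapping. In the sketch below, the answer codes follow the slide (EQ, BT, NT, RT). "NOLINK" and the four sample answers are hypothetical.

```python
# Which evaluator answers count as a correct mapping under each interpretation
INTERPRETATIONS = {
    "ONLYEQ": {"EQ"},
    "NOTREL": {"EQ", "BT", "NT"},
    "ALL": {"EQ", "BT", "NT", "RT"},
}

def is_correct(answer, interpretation):
    return answer in INTERPRETATIONS[interpretation]

answers = ["EQ", "NT", "RT", "NOLINK"]  # hypothetical judgments of four mappings
for name in INTERPRETATIONS:
    print(name, sum(is_correct(a, name) for a in answers))
# ONLYEQ 1, NOTREL 2, ALL 3 (one per line)
```

The sets are nested, so the same ranked list can only score higher under the looser interpretations. This is consistent with "same results, different scales".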

Evaluation: validity of the (different) methods

The answer is: yes

All three interpretations produce the same results (on different scales)

A remark about Evaluation

• The use of mappings is strongly task-dependent
  • Scenario 1 (legacy data/annotation support) and Scenario 2 (thesaurus merging) require different mappings
• Our evaluation is useful (correct) for Scenario 2 (intensional)
• Scenario 1 can be evaluated differently (e.g. cross-validation on test data)
• See our paper at the Cultural Heritage Workshop

Experiments: Setup, Data and Thesauri

• We calculated
  • 5 different similarity measures, with
  • threshold: 0 and 10
  • hierarchy: yes or no
• Based on
  • 24,061 GTT concepts and
  • 4,990 Brinkman concepts, using
  • 243,886 books with double annotations

Experiments: Result calculation

• Average precision at similarity position i:
  • $P_i = N_{good,i} / N_i$ (take the first i mappings and return the percentage of correct ones)
• Example: [Figure: precision plot (y-axis up to 100%), with 86% marked at the 798th mapping]
  • This means that of the first 798 mappings, 86% were correct
• Recall is estimated based on lexical mappings
• F-measure is calculated as usual
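Below is a minimal sketch of the precision formula and the usual F-measure (F1). It assumes that every one of the top i mappings has a correctness judgment. The function names and the flags are hypothetical, and the recall estimate from lexical mappings is not modelled here.

```python
def precision_at(correct_flags, i):
    """P_i = N_good,i / N_i: share of correct mappings among the first i ranked ones.
    correct_flags[k] is True if the k-th best-ranked mapping was judged correct."""
    top = correct_flags[:i]
    return sum(top) / len(top) if top else 0.0

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the usual F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

flags = [True, True, False, True, True]  # hypothetical judgments, best-ranked first
print(precision_at(flags, 5))  # 0.8
print(round(f_measure(0.8, 0.8), 2))  # 0.8
```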

Results: Three research questions

1. What is the influence of the choice of threshold?

2. What is the influence of hierarchical information?

3. What is the best measure and setting for instance-based mapping?

What is the influence of the choice of threshold?

Threshold needed for Jaccard

Threshold NOT needed for LLR

What is the influence of hierarchical information?

Results are inconclusive!

Best measure and setting for instance-based mapping?

We have two winners!

The corrected Jaccard measures

Conclusion

• Summary
  • About 80% precision at an estimated 80% recall
  • Simple measures perform better if a statistical correction is applied (a threshold or an explicit statistical correction)
  • Hierarchical aspects unresolved
  • Some measures really unsuited
• Future work:
  • Generalize results
    • Other use cases, web directories, …
  • Study other measures

Thank you.

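Similarity measure formulae

• Jaccard: $\mathrm{Jacc}(B,G) = \frac{|B \cap G|}{|B \cup G|}$, where B and G stand for the sets of books annotated with each concept

• Corrected Jaccard: assigns a smaller score to less frequently co-occurring annotations.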

Information Theoretic Measures

• Pointwise Mutual Information:
  • Measures the reduction of uncertainty that the annotation with one concept yields for the annotation with another concept
  • Disadvantage: inadequate for sparse data
• Log Likelihood Ratio:
• Information Gain:
  • Information gain is the difference in entropy
  • It determines the attribute that best distinguishes between positive and negative examples
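The slides do not reproduce these formulas. As a hedged sketch, the textbook forms of the three measures can be computed from four counts:
- n_bg: books annotated with both concepts
- n_b: books annotated with B
- n_g: books annotated with G
- n: all doubly annotated books

Using these standard definitions is an assumption; the exact variants used in the study may differ. The function names and example pairs are hypothetical. The pairs illustrate the sparse-data weakness of PMI noted above.

```python
import math

def cells(n_bg, n_b, n_g, n):
    """2x2 contingency counts: (B and G, B only, G only, neither)."""
    return n_bg, n_b - n_bg, n_g - n_bg, n - n_b - n_g + n_bg

def pmi(n_bg, n_b, n_g, n):
    """Pointwise mutual information in bits: log2(P(B,G) / (P(B) P(G)))."""
    if n_bg == 0:
        return float("-inf")
    return math.log2(n_bg * n / (n_b * n_g))

def log_likelihood_ratio(n_bg, n_b, n_g, n):
    """Dunning-style G^2 = 2 * sum(O * ln(O / E)) over the four cells."""
    rows = (n_b, n_b, n - n_b, n - n_b)   # row total for each cell
    cols = (n_g, n - n_g, n_g, n - n_g)   # column total for each cell
    g2 = 0.0
    for o, r, c in zip(cells(n_bg, n_b, n_g, n), rows, cols):
        if o > 0:                          # 0 * ln(0) is taken as 0
            g2 += o * math.log(o * n / (r * c))
    return 2 * g2

def entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts)
    return -sum(k / total * math.log2(k / total) for k in counts if k > 0)

def information_gain(n_bg, n_b, n_g, n):
    """H(G) - H(G | B): entropy reduction about 'annotated with G' from knowing 'annotated with B'."""
    both, b_only, g_only, neither = cells(n_bg, n_b, n_g, n)
    h_g = entropy([n_g, n - n_g])
    h_g_given_b = (n_b / n) * entropy([both, b_only]) + ((n - n_b) / n) * entropy([g_only, neither])
    return h_g - h_g_given_b

N = 243_886                # books with double annotations (from the experiments)
rare = (1, 1, 1, N)        # hypothetical: a single joint book (Jaccard 1/1)
common = (18, 20, 19, N)   # hypothetical: 18 joint books (Jaccard 18/21)

print(pmi(*rare) > pmi(*common))                                    # True: PMI favours the single book
print(log_likelihood_ratio(*common) > log_likelihood_ratio(*rare))  # True
print(information_gain(*common) > information_gain(*rare))          # True
```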