discovering missing background knowledge in ontology matching

31
Discovering Missing Background Knowledge in Ontology Matching Pavel Shvaiko 17th European Conference on Artificial Intelligence (ECAI’06) 30 August 2006, Riva del Garda, Italy joint work with Fausto Giunchiglia and Mikalai Yatskevich

Upload: min

Post on 08-Feb-2016

41 views

Category:

Documents


0 download

DESCRIPTION

Discovering Missing Background Knowledge in Ontology Matching. Pavel Shvaiko. joint work with Fausto Giunchiglia and Mikalai Yatskevich. 17th European Conference on Artificial Intelligence (ECAI’06) 30 August 2006, Riva del Garda, Italy. Introduction Semantic Matching Lack of Knowledge - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Discovering Missing Background Knowledge in Ontology Matching

Discovering Missing Background Knowledge in Ontology Matching

Pavel Shvaiko

17th European Conference on Artificial Intelligence (ECAI’06)

30 August 2006, Riva del Garda, Italy

joint work with Fausto Giunchiglia and Mikalai Yatskevich

Page 2: Discovering Missing Background Knowledge in Ontology Matching

2

ECAI, 30 August 2006, Riva del Garda, Italy

Outline

Introduction

Semantic Matching

Lack of Knowledge

Iterative Semantic Matching

Evaluation

Conclusions and Future Work

Page 3: Discovering Missing Background Knowledge in Ontology Matching

3

ECAI, 30 August 2006, Riva del Garda, Italy

IntroductionInformation sources (e.g., ontologies) can be viewed as graph-like structures containing terms and their inter-relationshipsMatching takes two graph-like structures and produces a mapping between the nodes of the graphs that correspond semantically to each other

Page 4: Discovering Missing Background Knowledge in Ontology Matching

4

ECAI, 30 August 2006, Riva del Garda, Italy

Semantic Matching

Page 5: Discovering Missing Background Knowledge in Ontology Matching

5

ECAI, 30 August 2006, Riva del Garda, Italy

Semantic matchingSemantic Matching: Given two graphs G1 and G2, for any node n1i G1, find the strongest semantic relation R’ holding with node n2j G2

Computed R’s, listed in the decreasing binding strength order:equivalence { = }

more general/specific { , }disjointness { }I don’t know {idk}

We compute semantic relations by analyzing the meaning (concepts, not labels) which is codified in the elements and the structures of ontologies

Technically, labels at nodes written in natural language are translated into propositional logical formulas which explicitly codify the labels’ intended meaning. This allows us to codify the matching problem into a propositional validity problem

Page 6: Discovering Missing Background Knowledge in Ontology Matching

6

ECAI, 30 August 2006, Riva del Garda, Italy

Concept of a label & concept of a node

Hobbies and Interests

Music

Top

Books

Entertainment

1

2 3

4 5

Concept of a label is the propositional formula which stands for the set of documents that one would classify under a label it encodesConcept at a node is the propositional formula which represents the set of documents which one would classify under a node, given that it has a certain label and that it is in a certain position in a tree

Page 7: Discovering Missing Background Knowledge in Ontology Matching

7

ECAI, 30 August 2006, Riva del Garda, Italy

1. For all labels in T1 and T2 compute concepts at labels 2. For all nodes in T1 and T2 compute concepts at nodes3. For all pairs of labels in T1 and T2 compute relations between

concepts at labels (background knowledge)4. For all pairs of nodes in T1 and T2 compute relations between

concepts at nodes

Steps 1 and 2 constitute the preprocessing phase, and are executed once and each time after the ontology is changed (OFF- LINE part)Steps 3 and 4 constitute the matching phase, and are executed every time two ontologies are to be matched (ON - LINE part)

Four macro steps

Given two labeled trees T1 and T2, do:

Page 8: Discovering Missing Background Knowledge in Ontology Matching

8

ECAI, 30 August 2006, Riva del Garda, Italy

Step 1: compute concepts at labelsThe idea

Translate labels at nodes written in natural language into propositional logical formulas which explicitly codify the labels’ intended meaning

PreprocessingTokenization. Labels (according to punctuation, spaces, etc.) are parsed into tokens. E.g., Hobbies and Interests <Hobbies, and, Interests>Lemmatization. Tokens are further morphologically analyzed in order to find all their possible basic forms. E.g., Hobbies HobbyBuilding atomic concepts. An oracle (WordNet) is used to extract senses of lemmas. E.g., Hobby has 3 sensesBuilding complex concepts. Prepositions, conjunctions are translated into logical connectives and used to build complex conceptsout of the atomic concepts

E.g., CHobbies_and_Interests = <Hobby, U(WNHobby)> <Interest, U(WNIterest)>, where U is a union of the senses that WordNet attaches to lemmas

Page 9: Discovering Missing Background Knowledge in Ontology Matching

9

ECAI, 30 August 2006, Riva del Garda, Italy

Step 2: compute concepts at nodes The idea

Extend concepts at labels by capturing the knowledge residing in a structure of a tree in order to define a context in which the given concept at a label occurs

Computation Concept at a node for some node n is computed as a conjunction of concepts at labels located above the given node, including the node itself

C2 = CTop CEntertainment

C4 = CTop (CHobbies CInterests) CBooks

Example

Hobbies and InterestsBooks

Top

Entertainment

1

2 3

4

Page 10: Discovering Missing Background Knowledge in Ontology Matching

10

ECAI, 30 August 2006, Riva del Garda, Italy

Step 3: compute relations between (atomic) concepts at labels

The ideaExploit a priori knowledge, e.g., lexical, domain knowledge, with the help of element level semantic matchers

Page 11: Discovering Missing Background Knowledge in Ontology Matching

11

ECAI, 30 August 2006, Riva del Garda, Italy

Step 3: Element level semantic matchers

Sense-based matchers have two WordNet senses in input and produce semantic relations exploiting (direct) lexical relations of WordNetString-based matchers have two labels in input and produce semantic relations exploiting string comparison techniques

Page 12: Discovering Missing Background Knowledge in Ontology Matching

12

ECAI, 30 August 2006, Riva del Garda, Italy

Step 4: compute relations between concepts at nodes

The ideaDecompose the graph (tree) matching problem into the set of node matching problemsTranslate each node matching problem, namely pairs of nodes with possible relations between them, into a propositional formulaCheck the propositional formula for validity

Page 13: Discovering Missing Background Knowledge in Ontology Matching

13

ECAI, 30 August 2006, Riva del Garda, Italy

Step 4: Example of a node matching task

?

Page 14: Discovering Missing Background Knowledge in Ontology Matching

14

ECAI, 30 August 2006, Riva del Garda, Italy

Lack of Knowledge

Page 15: Discovering Missing Background Knowledge in Ontology Matching

15

ECAI, 30 August 2006, Riva del Garda, Italy

Problem of low recall (incompletness) - I

FactsMatching has two components: element level matching and structure level matchingContrarily to many other systems, the S-Match structure level algorithm is correct and complete Still, the quality of results is not very good

Why? ... the problem of lack of knowledge

Example

Page 16: Discovering Missing Background Knowledge in Ontology Matching

16

ECAI, 30 August 2006, Riva del Garda, Italy

Problem of low recall (incompletness) - II

Preliminary (analytical) evaluationDataset [Avesani et al., ISWC’05]

Page 17: Discovering Missing Background Knowledge in Ontology Matching

17

ECAI, 30 August 2006, Riva del Garda, Italy

On increasing the recall: an overview

Multiple strategiesStrengthen element level matchers

Reuse of previous match results from the same domain of interest

PO = Purchase Order Use general knowledge sources (unlikely to help)

WWWUse, if available (!), domain specific sources of knowledge

UMLS

Page 18: Discovering Missing Background Knowledge in Ontology Matching

18

ECAI, 30 August 2006, Riva del Garda, Italy

Iterative Semantic Matching

Page 19: Discovering Missing Background Knowledge in Ontology Matching

19

ECAI, 30 August 2006, Riva del Garda, Italy

Iterative semantic matching (ISM)

The idea Repeat Step 3 and Step 4 of the matching algorithm for

some critical (hard) matching tasks

ISM macro steps

• Discover critical points in the matching process• Generate candidate missing axiom(s) • Re-run SAT solver on a critical task taking into

account the new axiom(s) • If SAT returns false, save the newly discovered

axiom(s) for future reuse

Page 20: Discovering Missing Background Knowledge in Ontology Matching

20

ECAI, 30 August 2006, Riva del Garda, Italy

ISM:Discovering critical points - Example

Google (T1) Looksmart (T2)

cLabsMatrix (result of Step 3) cNodesMatrix (result of Step 4)

Page 21: Discovering Missing Background Knowledge in Ontology Matching

21

ECAI, 30 August 2006, Riva del Garda, Italy

ISM: Generating candidate axioms

• Sense-based matchers have two WordNet senses in input and produce semantic relations exploiting structural properties of WordNet hierarchies

• Gloss-based matchers have two WordNet senses as input and produce relations exploiting gloss comparison techniques

Page 22: Discovering Missing Background Knowledge in Ontology Matching

22

ECAI, 30 August 2006, Riva del Garda, Italy

ISM: generating candidate axioms Hierarchy distance

Hierarchy distance returns the equivalence relation if the distance between two input senses in WordNet hierarchy is less than a given threshold value (e.g., 3) and Idk otherwise

Distance between these concepts is 2 (1 more general link and 1 less general). Thus, we can conclude that games and entertainment are close in their meaning and return the equivalence relation

diversion

entertainment

games

There is no direct relation between games and entertainment in WordNet

Page 23: Discovering Missing Background Knowledge in Ontology Matching

23

ECAI, 30 August 2006, Riva del Garda, Italy

Evaluation

Page 24: Discovering Missing Background Knowledge in Ontology Matching

24

ECAI, 30 August 2006, Riva del Garda, Italy

Testing methodologyDataset [Avesani et al., ISWC’05]

Measuring match qualityIndicators

Precision, [0,1]; Recall, [0,1]By construction in that dataset reference mappings represent only true positives, thus allowing us to estimate only recallHigher values of recall can be obtained at the expense of lower values of precision

Additional tests to ensure that precision does not decrease

Indicators

Page 25: Discovering Missing Background Knowledge in Ontology Matching

25

ECAI, 30 August 2006, Riva del Garda, Italy

Experimental results

Page 26: Discovering Missing Background Knowledge in Ontology Matching

26

ECAI, 30 August 2006, Riva del Garda, Italy

Conclusions and Future Work

Page 27: Discovering Missing Background Knowledge in Ontology Matching

27

ECAI, 30 August 2006, Riva del Garda, Italy

Conclusions

The problem of missing domain knowledge is a major problem of all (!) matching systems

This problem on the industrial size matching tasks is very hardWe have investigated it by examples of light weight ontologies, such as Google and YahooPartial solution by applying semantic matching iteratively

Page 28: Discovering Missing Background Knowledge in Ontology Matching

28

ECAI, 30 August 2006, Riva del Garda, Italy

Future work

Iterative semantic matchingNew element level matchers

Interactive semantic matchingGUICutomizing technology

Extensive evaluationTesting methodology Industry-strength tasks

Page 29: Discovering Missing Background Knowledge in Ontology Matching

29

ECAI, 30 August 2006, Riva del Garda, Italy

ReferencesProject website - KNOWDIVE: http://www.dit.unitn.it/~knowdive/F. Giunchiglia, P. Shvaiko: Semantic matching. Knowledge Engineering Review Journal, 18(3), 2003.F. Giunchiglia, P.Shvaiko, M. Yatskevich: Semantic schema matching. In Proceedings of CoopIS’05.P. Bouquet, L. Serafini, S. Zanobini: Semantic coordination: a new approach and an application. In Proceedings of ISWC, 2003.P. Avesani, F. Giunchiglia, M. Yatskevich: A large scale taxonomy mapping evaluation. In Proceedings of ISWC, 2005.C. Ghidini, F. Giunchiglia: Local models semantics, or contextual reasoning = locality + compatibility. Artificial Intelligence Journal, 127(3), 2001.Ontology Matching: http://www.OntologyMatching.orgP. Shvaiko and J. Euzenat: A survey of schema-based matching approaches. Journal on Data Semantics, IV, 2005.

Page 30: Discovering Missing Background Knowledge in Ontology Matching

30

ECAI, 30 August 2006, Riva del Garda, Italy

Thank you!

Page 31: Discovering Missing Background Knowledge in Ontology Matching

31

ECAI, 30 August 2006, Riva del Garda, Italy

• FN – False negatives• TP – True positives• FP – False positives• TN – True negatives

FN TP FP

TN

Reference Matches System Matches

.RecallPrecisionRecallPrecision2Measure-F

;Precision

1-2Recall Overall

; Recall

;Precision

TPFNTP

FPTPTP