

Page 1: Semantic Web - Sharif University of Technologyce.sharif.ir/courses/91-92/2/.../root/LectureNotes/... · Like the Web, the Semantic Web by design will be distributed and heterogeneous

In the Name of God

Semantic Web

Morteza Amini

Ontology Alignment

Sharif University of Technology Spring 91-92


Outline

The Problem of Ontologies

Ontology Heterogeneity

Ontology Alignment Overall Process

Similarity Methods

Sharif Univ. of Tech. Ontology Alignment - Morteza Amini 2


The Problem

Like the Web, the Semantic Web will by design be distributed and heterogeneous.

Ontologies are used in it to support interoperability and a common understanding between different parties.

Ontologies themselves may be heterogeneous.

Ontology alignment is needed to find semantic relationships among the entities of different ontologies.

[Figure: nodes labeled a, b, c, and d joined by links marked with question marks, under the question "How should I use them?"]


Need for Ontology Merging

There is significant overlap in existing ontologies

Yahoo! and DMOZ Open Directory

Product catalogs for similar domains


Terminology (1)

Mapping: a formal expression that states the semantic relationship between two entities belonging to different ontologies.

Given two ontologies O1 and O2, mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find a corresponding entity in ontology O2 that has the same intended meaning:

$\mathrm{map}(e_{1i}) = e_{2j}$

Ontology Alignment: a set of correspondences between two or more (in the case of multi-alignment) ontologies. These correspondences are expressed as mappings.


Terminology (2)

Ontology Transformation: a general term for any process that produces a new ontology O’ from an ontology O by using a transformation function T.

Ontology Translation: an ontology transformation function t for translating an ontology O written in some language L into an ontology O’ written in a distinct language L’.

Ontology Merging: the creation of a new ontology from two (possibly overlapping) source ontologies. This concept is closely related to that of integration in the database community.


An Example of Ontology Alignment

[Figure: two ontology graphs, Ontology A and Ontology B, drawn with a legend for Concept, Property, Instance, Type, and Similarity links. Node labels include Thing, Object, Vehicle, Car, Automobile, Boat, Speed, Owner, Has Speed, Has Owner, Has Specification, Fast, Ali, Ali’s Peugeot, Peugeot 405, and 250 km/h. The similarity links carry the values 1.0, 0.6, 0.6, and 0.8.]

Car (Ontology A) is similar to Automobile (Ontology B):

Label Similarity = 0.0

Super Similarity = 1.0

Instance Similarity = 0.6

Relation Similarity = 0.8

Total Similarity = 0.6


An Example of Ontology Merging

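[Figure: two source ontologies. One is rooted at Thing and contains Automobile, Family Car, and Sport Car; the other is rooted at Object and contains Vehicle, Car, Bus, Luxury Car, Family Car, and Sport Car. The nodes Porsche and BMW also appear.]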


An Example of Ontology Merging

[Figure: the two source ontologies again, with the same node labels as on the previous slide.]


An Example of Ontology Merging

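[Figure: the merged ontology. Root: "Object, Thing". Below it: Vehicle. Below Vehicle: "Car, Automobile" and Bus. Below "Car, Automobile": Luxury Car, Family Car, and Sport Car. Porsche and BMW appear at the bottom level.]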


Forms of Heterogeneity in Ontologies (1)

(1) Syntactic: mismatches that depend on the choice of representation language: OWL, RDFS, DAML, N3, Datalog, Prolog, …

(2) Terminological: all forms of mismatches that are related to the process of naming the entities (e.g. individuals, classes, properties, relations) that occur in an ontology.

Typical Examples:

different words are used to name the same entity (synonymy);

the same word is used to name different entities (polysemy);

words from different languages (English, French, etc.) are used to name entities;

syntactic variations of the same word (different acceptable spellings, abbreviations, use of optional prefixes or suffixes, etc.).

Mismatches at the terminological level are not as deep as those occurring at the conceptual level. However, most real cases involve the terminological level (e.g., the way different people name the same entities), so this level is at least as crucial as the conceptual one.


Forms of Heterogeneity in Ontologies (2)

(3) Conceptual: mismatches that have to do with the content of an ontology.

Metaphysical differences: differences in how the world is “broken into pieces”.

Coverage: the ontologies cover different, possibly overlapping, portions of the world.

Granularity: one ontology provides a more (or less) detailed description of the same entities.

Perspective: one ontology may adopt a viewpoint that differs from the viewpoint adopted in another ontology.


Forms of Heterogeneity in Ontologies (3)

Metaphysical differences:


Overcoming Heterogeneity

One common approach to the problems of heterogeneity is to define relations (mappings) across the heterogeneous representations.

These relations can be used to transform expressions of one ontology into a form compatible with that of the other.

This may happen at any level:

syntactic: through semantics-preserving transducers;

terminological: through functions mapping lexical information;

conceptual: through general transformation of the representations.


Structure of Mappings

Alignment: a process that starts from two representations O and O’ and produces a set of mappings between pairs of (simple or complex) entities <e, e’> belonging to O and O’ respectively.

Intuitively, we will assume that in general a mapping can be described as a quadruple: <e, e’, n, R>

e and e’ are the entities between which a relation is asserted by the mapping.

n is a degree of trust (confidence) in that mapping.

R is the relation associated with the mapping; it identifies the relation holding between e and e’. R may be:

a simple set-theoretic relation

a fuzzy relation

a probabilistic distribution over a complete set of relations

a similarity measure
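
As a minimal sketch, the quadruple <e, e’, n, R> could be held in a small record type. The class name `Mapping`, the field names, the relation label "equivalent", and the choice to restrict n to [0, 1] are all assumptions made for illustration; the slides do not prescribe a representation. The example reuses the Car/Automobile pair and its total similarity of 0.6 from the alignment example, treating that score as the confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One mapping <e, e', n, R> between entities of two ontologies."""
    source: str        # e: entity from ontology O
    target: str        # e': entity from ontology O'
    confidence: float  # n: degree of trust, assumed here to lie in [0, 1]
    relation: str      # R: relation label (hypothetical vocabulary)

    def __post_init__(self):
        # Reject confidences outside the assumed [0, 1] range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


# An alignment is then a collection of such mappings.
alignment = [Mapping("A:Car", "B:Automobile", 0.6, "equivalent")]
```

Making the record immutable (`frozen=True`) lets mappings be stored in sets or used as dictionary keys, which is convenient when several matchers propose the same pair.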


Finding Mappings Through Similarity

There are many ways to assess the similarity between two entities. The most common is to define a measure of this similarity.

The characteristics that can be required of such measures:


Ontology Alignment Process

[Figure: processing pipeline from Input to Output with the stages Features, Entity Pair Selection, Similarity, Aggregation, and Interpretation, and an Iterations loop over the stages.]


Features

[Figure: example ontology used to illustrate features. Concepts: Object, Vehicle, Car, Boat, Owner, Speed. Relations: hasOwner, hasSpeed. Instances: Porsche KA-123, Marc, 250 km/h.]


Similarity Measure

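String similarity: string comparisons, e.g. of labels.

E.g.,

$$\mathrm{sim}_{\mathrm{String}}(s_1, s_2) = \max\left(0,\ \frac{\min(|s_1|, |s_2|) - \mathrm{ed}(s_1, s_2)}{\min(|s_1|, |s_2|)}\right)$$

where ed(s_1, s_2) is the edit distance between the two strings.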

Object similarity: direct object comparisons. Are two objects the same?

E.g., instances.

Set similarity: set comparisons. Are the two sets of objects the same?

E.g., concepts (based on their instances).

Set similarity requires a precalculated similarity of the objects, obtained with the object similarity method.
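
The string similarity on this slide can be sketched in a few lines, reading the formula as max(0, (min(|s1|, |s2|) - ed(s1, s2)) / min(|s1|, |s2|)) with ed taken to be the unit-cost edit distance. The function names and the handling of empty strings are assumptions, not something the slides specify.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Unit-cost edit distance (insert, delete, replace each cost 1)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # replace, free when characters match
            ))
        prev = curr
    return prev[-1]


def sim_string(s1: str, s2: str) -> float:
    """String similarity in [0, 1] derived from the edit distance."""
    shorter = min(len(s1), len(s2))
    if shorter == 0:
        # Assumption: an empty string is only similar to another empty string.
        return 1.0 if s1 == s2 else 0.0
    return max(0.0, (shorter - edit_distance(s1, s2)) / shorter)
```

With this reading, `sim_string("car", "automobile")` is 0.0, because the edit distance exceeds the length of the shorter string; this agrees with the label similarity of 0.0 shown for Car and Automobile in the alignment example.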


Similarity Rules

            Feature      Similarity Measure
Concepts    label        String Similarity
            subclassOf   Object Similarity
            instances    Set Similarity
Relations   instances    Set Similarity
Instances   ...          Object Similarity


Aggregation

How are the individual similarity measures combined?

Linearly

Weighted

Special Function

$$\mathrm{sim}(e, f) = \sum_{k} w_k \, \mathrm{sim}_k(e, f)$$
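
A weighted linear aggregation, sim(e, f) = sum over k of w_k * sim_k(e, f), might look like the sketch below. The function name `aggregate`, the dictionary layout, and the default of equal weights are assumptions. The input values are the four feature similarities from the Car/Automobile example; with equal weights they average to about 0.6, which matches the total on that slide, although the slides do not say which weights were actually used.

```python
from typing import Dict, Optional


def aggregate(sims: Dict[str, float],
              weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-feature similarities."""
    if not sims:
        raise ValueError("at least one similarity value is required")
    if weights is None:
        # Assumption: equal weights that sum to 1 when none are given.
        weights = {name: 1.0 / len(sims) for name in sims}
    return sum(weights[name] * value for name, value in sims.items())


car_vs_automobile = {"label": 0.0, "super": 1.0, "instance": 0.6, "relation": 0.8}
total = aggregate(car_vs_automobile)
```

Passing explicit weights, for example a larger weight for `"super"`, gives the "Weighted" variant; a "Special Function" would replace the sum with some other combination.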


Interpretation

From similarities to mappings.

A threshold can be applied to the similarity (measured in the previous step) to determine the required mapping.

$\mathrm{map}(e) = f \ \text{if} \ \mathrm{sim}(e, f) > t$

The threshold can be determined through test (training) data sets.
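
The rule map(e) = f if sim(e, f) > t can be sketched as follows. The slide does not say what to do when several candidates f exceed the threshold for the same e; keeping only the best-scoring one is an assumption made here. The function name `interpret` and the scores other than Car/Automobile are illustrative only.

```python
from typing import Dict, Tuple


def interpret(sim: Dict[Tuple[str, str], float], threshold: float) -> Dict[str, str]:
    """Turn aggregated similarities into mappings e -> f using a threshold t."""
    best: Dict[str, Tuple[str, float]] = {}
    for (e, f), score in sim.items():
        # Keep f only if it beats the threshold and any earlier candidate for e.
        current_best = best[e][1] if e in best else threshold
        if score > current_best:
            best[e] = (f, score)
    return {e: f for e, (f, _) in best.items()}


scores = {
    ("Car", "Automobile"): 0.6,  # total similarity from the alignment example
    ("Car", "Bus"): 0.2,         # illustrative value
    ("Boat", "Bus"): 0.1,        # illustrative value
}
mappings = interpret(scores, threshold=0.5)
```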


Similarity Methods

Local Methods

Compute similarities from a local view.

Global Methods

Take a global view to compute similarities and to merge the computed local similarities.


Similarity – Local Methods

Terminological Methods

String Based Methods

Language Based Methods

Structural Methods

Internal Structure

External Structure

Extensional (based on instances) Methods

When the classes share the same instances

When they do not


Terminological Methods

Terminological methods compare strings.

Can be applied to: names, labels, comments concerning entities, and URIs.

Take advantage of the structure of the string (as a sequence of letters).

The main idea behind such measures is that similar entities usually have similar names and descriptions in different ontologies.


Terminological Methods - Normalization

A number of normalization procedures help improve the results of subsequent comparisons:

Case normalization: converting each alphabetic character in the strings to its lowercase counterpart;

Diacritics suppression: replacing characters that carry diacritic signs with their most frequent replacement (e.g., replacing Montréal with Montreal);

Blank normalization: normalizing all blank characters (blank, tabulation, carriage return) into a single blank character;

Link stripping: normalizing some links between words, e.g., replacing apostrophes and underscores with dashes;

Stopword elimination: removing words that appear in a stop list (typically words such as “to”, “a”, ...).
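
The normalization steps above can be chained into one function, as in the sketch below. The stopword list is a hypothetical placeholder, reading "blank underline" as the underscore character is an assumption, and diacritics are removed by Unicode decomposition, which approximates the "most frequent replacement" rule rather than implementing it exactly.

```python
import re
import unicodedata

# Hypothetical stop list; a real one would be chosen per language and domain.
STOPWORDS = frozenset({"a", "an", "the", "to", "of"})


def normalize(label: str, stopwords=STOPWORDS) -> str:
    """Apply the normalization steps to a label before comparison."""
    # Case normalization.
    text = label.lower()
    # Diacritics suppression: decompose characters, then drop combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Link stripping: apostrophes and underscores become dashes.
    text = re.sub(r"['_]", "-", text)
    # Blank normalization: collapse runs of whitespace into one blank.
    text = re.sub(r"\s+", " ", text).strip()
    # Stopword elimination.
    return " ".join(word for word in text.split(" ") if word not in stopwords)
```

For example, `normalize("Montréal")` yields `"montreal"`, and a label such as `"Has_Owner of the Car"` becomes `"has-owner car"`.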


Terminological Methods - String Based

Substring Similarity

Hamming Distance

N-Gram Distance

Edit Distance

Jaro Similarity

Token Based Distances


Terminological Methods - String Based

In string edit distance, the operations usually considered are insertion of a character, replacement of a character by another and deletion of a character.

The Levenshtein distance is an edit distance in which every operation has cost 1.
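
A minimal sketch of an edit distance with configurable operation costs is given below; leaving all three costs at 1 yields the Levenshtein distance. The function and parameter names are chosen for illustration.

```python
def weighted_edit_distance(s1: str, s2: str,
                           ins_cost: int = 1,
                           del_cost: int = 1,
                           sub_cost: int = 1) -> int:
    """Minimum total cost of insertions, deletions, and replacements
    needed to turn s1 into s2 (dynamic programming, row by row)."""
    # Cost of building each prefix of s2 from the empty string.
    prev = [j * ins_cost for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        # Cost of deleting the first i characters of s1.
        curr = [i * del_cost]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + del_cost,                         # delete c1
                curr[j - 1] + ins_cost,                     # insert c2
                prev[j - 1] + (0 if c1 == c2 else sub_cost),  # replace or keep
            ))
        prev = curr
    return prev[-1]
```

With unit costs, the strings "Levenstein" and "Levenshtein" are at distance 1 (one inserted character). Raising `sub_cost` above `ins_cost + del_cost` makes the algorithm prefer a deletion followed by an insertion over a replacement.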


Terminological Methods – Language Based

Rely on NLP techniques to find associations between instances of concepts or classes.

Intrinsic methods: perform terminological matching with the help of morphological and syntactic analysis to normalize terms, e.g. stemming (going → go).

Extrinsic methods: make use of external resources such as dictionaries and lexicons (e.g. WordNet); an example is Resnik semantic similarity.


Structural Methods

The structure of the entities found in an ontology can be compared, instead of comparing their names or identifiers.

Internal Structure: uses criteria such as the range of the entities' properties (attributes and relations), their cardinality, and the transitivity and/or symmetry of their properties to calculate the similarity between the entities.

External Structure: The similarity comparison between two entities from two ontologies can be based on the position of entities within their hierarchies.


Structural Methods – External (1)

If two entities from two ontologies are similar, their neighbors might also be somehow similar.

Criteria for deciding that the two entities are similar include:

Their direct super-entities are already similar.

Their sibling-entities are already similar.

Their direct sub-entities are already similar.

All (or most) of their descendant-entities (entities in the sub tree rooted at the entity in question) are already similar.

All (or most) of their leaf-entities are already similar.

All (or most) of the entities on the paths from the root to the entities in question are already similar.


Structural Methods – External (2)

Some existing Approaches:

Structural topological dissimilarity on hierarchies

Upward Cotopic Distance
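
The slide only names the upward cotopic distance. As it is commonly defined, the upward cotopy UC(c) of a concept c is c together with all of its ancestors in a hierarchy, and the distance between c and c' is 1 - |UC(c) ∩ UC(c')| / |UC(c) ∪ UC(c')|. The sketch below follows that definition; the function names, the parent-map representation, and the small hierarchy (loosely based on the merging example) are assumptions.

```python
from typing import Dict, List, Set


def upward_cotopy(concept: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The concept itself plus all of its ancestors in the hierarchy."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def upward_cotopic_distance(c1: str, c2: str, parents: Dict[str, List[str]]) -> float:
    """1 minus the overlap ratio of the two upward cotopies."""
    uc1 = upward_cotopy(c1, parents)
    uc2 = upward_cotopy(c2, parents)
    return 1.0 - len(uc1 & uc2) / len(uc1 | uc2)


# Assumed hierarchy: child -> list of direct super-concepts.
parents = {
    "Vehicle": ["Object"],
    "Car": ["Vehicle"],
    "Bus": ["Vehicle"],
    "Sport Car": ["Car"],
    "Family Car": ["Car"],
}
```

In this hierarchy Car and Bus share two of four cotopy members (distance 0.5), while Sport Car and Family Car share three of five (distance 0.4), so the deeper siblings come out as closer.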


Extensional (based on instances) Methods

Compare the extensions of classes, i.e., their sets of instances, rather than their interpretation.

Conditions in which such techniques can be used:

When the classes share the same instances

When they do not
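
For the first case, where the classes share instances, one simple possibility is to compare the instance sets directly with the Jaccard coefficient. The slides do not name a specific measure, so Jaccard is a swapped-in choice here, and the instance sets are illustrative.

```python
from typing import Iterable


def jaccard_similarity(instances_a: Iterable[str], instances_b: Iterable[str]) -> float:
    """Share of instances common to both classes among all their instances."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        # Assumption: two empty extensions give no evidence of similarity.
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative extensions of two classes from different ontologies.
car_instances = {"Porsche", "BMW", "Peugeot 405"}
automobile_instances = {"Porsche", "BMW"}
similarity = jaccard_similarity(car_instances, automobile_instances)
```

When the classes do not share instances, a direct set overlap is always zero, so the comparison has to rely on a similarity between the instances themselves, as the set similarity measure described earlier requires.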


Similarity – Global Methods

After the local similarities have been calculated, it remains to compute the alignment. This involves some more global treatments, including:

aggregating the results of the base methods in order to compute the similarity between compound entities

organizing the combination of various similarity / alignment algorithms

involving the user in the loop

finally extracting the alignments (mappings) from the resulting (dis)similarity


Compound Similarity

Some existing approaches:


User Feedback

Supporting effective interaction between the user and the system components is one concern of ontology alignment.

User input can take place in many areas of alignment:

Assessing the initial similarity between some terms;

Invoking and composing alignment methods;

Accepting or refusing a similarity or alignment provided by the various methods.


Alignment Extraction

The ultimate goal of alignment is a satisfactory set of correspondences (mappings) between ontologies.

Manual Extraction: display the entity pairs with their similarity scores and/or ranks, and leave the choice of the appropriate pairs to the user of the alignment tool.

Automatic Extraction: Using Thresholds

Hard threshold: retains all the correspondences above a threshold n.

Delta method: uses as the threshold the highest similarity value minus a particular constant d.

Proportional method: uses as the threshold a percentage of the highest similarity value.

Percentage: retains the top n% of the correspondences.
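
The four automatic extraction strategies can be sketched side by side. The function names, the inclusive comparison in the delta and proportional methods (so that the best-scoring pair is always kept), rounding up in the percentage method, and the example scores are all assumptions.

```python
import math
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]


def hard_threshold(scores: Scores, n: float) -> Scores:
    """Keep correspondences whose similarity is above n."""
    return {pair: s for pair, s in scores.items() if s > n}


def delta_method(scores: Scores, d: float) -> Scores:
    """Threshold = highest similarity minus the constant d."""
    if not scores:
        return {}
    cutoff = max(scores.values()) - d
    return {pair: s for pair, s in scores.items() if s >= cutoff}


def proportional_method(scores: Scores, p: float) -> Scores:
    """Threshold = fraction p (0..1) of the highest similarity."""
    if not scores:
        return {}
    cutoff = max(scores.values()) * p
    return {pair: s for pair, s in scores.items() if s >= cutoff}


def percentage(scores: Scores, n_percent: float) -> Scores:
    """Keep the top n% of correspondences, ranked by similarity."""
    keep = math.ceil(len(scores) * n_percent / 100)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:keep])


# Illustrative scores for four candidate pairs.
scores = {("a1", "b1"): 0.9, ("a2", "b2"): 0.75, ("a3", "b3"): 0.5, ("a4", "b4"): 0.2}
```

On these scores, a hard threshold of 0.7, a delta of 0.2, and a percentage of 50 all keep the first two pairs, while a proportional threshold of 0.5 also keeps the third. The delta and proportional methods adapt to the score range of each run, whereas the hard threshold does not.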


Existing Works

Column headers on the slide: Method, Year, Organization, Project Leader, Automatic, Features, Aggregation, Lexical, Structure, String, Semantic, Instance. Each T marks one of the feature or aggregation columns for the method; the column to which each T belongs is not recoverable from the transcript.

Method      Year  Organization   Project Leader  Automatic  Marks
OntoMorph   1997  S. California  Chalupsky       Semi       T
U.S. Army   1999  DARPA                          Semi       T
Smart       1999  Stanford       Fridman, Noy    Semi       T T
Chimaera    1999  Stanford       McGuinness      Semi       T T T
Prompt      2001  Stanford       Noy, Musen      Semi       T T
InfoSleuth  2001  Amsterdam      Ding            Semi       T T
A. Prompt   2002  Stanford       Noy, Musen      Semi       T T T
Glue        2002  Illinois       Doan            Automatic  T T T T
IF-Map      2003  Southampton    Kalfoglou       Automatic  T T
NOM         2003  Karlsruhe      Ehrig           Automatic  T T T T T
QOM         2004  Karlsruhe      Ehrig           Automatic  T T T T
CROSI       2005  Southampton    Kalfoglou       Automatic  T T T


Any questions?
