logics for data and knowledge representation semantic matching

36
Logics for Data and Knowledge Representation Semantic Matching

Upload: silvester-mcdonald

Post on 11-Jan-2016

225 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Logics for Data and Knowledge Representation Semantic Matching

Logics for Data and KnowledgeRepresentation

Semantic Matching

Page 2: Logics for Data and Knowledge Representation Semantic Matching

Outline Introduction:

Why matching? The matching problem

Kinds of matching: Syntactic versus Semantic Steps in matching Kinds of schemas

S-Match MinSMatch

2

Page 3: Logics for Data and Knowledge Representation Semantic Matching

Approaching the heterogeneity problem Knowledge can be represented using graphs. These graphs can be very different:

Different structure (RDBs, OODB, XML, thesauri, formal ontologies)

Different conceptualizations: they reflect different visions of the world

They contain different terminology and polysemous terms They have different degrees of specificity, scope and coverage They can be expressed in different languages

Heterogeneity of these graphs demands the exposition of relations between them, such as semantically equivalent.

3

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 4: Logics for Data and Knowledge Representation Semantic Matching

The Matching Problem Matching Problem: given two finite graphs, finds all nodes in

the two graphs that syntactically or semantically correspond to each other.

Given two graph-like structures (e.g., classifications, XML and database schemas, ontologies), a matching operator produces a mapping between the nodes of the graphs.

Solution: A possible solution consists in the conversion of the two graphs in input into lightweight ontologies and then matching them semantically.

4

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 5: Logics for Data and Knowledge Representation Semantic Matching

A Matching Problem

?

?

?

5

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 6: Logics for Data and Knowledge Representation Semantic Matching

Kinds of Matching: Syntactic versus Semantic (I) Syntactic matching

Matching of nodes as objects or strings (without meaning)

6

“Virus” partially matches with “Computer virus”“Wives” is an exceptional form for “Wife”

Semantic matching

Matching of nodes as concepts

“Car” is equivalent to “Automobile” Car ≡ Automobile“Dog” is more specific than “Animal” Dog ⊑ Animal

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 7: Logics for Data and Knowledge Representation Semantic Matching

Kinds of Matching: Syntactic versus Semantic (II) Syntactic matching

Relations are computed between labels at nodes.

The similarity is given in a range [0,1].

Most of the tools developed so far are syntactic.

7

Semantic matchingRelations are computed between concepts at nodes.

They return a semantic similarity, namely R {∈ , ≡, , , ⊑ ⊒ ⊓}, sometimes with a confidence value in the range [0,1].

There are a few tools of this kind, but they are recently increasing. They return different kinds of semantic relations.

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 8: Logics for Data and Knowledge Representation Semantic Matching

Steps in matching A matching problem can be decomposed in three steps:

1. Extract the graphs from the conceptual models under consideration;

2. Convert the graphs into formal ontologies

3. Match the formal ontologies

In the next slides we show some examples of step 1

8

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 9: Logics for Data and Knowledge Representation Semantic Matching

Relational DB Schemas Let us consider the following relational database (RDB)

model, say “BANK”:

9

We can represent the RDB model “BANK” as a graph (i.e. a tree) with root “BANK”.

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 10: Logics for Data and Knowledge Representation Semantic Matching

Relational DB Schemas: Representation #1 The RDB model is first partitioned into relations, then

attributes and data instances.

10

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 11: Logics for Data and Knowledge Representation Semantic Matching

Relational DB Schemas: Representation #2 The model is partitioned into relations, then into tuples,

attributes and data instances.

11

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 12: Logics for Data and Knowledge Representation Semantic Matching

Relational DB Schemas: notes Which of the two representations is more preferable

depends on the concrete task.

It is always possible to transform one representation into the other.

In contrast to the example of RDB “BANK”, DB schemas are seldom trees. More often, DB schemas are translated into Directed Acyclic Graphs (DAG’s) and then approximated into trees.

12

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 13: Logics for Data and Knowledge Representation Semantic Matching

OODB Schemas Let us consider the RDB “BANK” in terms of an object-oriented DB (OODB) schema:

BRANCH (Street, City, Zip) PERSON (F_Name, L_Name) STAFF : PERSON (Position, Salary, Manager)

The resulting graph is:

13

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 14: Logics for Data and Knowledge Representation Semantic Matching

OODB Schemas: notes OODB schemas capture more semantics than the RDBs.

In particular, an OODB schema: explicitly expresses subsumption relations between

elements; admits special types of arcs for part/whole relationships

in terms of aggregation and composition.

14

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 15: Logics for Data and Knowledge Representation Semantic Matching

Semi-structured Data Neither RDBs nor OODBs capture all the features of semi-

structured or unstructured data (Buneman, 1997): semi-structured data do not possess a regular structure

(schemaless); the “structure” of semi-structured data could be partial or

even implicit.

Typical examples are: HTML and XML.

15

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 16: Logics for Data and Knowledge Representation Semantic Matching

XML Schemas XML schemas can be represented as DAGs. The graph from the RDB “BANK” could also be obtained

from an XML schema.

16

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 17: Logics for Data and Knowledge Representation Semantic Matching

XML Schemas: notes Often XML schemas represent hierarchical data models.

In this case the only relationships between the elements are “is-a”.

Attributes in XML are used to represent extra information about data. There are no strict rules telling us when data should be represented as elements, or as attributes.

17

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 18: Logics for Data and Knowledge Representation Semantic Matching

Concept Hierarchies A concept hierarchy is a semi-formal conceptualization of an

application domain in terms of concepts and relationships.

18

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 19: Logics for Data and Knowledge Representation Semantic Matching

S-Match: the matching problem Semantic Matching

Given two graphs G and H, for any node ni G and mj H, find

the strongest semantic relation R holding between them The strength of the relations is given by a partial order, that is:

disjointness (), equivalence (≡), more/less specific ( , ), ⊑ ⊒overlap ( ).⊓

A mapping element is a 4-tuple <IDij, ni, mj , R>, where:

IDij is a unique identifier of the given mapping element;

ni is the i-th node of the first graph;

mj is the j-th node of the second graph;

R specifies a semantic relation between the concepts at the given nodes

A mapping is a set of mapping elements19

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 20: Logics for Data and Knowledge Representation Semantic Matching

Example

< ID22, 2, 2, ≡ >

< ID21, 2, 1, ⊑ >

< ID24, 2, 4, ⊒ >

4

Images

Europe

ItalyAustria

2

3 4

1

Italy

Europe

Wine and Cheese

Austria

Pictures

1

2 3

5

A mappingA mapping element

20

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 21: Logics for Data and Knowledge Representation Semantic Matching

S-Match: the algorithmFour Macro Steps: Given two labeled trees T1 and T2, do:1. For all labels in T1 and T2 compute concepts at labels 2. For all nodes in T1 and T2 compute concepts at nodes3. For all pairs of labels in T1 and T2 compute relations between

concepts at labels4. For all pairs of nodes in T1 and T2 compute relations between

concepts at nodes

Steps 1 and 2 constitute the preprocessing phase (off-line), and are executed once and each time after the schema/ontology is changed(graphs are converted into lightweight ontologies)

Steps 3 and 4 constitute the matching phase (on-line), and are executed every time the two schemas/ontologies are to be matched (the two lightweight ontologies are matched)

21

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 22: Logics for Data and Knowledge Representation Semantic Matching

Step 1: compute concepts at labels Natural language expressions are translated into a formal

language Concepts are computed by disambiguating among all possible

senses of words in a label and their interrelations

Computation: Tokenization. Labels are parsed into tokens. Lemmatization. Tokens are morphologically analyzed in order to

find their possible basic forms. Building atomic concepts. An oracle (WordNet) is used to

extract senses of lemmatized tokens. Building complex concepts. Prepositions, conjunctions, etc. are

translated into logical connectives and used to build complex concepts out of the atomic concepts.

22

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 23: Logics for Data and Knowledge Representation Semantic Matching

Step 1: compute concepts at labels Tokenization. “Images and text” <Images, and, text>;

Lemmatization. “Images” Image;

Building atomic concepts. “Image” has 8 senses in WordNet, 7 as a noun and 1 as a verb.Some might be filtered out analyzing the context. The rest are kept:“Image” Image#1 ⊔ Image#3 ⊔ Image#7;

Building complex concepts (Image#1 ⊔ Image#3 ⊔ Image#7) ⊔ text#2

23

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 24: Logics for Data and Knowledge Representation Semantic Matching

Step 2: compute concepts at nodes Concepts at labels are extended by taking into account the

context of the node, i.e. the position of the node in the tree

Computation: The concept at a node for some node n is computed as the

conjunction of the concepts at labels along the path from the root till the node n itself

24

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 25: Logics for Data and Knowledge Representation Semantic Matching

Step 2: compute concepts at nodes

C4 = CEurope ⊓ CPictures ⊓ CItaly

4

Italy

Europe

Wine and Cheese

Austria

Pictures

1

2 3

5

25

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 26: Logics for Data and Knowledge Representation Semantic Matching

Step 3: compute relations between concepts at labels

Exploit a priori knowledge, e.g., lexical, domain knowledge

with the help of element level semantic matchers

Computation: Concepts at labels of each pair of nodes from the two trees are

compared to determine semantic relations between them (without caring about their context)

The set of semantic relations computed with this step constitute the set of axioms that will be used to compute the mapping, i.e. our TBox!

26

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 27: Logics for Data and Knowledge Representation Semantic Matching

Step 3: compute relations between concepts at labels

27

Italy

Europe

Wine and

Cheese

Austria

Pictures

1

2 3

4 5

Europe

Italy Austria

2

3 4

1

Images

T1 T2

T2

≡CItaly

≡CAustria

≡CEurope

≡CImages

CAustriaCItalyCPicturesCEuropeT1

CWine CCheese

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 28: Logics for Data and Knowledge Representation Semantic Matching

Step 4: compute relations between concepts at nodes

28

The matching problem is reduced to a set of node matching problems Each node matching problem is reduced to a validity problem

TBox T: the relations between concepts at labels (from step 3) Given the pair of nodes (n, m), deciding if a given semantic relation

holds between them corresponds to verifying that a certain relation holds between corresponding concepts at node: Subsumption: T C⊨ n ⊑ Cm or T C⊨ n ⊒ Cm

Equivalence: T C⊨ n ⊑ Cm and T C⊨ n ⊒ Cm

Disjointness: T C⊨ n C⊓ m ⊑

This is done by eliminating the TBox T and verifying that the negation of each formula is unsatisfiable.

The strongest relation holding between the two nodes is returned

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 29: Logics for Data and Knowledge Representation Semantic Matching

Step 4: compute relations between concepts at nodes Suppose we want to check if C12 ≡ C22

29

T2

=C14

C13

=C12

C11

C25C24C23C22C21T1

(C1Images C2Pictures) (C1Europe C2Europe) (C12 C22 )

Context (the relevant part of the

TBox)

Goal

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 30: Logics for Data and Knowledge Representation Semantic Matching

The final result

30

Italy

Europe

Wine and

Cheese

Austria

Pictures

1

2 3

4 5

Europe

Italy Austria

2

3 4

1Images

T1 T2≡

Italy

Europe

Wine and

Cheese

Austria

Pictures

1

2 3

4 5

Europe

Italy Austria

2

3 4

1Images

T1 T2

Italy

Europe

Wine and

Cheese

Austria

Pictures

1

2 3

4 5

Europe

Italy Austria

2

3 4

1Images

T1 T2

Italy

Europe

Wine and

Cheese

Austria

Pictures

1

2 3

4 5

Europe

Italy Austria

2

3 4

1Images

T1 T2

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 31: Logics for Data and Knowledge Representation Semantic Matching

A

B E

DJOURNALSjournals#1

C

DEVELOPMENT ANDPROGRAMMING LANGUAGES

(development#1 ⊔ programming#2) ⊓ languages#3 ⊓ journals#1

JAVA(development#1 ⊔ programming#2)

⊓ languages#3 ⊓ journals#1 ⊓ Java#3

PROGRAMMING AND DEVELOPMENTprogramming#2 ⊔ development#1

F

G

LANGUAGESlanguages#3 ⊓(programming#2 ⊔ development#1)

JAVAJava#3 ⊓ languages#3 ⊓(programming#2 ⊔ development#1)

MAGAZINESMagazines#1 Java#3⊓ ⊓ languages#3

⊓ (programming#2 ⊔ development#1)

⊑⊑

It is slow. Can we make the algorithm faster? Too many relations between nodes are returned by S-Match. Are there some relations which are “more important” than others?

Problems with S-Match

31

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 32: Logics for Data and Knowledge Representation Semantic Matching

A

B E

D

C F

A

B E

D

C F

⊒⊒

A

B E

D

C F

⊥⊥

A

B E

D

C F

≡ ≡

(1) (2) (3) (4)

There are some axioms (the dashed arrows) which can be derived from others (the solid ones)

Note that (4) is a combination of (1) and (2) Note that (1) and (2) are still true in case of equivalence

Redundancy patterns

32

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 33: Logics for Data and Knowledge Representation Semantic Matching

A redundant mapping element is a mapping element which can be derived from the others using the patterns.

A redundant mapping is a set containing redundant mapping elements

The minimal mapping is the biggest subset of mapping elements among those without redundant elements The minimal mapping always exists and it is unique Advantages in visualization, validation and maintenance

The mapping of maximum size is the set containing the maximum number of mapping elements it can be obtained from the propagation of the elements in the minimal

set using the intuitions encoded in the patterns.

Minimal and redundant mappings

33

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 34: Logics for Data and Knowledge Representation Semantic Matching

A

B E

DJOURNALSjournals#1

C

DEVELOPMENT ANDPROGRAMMING LANGUAGES

(development#1 ⊔ programming#2) ⊓ languages#3 ⊓ journals#1

JAVA(development#1 ⊔ programming#2)

⊓ languages#3 ⊓ journals#1 ⊓ Java#3

PROGRAMMING AND DEVELOPMENTprogramming#2 ⊔ development#1

F

G

LANGUAGESlanguages#3 ⊓(programming#2 ⊔ development#1)

JAVAJava#3 ⊓ languages#3 ⊓(programming#2 ⊔ development#1)

MAGAZINESMagazines#1 Java#3⊓ ⊓ languages#3

⊓ (programming#2 ⊔ development#1)

⊑⊑

The minimal mapping

34

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 35: Logics for Data and Knowledge Representation Semantic Matching

Computing the minimal mapping M:

function TreeMatch(tree T1, tree T2) {

TreeDisjoint(root(T1),root(T2));

direction := true;

TreeSubsumedBy(root(T1),root(T2));

direction := false;

TreeSubsumedBy(root(T2),root(T1));

TreeEquiv();

};

Computing the set of maximum size:

function Propagate(M)

(3)

(1)

(2)

(4), from (1) and (2)

MinSMatch: the algorithm

35

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH

Page 36: Logics for Data and Knowledge Representation Semantic Matching

S-Match Lab

36

S-MATCH LAB

• Java: JRE or JDK (preferred)• Java IDE: Eclipse, IDEA, NetBeans, …• Text Editor