Semantic Enrichment of Mappings
Patrick Arnold
WDI-Lab, Abteilung für Datenbanken, Universität Leipzig
Outline
04/19/2023 Abteilung für Datenbanken, Inst. für Informatik, Universität Leipzig
1. Motivation
2. Goals
3. Related Work
4. Determining the Relation Type
5. Implementation
6. First Results
7. Conclusions
1. Motivation
Classic approaches in schema/ontology matching provide only little information about the correspondences:
  Source node
  Target node
  Confidence
Further details are commonly omitted:
  What kind of relation? equal, is-a, part-of, overlap
  Simple correspondence vs. complex correspondence? (first name, last name) ↔ name
  Transformation functions?
    gross price = net price * (1 + sales taxes)
    name = first name + " " + last name
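The two transformation functions above can be read as executable rules attached to a complex correspondence. The following is a minimal sketch only; the class `ComplexCorrespondence`, its field names and the example values are assumptions made for illustration and are not part of the presented system.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ComplexCorrespondence:
    """Hypothetical container: several source fields map to one target field."""
    sources: Sequence[str]
    target: str
    confidence: float
    transform: Callable[..., object]  # assumed: computes the target value


def gross_price(net_price: float, sales_tax: float) -> float:
    # gross price = net price * (1 + sales taxes)
    return net_price * (1 + sales_tax)


def full_name(first_name: str, last_name: str) -> str:
    # name = first name + " " + last name
    return first_name + " " + last_name


name_corr = ComplexCorrespondence(
    sources=("first name", "last name"),
    target="name",
    confidence=0.9,  # illustrative value
    transform=full_name,
)

print(name_corr.transform("Ada", "Lovelace"))
print(gross_price(100.0, 0.25))
```

Storing the function next to the correspondence keeps the mapping usable for data transformation, not only for schema merging.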
Our intentions: Mapping enrichment
  Enhance a mapping by adding further or more-specific information to its correspondences
  Useful for merging and transforming schemas/ontologies
Workflow:
  Input: A mapping
  Mapping enrichment is carried out in an independent system (black box)
  Output is an enriched mapping
  Implies a new, more-specific format
Typical relation types: equal, is-a, part-of, overlap
Inverse types: equal, inverse is-a, has-a, overlap
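The pairing of relation types and their inverses can be written down as a small lookup table. This is a sketch under the assumption that a correspondence is a plain `(source, target, confidence, type)` tuple; the names `RelationType`, `INVERSE` and `invert` are illustrative.

```python
from enum import Enum


class RelationType(Enum):
    EQUAL = "equal"
    IS_A = "is-a"
    INVERSE_IS_A = "inverse is-a"
    PART_OF = "part-of"
    HAS_A = "has-a"
    OVERLAP = "overlap"


# Inverse of each type as listed above; equal and overlap are symmetric.
INVERSE = {
    RelationType.EQUAL: RelationType.EQUAL,
    RelationType.IS_A: RelationType.INVERSE_IS_A,
    RelationType.INVERSE_IS_A: RelationType.IS_A,
    RelationType.PART_OF: RelationType.HAS_A,
    RelationType.HAS_A: RelationType.PART_OF,
    RelationType.OVERLAP: RelationType.OVERLAP,
}


def invert(correspondence):
    """Swap source and target of a (source, target, confidence, type) tuple."""
    source, target, confidence, rel = correspondence
    return (target, source, confidence, INVERSE[rel])
```

Inverting a correspondence twice returns the original tuple, which is a quick sanity check for the table.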
2. Goals
First focus: Detecting the relation type of a correspondence
  Investigate linguistic methods on element level
  Extension by existing strategies possible
  equal, is-a, inverse is-a
Later…
  Relation type detection on instance level
  Exploiting background knowledge
  Correspondence type, transformation rules, …
3. Related Work
Several projects deal with this problem, mainly based on the following methods:
  Using dictionaries, thesauri, corpora
    WordNet, GermaNet
    Includes tokenization, normalization of strings etc.
  Using background knowledge
    The Open University: Using Swoogle to retrieve multiple ontologies referring to a concept
  Exploiting the structure between ontologies
  Exploiting reasoning, Bayes nets, feature vectors etc.
  Search engines (Google)
SMatch
  Complex strategy using WordNet to determine the following relations: equal, more-general, less-general, overlap, mismatch
  "Overlap" offers little interesting information (concepts are somehow related…)
  Approach:
    Annotate each word in a label with all meanings of this word found in WordNet
    Compare/match the meanings of the words
    Exploit the relations offered by WordNet
TaxoMap
  Focus on geographic ontologies
  Detects the relations equal, is-a, inverse is-a and is-close
  Focus rather on the correspondence itself, not on the type
  Is-a relation if a label in node S appears in node T and is a full word
  Uses WordNet as additional source
    Works on manually pre-defined branches of WordNet instead of the entire thesaurus
    Useful for domain-specific ontologies
  Recall: 23 %, Precision: 83 %
LogMap
  Uses reasoning algorithms to repair/discover mappings
    Based on Horn logic and the Dowling-Gallier algorithm
  Uses background knowledge (thesauri)
  Detects full correspondences and weak correspondences
  No specific relation detection per se
4. Relation Type Determination
4.1 Introduction
Typically, there is no link between the syntax and semantics of words
  stool, chair, seat… refer to the same object
  stool, school, tool, pool, wool… have nothing in common!
Things change when it comes to compounds…
  blackbird is a bird
  high school is a school
Compound: Two words A, B of a language form a new word AB
  apple + tree → apple tree
  sun + glasses → sunglasses
  forth + with → forthwith
A, B can be noun, verb, adjective/adverb, preposition
  We are normally interested in nouns
The following are not compounds:
  Compositions AB where A (or B) is not an actual word: broom, nausea
  Derivations: discard, unload, increase, compound
  Compositions AB where A and B are not semantically related: door (do + or), wither (wit + her)
Unlike for non-compounds, the semantics can generally be derived from the compound’s syntax
  Especially in nouns
  blackboard is a board
  handbag is a bag
Germanic languages are left-branching
  Germanic: school bus, central intelligence agency
  Romance: rio de las palmas (= palm river)
In English, no changes are applied to the words:
  German: Ort + Eingang → Ortseingang, Stadt + Bau → Städtebau
  English: city + limit → city limit, city + planning → city planning
4.2 Classification
From a linguistic point of view…
Endocentric
  Description: A+B denote something more specific than B
  Example: lecture hall, blackboard
  Relation (AB : B): AB ⊂ B
Exocentric
  Description: A+B denote something more specific of an unexpressed term C
  Example: doughnut, buttercup
  Relation (AB : B): AB ⊂ C *
Copulative / appositional
  Description: A+B denote something that is the sum of what A and B denote
  Example: bittersweet, Bosnia-Herzeg., actor-director
  Relation (AB : B): AB ⊂ (A ∪ B)
* C ⊈ A, C ⊈ B, AB ~R B
From the English point of view…
  Closed form: database, playground, blackbird
  Hyphenated form: bus-driver, single-minded, small-appliance industry
  Open form: web space, container ship, computer scientist
From a POS point of view…
  noun-noun, adjective-noun, verb-verb, …
4.3 First Conclusions
From the knowledge now gained, we can enrich correspondences in schemas in two ways:
  (1) Set the relation type to is-a instead of equal
  (2) Remove or at least doubt an existing correspondence
For (1) we assume that AB ⊂ B
  (cookbook, book, 0.8, equal) → (cookbook, book, 0.8, is-a)
For (2) we assume that if A is not a word in AB, the correspondence is likely to be false:
  (stool, tool, 0.9, equal) → false?
  (refund, fund, 0.7, equal) → false?
  (discharge, charge, 0.7, equal) → false?
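Rules (1) and (2) can be sketched as one function over a `(source, target, confidence, type)` correspondence. This is only an illustration: the tiny `LEXICON` set stands in for a real dictionary or thesaurus, and the name `enrich` and its return shape are assumptions, not the presented implementation.

```python
# Tiny illustrative lexicon (an assumption); a real system would query a dictionary.
LEXICON = {"cook", "book", "black", "board", "hand", "bag", "tool", "fund", "charge"}


def enrich(source, target, confidence, rel_type="equal"):
    """Apply rules (1) and (2) to one correspondence.

    Returns the possibly re-typed correspondence and a flag marking it as doubtful.
    """
    s, t = source.lower(), target.lower()
    if s != t and s.endswith(t):
        # Remainder A of the source AB once the target B is cut off at the end.
        modifier = s[: -len(t)].rstrip(" -")
        if modifier in LEXICON:
            # Rule (1): AB is a compound with head B, so assume AB is-a B.
            return (source, target, confidence, "is-a"), False
        # Rule (2): A is not a word, so the match is likely accidental.
        return (source, target, confidence, rel_type), True
    return (source, target, confidence, rel_type), False


print(enrich("cookbook", "book", 0.8))
print(enrich("stool", "tool", 0.9))
```

Returning a "doubtful" flag instead of deleting the correspondence leaves the final decision to a later repair step.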
4.4 Mismatches
A word changed its spelling over the centuries:
  butterfly (“flutter-by”, “beat fly”, …)
  Weiße Elster (from Czech: alstra = water)
A compound has a figurative meaning (metaphor):
  Completely different meaning: computer mouse, gravy train, buttercup
  Obvious origin (related in a broad sense): airport, birdhouse, downtown, snowman
Inaccuracies in (vernacular) language
  e.g., in biology: strawberry, blackberry, raspberry etc.
  None of these is a berry in the biological sense (yet tomato, banana, grape, pumpkin, melon etc. are)
For detecting the relation type, the mismatch problem has no negative effect on the mapping
  The correspondence is wrong after all
  (buttercup, cup, equal) is as wrong as (buttercup, cup, is-a)
  Enrichment has no negative effect on the mapping per se
Still, enhanced methods can be used to reduce the mismatches
5. Implementation
5.1 Goals
Specify the following relation types based on linguistic methods:
  equal (default), is-a, inverse is-a
  Missing: part-of and overlap
English and German language
  Main focus on English language
Possibly apply mapping repair
  Remove correspondences that seem clearly wrong
Test & Evaluation
First concentrate on the element level
  Use linguistic knowledge as presented before
Different cases to be distinguished
  Single items vs. itemizations
5.2 Cases
Simple case (1:1)
  Source and target node consist of one item
  blackboard ↔ board
  high school ↔ school
  international database conference ↔ conference
Complex cases (1:n, n:1, n:m)
  Source/target node consist of several items
  blackboard, whiteboard ↔ board
  wine ↔ white wine, red wine
  beer, wine ↔ wine
  computers, laptops ↔ computers
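One possible way to handle itemizations is to read each node as the union of its items and to test whether every item on one side is equal to, or a compound of, some item on the other side. The sketch below follows that assumption; it is not the method of the presented system, and `MODIFIERS`, `item_is_a`, `covered` and `node_relation` are hypothetical names.

```python
# Small illustrative set of modifiers (an assumption) for closed-form compounds.
MODIFIERS = {"black", "white", "red"}


def item_is_a(item, other):
    """True if `item` equals `other` or is a compound whose head is `other`."""
    if item == other:
        return True
    if item.endswith(" " + other) or item.endswith("-" + other):
        return True  # open or hyphenated form, e.g. "white wine" vs. "wine"
    return item.endswith(other) and item[: -len(other)] in MODIFIERS


def covered(items, others):
    """True if every item is equal to, or more specific than, some item in `others`."""
    return all(any(item_is_a(i, o) for o in others) for i in items)


def node_relation(source_items, target_items):
    """Relation type between two itemizations, read as unions of their items."""
    s_in_t = covered(source_items, target_items)
    t_in_s = covered(target_items, source_items)
    if s_in_t and t_in_s:
        return "equal"
    if s_in_t:
        return "is-a"
    if t_in_s:
        return "inverse is-a"
    return "undecided"


print(node_relation(["blackboard", "whiteboard"], ["board"]))
print(node_relation(["wine"], ["white wine", "red wine"]))
print(node_relation(["beer", "wine"], ["wine"]))
```

Purely linguistic evidence has limits here: for `computers, laptops ↔ computers` the sketch returns inverse is-a, because nothing in the strings indicates that laptops are computers.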
5.3 Node Level vs. Path Level
Relation type depends on the perspective…
  Node level vs. path level
Relation is often…
  is-a on node level
  equal on path level
Source: + Apparel + Children - Shoes - Caps - …
Target: + Apparel + Children Shoes + Caps + …
Source: + Kids + Apparel + Shoes + Caps + …
Target: + Clothing + Children Shoes + Caps + …
5.4 Requirements
Benchmarks / gold standards (English language)
  Manually defined
Dictionaries / thesauri
More-specific data structure
  Correspondence: source node, target node, confidence, type
  Node: a list of items
  Item: a list of words
  Word: single word vs. compound
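The data structure listed above maps naturally onto a few nested record types. The following is a sketch with assumed field names; the helper `parse_node` is hypothetical and splits labels naively on commas and whitespace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    is_compound: bool = False  # single word vs. compound; detection needs a dictionary


@dataclass
class Item:
    words: List[Word] = field(default_factory=list)


@dataclass
class Node:
    items: List[Item] = field(default_factory=list)


@dataclass
class Correspondence:
    source: Node
    target: Node
    confidence: float
    type: str = "equal"  # equal is the default relation type


def parse_node(label: str) -> Node:
    """Split a label such as "red wine, white wine" into items and words."""
    items = []
    for part in label.split(","):
        words = [Word(w) for w in part.split()]
        if words:
            items.append(Item(words))
    return Node(items)


corr = Correspondence(parse_node("red wine, white wine"), parse_node("wine"), 0.8)
print(len(corr.source.items), corr.type)
```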
5.5 Generating Benchmarks
Benchmarks
  More difficult than in standard mappings
  In some cases difficult to decide even for humans
    Is a birdhouse a house? Is an airport a port?
How to judge correspondences in an evaluation?
  car = bike → FALSE
  car = auto → TRUE
  motorbike ⊂ bike → ?
5.6 Challenges
Exocentric compounds
  airport, buttercup, saw tooth, …
Compounds in itemizations
  (French wine, German wine ↔ French wine) inverse is-a
  (French wine, German wine ↔ European wine) is-a
  (French wine, German wine ↔ Mosel wine) overlap
  (French wine, German wine ↔ Italian wine) mismatch
Plurals
  (Christian churches ↔ church)
  (red wine, white wine ↔ wines)
Short forms
  Infant colic ↔ colic (equal instead of is-a)
Node level vs. path level
  Compound extending/skipping levels in the schema
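The plural challenge can be reduced by normalizing the head word of each item before comparing labels. The rules below are a deliberately naive sketch, not a full stemmer or lemmatizer: irregular plurals and words such as "bus" are handled incorrectly, and both function names are illustrative.

```python
def singularize(word):
    """Very naive English singularization; irregular plurals are not handled."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"  # cities -> city
    if w.endswith(("ches", "shes", "sses", "xes")):
        return w[:-2]  # churches -> church
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]  # wines -> wine
    return w


def normalize_item(item):
    """Lower-case an item and singularize its last word (the head)."""
    words = item.split()
    if not words:
        return item
    return " ".join(words[:-1] + [singularize(words[-1])]).lower()


print(normalize_item("Christian churches"))
print(normalize_item("wines"))
```

After this step, `Christian churches ↔ church` can be treated like any other compound whose head matches the target.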
Limited recall
  Strong dependency on the input (mapping)
  Some is-a relations cannot be detected with simple linguistic methods
    (car, vehicle)
    (wine, beverage)
    (cell phones, communication devices)
6. First Results
Web ↔ Yahoo
  421 correspondences
  68 subset-correspondences
  Found 50 subset-relations, with 34 being correct
  Recall: 50.0 %
  Precision: 68.0 %
  f-Measure: 59.0 % (the arithmetic mean of recall and precision; the harmonic-mean F1 for these counts is 57.6 %)
Google Health ↔ Yahoo Health (excerpt)
  396 correspondences
  31 subset-correspondences
  Found 20 subset-relations, with 15 being correct
  Recall: 48.3 %
  Precision: 75.0 %
  f-Measure: 61.6 % (the arithmetic mean of recall and precision; the harmonic-mean F1 for these counts is 58.8 %)
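The figures for both benchmarks can be recomputed from the raw counts. The sketch below returns the harmonic-mean F1 as well as the arithmetic mean of recall and precision, since the two differ noticeably for these counts; the function name and parameter names are illustrative, and rounding may deviate from the reported values by 0.1.

```python
def evaluate(expected, found, correct):
    """Return recall, precision, harmonic F1 and arithmetic mean, all in percent."""
    recall = correct / expected
    precision = correct / found
    f1 = 2 * precision * recall / (precision + recall)
    mean = (precision + recall) / 2
    return tuple(round(100 * v, 1) for v in (recall, precision, f1, mean))


# Web ↔ Yahoo: 68 expected subset-correspondences, 50 found, 34 correct.
print(evaluate(expected=68, found=50, correct=34))
# Google Health ↔ Yahoo Health: 31 expected, 20 found, 15 correct.
print(evaluate(expected=31, found=20, correct=15))
```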
Main issues observed…
  Imprecise labels
    infant colic ↔ colic (equal)
    Uterine-Fibroids ↔ Uterus.Fibroids (equal)
    picture frames ↔ frames (equal in field “arts”)
  Node-Path-Discrepancies
  “No-Compound”-Subsets
    vehicle ↔ car (is-a)
7. Conclusions
Mapping enrichment
  Relation type
  Simple vs. complex correspondences
  Transformation rules
Relation type determination
  Linguistic approach on element level: compounds, itemizations
  Advanced methods: instance level, background knowledge etc.
  Increase recall, keep up precision