Exploiting Multilinguality For Creating Mappings Between Thesauri
Mauro Dragoni
Fondazione Bruno Kessler (FBK), Shape and Evolve Living Knowledge Unit (SHELL)
https://shell.fbk.eu/index.php/Mauro_Dragoni
SAC 2015, Salamanca, Spain
April 14th, 2015
Outline
1. Background on Ontology Matching
2. Motivations
3. The Approach
4. Evaluation of the System
Ontology Matching - 1
Given two thesauri/ontologies/vocabularies, find alignments between their entities
Formally, a “match” may be represented by the following 5-tuple:
‹ id, e1, e2, R, c ›
Extensive literature on matching approaches (since the early ‘80s)
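The 5-tuple can be sketched as a small record type. The slide does not spell out the components, so this sketch assumes the usual reading: `id` identifies the match, `e1` and `e2` are the two entities, `R` is the relation holding between them, and `c` is a confidence value in [0, 1]. The class name, field names, and example identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One correspondence between two entities (assumed reading of the 5-tuple)."""
    id: str            # identifier of the match
    e1: str            # entity from the first thesaurus
    e2: str            # entity from the second thesaurus
    relation: str      # R, e.g. "=" for equivalence
    confidence: float  # c, assumed to lie in [0, 1]

    def __post_init__(self):
        # Reject confidence values outside the assumed [0, 1] range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


# Hypothetical identifiers, for illustration only.
example = Match("m1", "source:food_chains", "target:food_chains", "=", 0.93)
```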
Ontology Matching - 2
Multilinguality started to be considered around 15 years ago
EuroWordNet
MultiWordNet
Domain-specific applications
English-Asian alignment
Multi-lingual vs. Cross-lingual
Motivations
Need: a system, for experts, able to suggest possible matches between concepts
Exploit multilinguality… why?
it reduces ambiguity: the probability that two different concepts share the same label across several languages is very low.
term translations are adapted to the domain: the experts in charge of translation draw on their cultural heritage when choosing the right term for each concept.
First step of an ontology matching platform
The Proposed Approach - 1
Inspired by information retrieval techniques
Built on top of the Lucene search engine
For each element of the thesaurus, a structured multilingual representation is built. For example, the labels
[prefLabel] "Food chains"@en
[prefLabel] "Catene alimentari"@it
[altLabel] "Food distributions"@en
[altLabel] "Reti alimentari"@it
are represented by the fields
label-en: “food chain”
label-en: “food distribution”
label-it: “catena alimentare”
label-it: “rete alimentare”
An index is built for each thesaurus
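The representation step can be sketched in plain Python, without Lucene. This is a minimal illustration, assuming each concept arrives as a list of (label kind, text, language) triples and that one field per language collects both preferred and alternative labels. The function name `build_document` is hypothetical. The slide shows singular forms ("food chain"), which suggests some stemming or lemmatization; the sketch only lowercases and leaves that step out.

```python
from collections import defaultdict


def build_document(labels):
    """Group a concept's labels into one field per language.

    `labels` is an iterable of (kind, text, lang) triples,
    e.g. ("prefLabel", "Food chains", "en").
    """
    doc = defaultdict(list)
    for _kind, text, lang in labels:
        # Only lowercasing is applied; no stemming or lemmatization.
        doc[f"label-{lang}"].append(text.lower())
    return dict(doc)


concept = [
    ("prefLabel", "Food chains", "en"),
    ("prefLabel", "Catene alimentari", "it"),
    ("altLabel", "Food distributions", "en"),
    ("altLabel", "Reti alimentari", "it"),
]
document = build_document(concept)
```

Here `document` maps `label-en` to the two English labels and `label-it` to the two Italian ones, mirroring the field layout on the slide.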
The Proposed Approach - 2
How are matches suggested?
source and target thesauri are chosen
for each concept, a query is performed from the source to the target thesaurus
the standard Lucene scoring formula is used for computing the ranking
for each query, a ranking of 5 suggestions is provided to the user
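The suggestion loop above can be sketched as follows. This is not Lucene's scoring formula: as a stand-in, the sketch sums, over the language fields two concepts share, the Jaccard overlap of their label tokens. The function names, the in-memory `target_index` dictionary, and the concept identifiers are all hypothetical.

```python
def tokens(labels):
    """Set of lowercase whitespace-separated tokens over all labels of a field."""
    return {tok for label in labels for tok in label.lower().split()}


def score(query_doc, target_doc):
    """Sum of per-language Jaccard overlaps (a stand-in for Lucene scoring)."""
    total = 0.0
    for field, query_labels in query_doc.items():
        target_labels = target_doc.get(field)
        if not target_labels:
            continue
        q, t = tokens(query_labels), tokens(target_labels)
        union = q | t
        if union:
            total += len(q & t) / len(union)
    return total


def suggest(query_doc, target_index, k=5):
    """Return up to k target concept ids, best score first."""
    scored = [(score(query_doc, doc), cid) for cid, doc in target_index.items()]
    scored = [(s, cid) for s, cid in scored if s > 0]
    # Sort by descending score, breaking ties by concept id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _s, cid in scored[:k]]


source_concept = {"label-en": ["food chain"], "label-it": ["catena alimentare"]}
target_index = {
    "t:1": {"label-en": ["food chain", "food web"], "label-it": ["catena alimentare"]},
    "t:2": {"label-en": ["supply chain"], "label-it": ["catena di fornitura"]},
    "t:3": {"label-en": ["soil"], "label-it": ["suolo"]},
}
ranking = suggest(source_concept, target_index)
```

Because the score adds up evidence from every language, a target concept that agrees with the query in both English and Italian outranks one that shares a single word in each, which is the disambiguation effect the Motivations slide describes.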
Evaluation Set-Up
2 contexts:
six multilingual thesauri (3 medical domain, 3 agricultural domain)
adapted Multifarm benchmark
2 tasks:
matching system (only the first suggestion is considered)
suggestion system
Results - 1

| Mapping Set | # of Mappings | Prec@1 | Prec@3 | Prec@5 | Recall |
|---|---|---|---|---|---|
| Eurovoc Agrovoc | 1297 | 0.816 | 0.931 | 0.967 | 0.874 |
| Agrovoc Eurovoc | 1297 | 0.906 | 0.969 | 0.988 | 0.695 |
| Avg. | | 0.861 | 0.950 | 0.978 | 0.785 |
| Gemet Agrovoc | 1181 | 0.909 | 0.964 | 0.983 | 0.546 |
| Agrovoc Gemet | 1181 | 0.943 | 0.981 | 0.994 | 0.740 |
| Avg. | | 0.926 | 0.973 | 0.989 | 0.643 |
| MDR MeSH | 6061 | 0.776 | 0.914 | 0.956 | 0.807 |
| MeSH MDR | 6061 | 0.716 | 0.888 | 0.939 | 0.789 |
| Avg. | | 0.746 | 0.901 | 0.948 | 0.798 |
| MDR SNOMED | 19971 | 0.621 | 0.826 | 0.908 | 0.559 |
| SNOMED MDR | 19971 | 0.556 | 0.760 | 0.855 | 0.519 |
| Avg. | | 0.589 | 0.793 | 0.882 | 0.539 |
| MeSH SNOMED | 26634 | 0.690 | 0.871 | 0.931 | 0.660 |
| SNOMED MeSH | 26634 | 0.657 | 0.835 | 0.908 | 0.564 |
| Avg. | | 0.674 | 0.853 | 0.920 | 0.612 |

Results obtained by the proposed system on the domain-specific thesauri
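The slides do not define Prec@k. A minimal sketch of one plausible reading, assuming Prec@k is the fraction of answered source concepts whose reference match appears among the first k suggestions, is given below. The function name and the toy data are hypothetical and are not taken from the evaluation.

```python
def precision_at_k(suggestions, gold, k):
    """Share of answered concepts whose gold match is in the top k suggestions.

    `suggestions` maps a source concept id to its ranked list of target ids;
    `gold` maps a source concept id to its reference target id.
    Concepts with no suggestions are left out of the denominator (an assumption).
    """
    answered = [src for src in gold if suggestions.get(src)]
    if not answered:
        return 0.0
    hits = sum(1 for src in answered if gold[src] in suggestions[src][:k])
    return hits / len(answered)


# Toy data for illustration only.
gold = {"s1": "t1", "s2": "t2", "s3": "t3", "s4": "t4"}
suggestions = {
    "s1": ["t1", "t9"],
    "s2": ["t7", "t2", "t8"],
    "s3": ["t5", "t6", "t7", "t8", "t3"],
    "s4": [],
}
p_at_1 = precision_at_k(suggestions, gold, 1)
p_at_3 = precision_at_k(suggestions, gold, 3)
p_at_5 = precision_at_k(suggestions, gold, 5)
```

Under this reading the value can only grow with k, which matches the pattern in the table: Prec@1 corresponds to the matching task (first suggestion only), while Prec@3 and Prec@5 describe the suggestion task.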
Results - 2

| Mapping Set | IRBOM | WeSeE (2012) | RiMOM (2013) | YAM++ (2013) | YAM++ (2012) | AUTOMSv2 (2012) |
|---|---|---|---|---|---|---|
| Agrovoc Eurovoc | 0.821 | 0.785 | 0.628 | 0.615 | 0.615 | 0.599 |
| Gemet Agrovoc | 0.759 | 0.726 | 0.548 | 0.579 | 0.579 | 0.485 |
| MDR MeSH | 0.771 | 0.749 | 0.611 | 0.613 | 0.613 | 0.536 |
| MDR SNOMED | 0.563 | 0.624 | 0.495 | 0.473 | 0.473 | 0.405 |
| MeSH SNOMED | 0.642 | 0.631 | 0.457 | 0.458 | 0.458 | 0.497 |

Results obtained by all systems on the domain-specific thesauri
Results - 3

| System Name | Precision | Recall | F-Measure |
|---|---|---|---|
| IRBOM | 0.68 | 0.43 | 0.53 |
| WeSeE (2012) | 0.61 | 0.32 | 0.41 |
| RiMOM (2013) | 0.52 | 0.13 | 0.21 |
| YAM++ (2013) | 0.51 | 0.36 | 0.40 |
| YAM++ (2012) | 0.50 | 0.36 | 0.40 |
| AUTOMSv2 (2012) | 0.49 | 0.10 | 0.36 |

Results obtained by all systems on the adapted Multifarm Benchmark
Future Work
Analyzing all kinds of relationships between concepts
Using weights associated with relationships
Improving the search mechanism (faceting, fuzzy, …)
Practical aspects: Web-application