Download - Reuse of Ontology Mappings
1
REUSE OF ONTOLOGY MAPPINGS
Anika Groß, Database Group, Universität Leipzig
Canberra, March 2016
2
• Structured representation of knowledge
• Used for annotation as standardized semantic description of object properties
• Very large ontologies in the life sciences
ONTOLOGIES
Anatomy Molecular biology
ChemistryMedicine
Tissue
Anatomic Structure,System, or Substance
Organ
Lung SkinKidney …
…
3
MeSH
GALENSNOMED CT
NCI Thesaurus
Uberon
Mouse Anatomy
FMA
• Overlapping ontologies → creation of mappings/alignments• Useful for data integration, analysis across sources …
• Ontology mapping: set of semantic correspondences (links) between concepts of different ontologies
ONTOLOGY MAPPINGS
4
• Overlapping ontologies → creation of mappings/alignments• Useful for data integration, analysis across sources …
• Ontology mapping: set of semantic correspondences (links) between concepts of different ontologies
ONTOLOGY MAPPINGS
𝑶𝟐
tail
headneck
limbs
limb segments
body
𝑶𝟏
head
lower extremities
limbs
upper extremities
body
neck
trunk
tail
=
===
<<
=
𝑶𝑴𝑶𝟏,𝑶𝟐
• Manual or semi-automatic identification (matching)
5
• Ontologies are not static!
• Research, new knowledge continuous changes
• Release of new versions
• Ontology changes
→ Impact on dependent mappings and applications?
EVOLUTION OF ONTOLOGIES AND MAPPINGS
𝑶𝟏
0
𝑶𝟐
𝑶𝑴𝑶𝟏,𝑶𝟐
6
REUSE EXISTING MAPPINGS TO …
→ create new ontology mappings• “Indirect” matching: combine existing mappings to create
new mappings between so far unconnected sources
→ create up-to-date ontology mappings• Migration of outdated mappings to currently valid
ontology versions
Ontologies, ontology mappings, ontology evolution
2) Composition-based ontology matching3) Adaptation of ontology mappings
4) Outlook
7
ONTOLOGY MATCHING WORKFLOW
• Manual creation of mappings between very large ontologies is too labor-intensive
• Semi-automatic generation of semantic correspondences:linguistic, structural, instance-based matching techniques
Matching
Mappingsim(O1.a, O2.b) = 0.8sim(O1.a, O2.c) = 0.5sim(O1.c, O2.c) = 1.0
further input, e.g. instances, dictionary
…
O1
O2
Pre-processing
Post-processing
8
?
• Indirect composition-based matching
• Via intermediate ontology (IO):important hub ontology,synonym dictionary, …
MAPPING COMPOSITION
MA_0001421 UBERON:0001092 NCI_C32239
Synonym: Atlas Name: atlas
Name: C1 VertebraName: cervical vertebra 1 Synonym: cervical vertebra 1
Synonym: C1 vertebra
• Find new correspondences via composition
• Reuse existing mappings to
• Increase match quality & save computation time
IO
O1 O2
Groß, Hartung, Kirsten, Rahm: Mapping Composition for Matching Large Life Science Ontologies. 2nd International Conference on Biomedical Ontology (ICBO), 2011
9
• Use mappings to intermediate ontologies IO1, …, IOk
to indirectly match O1 and O2
• Reduce matching effort by reusing mappings to IO → very fast composition
INDIRECT MATCHING
...
IO1
IO2
IOk
O1 O2
...
O1
O2
On
HOOnew
→ IO should have a significant overlap with O1 and O2
→ IO1, …, IOk may complement each other
→ Centralized hub HO
→ many mappings to other ontologies
→ Onew aligned with any Oi via HO
10
• (Binary) compose operator• Composes two mappings 𝑀𝑂1,𝐼𝑂 and 𝑀𝐼𝑂,𝑂2 to create
a new mapping 𝑀𝑂1,𝑂2:
COMPOSE OPERATOR
11
O1IO1 O2
occ = 1: CMO1,O2 = {(a,a),(b,b),(c,c)}occ = 2: CMO1,O2 = {(a,a)}
Input: Two ontologies O1 and O2, list of intermediate ontologies IO1… IOk, occurrence count occ
Output: Composed mapping CMO1,O2
COMPOSEMATCH
a
b c
d e
a
b
g h
a
b c
d
f
a
i c
IO2MapList empty
for each IOi IO do
MO1,IOi getMapping(O1, IOi)
return 𝑚𝑒𝑟𝑔𝑒(MapList, occ)
MapList.add(𝑐𝑜𝑚𝑝𝑜𝑠𝑒(MO1,IOi, MIOi,O2))
MIOi,O2 getMapping(IOi, O2)
end for
MapList
(c,c ), (a,a)
(a,a), (b,b)
12
EVALUATION SETUP
• Match problem• Adult Mouse Anatomy (MA)
• NCI Thesaurus Anatomy part (NCIT)
Uberon
UMLSMA NCIT
RadLex
FMA
• Gold standard ~1500 correspondences
• Precompute mappings using a match strategy
~5000
~88,000
~30,800
~81,000
~2,700 ~3,300
#concepts
13
EVALUATION SETUP
• Match problem• Adult Mouse Anatomy (MA)
• NCI Thesaurus Anatomy part (NCIT)
PreprocessingNormalization
Linguistic Matcher(Name, synonyms, Trigram t = 0.8)
Selection & Postprocessing
Uberon
UMLSMA NCIT
RadLex
FMA
• Gold standard ~1500 correspondences
~5000
~88,000
~30,800
~81,000
~2,700 ~3,300
#concepts
14
• Direct match result compared to composeMatch via each IO
• Additional matching of unmatched parts (extendMatch)
RESULTS
88.2%
86%
• Uberon & UMLS → best evaluated intermediate ontologies
Intermediate Ontology IO
15
• Combination of four composed mappings
• Correspondences have to occur in at least 1, …, 4 mappings
RESULTS
union(occ=1)
F-Measure 90.2
Precision 92.7
Recall 87.8
Higher occurrence→ Recall ↓
extendMatch→ Recall ↑
16
• Combination of four composed mappings
• Correspondences have to occur in at least 1, …, 4 mappings
RESULTS
http://oaei.ontologymatching.org/[year]/anatomy
Top Results OAEI
Other systems later adopted similar techniques to make use of domain specific background knowledge (e.g. including Uberon, UMLS)
17
COMPOSITION VIA SEVERAL SOURCES
• Many “mapping path” alternatives…
GeoNames
LinkedGeoData
PubMed
Wrong domain
Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013
• Which intermediate source(s) should be used?
S T
A
B
C
S T
A
B
C
18
COMPOSITION VIA SEVERAL SOURCES
• Many “mapping path” alternatives…
GeoNames
LinkedGeoData
PubMedWorldFactBook
Too special
Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013
• Which intermediate source(s) should be used?
S T
A
B
C
S T
A
B
C
19
COMPOSITION VIA SEVERAL SOURCES
• Many “mapping path” alternatives…
GeoNames
LinkedGeoData
PubMedWorldFactBook
DBpedia
Ok, universal knowledge source
Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013
• Which intermediate source(s) should be used?
S T
A
B
C
S T
A
B
C
20
COMPOSE OPERATOR
21
EFFECTIVENESS OF MAPPINGS FOR COMPOSITION
Source S Target TIntermediate IMS,I MI,T
domain(MS,I) range(MS,I) domain(MI,T) range(MI,T)
Binary:
n-ary:
1. Mapping coverage in S and T should be high
2. Overlap of entities in I should be high
22
Mapping-based
• Take all mapping paths between S and T
• Different path filtering methods1) Effectiveness: k most effective mapping
paths (selEff)
2) Complement: k best complementing mapping paths w.r.t. S and T (selComp)
Link-based
• Select best routes in a graph of links between entities/concepts (not on “mapping level”)
• Graph-based approach• Transformation of S, T and mappings
in M into a weighted, directed graph
• Application of Shortest-Path algorithm to solve mapping composition problem
DIFFERENT COMPOSITION STRATEGIES
Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013
23
Reuse of mappings and composition strategies → very useful to create new correspondences/links
EVALUATION
60
70
80
90
100
NYT-DBp NYT-FB NYT-GeoN MA-NCIT
F-m
easu
re
all selEff selCompl link
• selEff, selComp, link always better than naïve (all) approach
Geography(Instance Matching track)
Anatomytrack
• Selection strategies better for Anatomy
• Link strategy slightly better for Geography
+ Best Compose approach always better than direct match
24
REUSE EXISTING MAPPINGS TO …
→ create new ontology mappings• “Indirect” matching: combine existing mappings to create
new mappings between so far unconnected sources
→ create up-to-date ontology mappings• Migration of outdated mappings to currently valid
ontology versions
Ontologies, ontology mappings, ontology evolution
Composition-based ontology matching
2) Adaptation of ontology mappings3) Outlook
25
𝑶𝟏′
𝑶𝟐′
𝑶𝟏
𝑶𝟐
𝑂𝑀𝑂1,𝑂2 𝑂𝑀𝑂1′,𝑂2′ ?
Requirements• High mapping quality
• Mapping consistency
• Include new concepts
• Reduction of manual effort, involve user feedback
• Support of semantic mappings
• Mappings can become invalid → need to be updated
• Reuse existing mappings (avoid full re-determination)
MAPPING ADAPTATION PROBLEM
Groß: Evolution von ontologiebasierten Mappings in den Lebenswissenschaften, Dissertation, Universität Leipzig, 2014.
Groß, Dos Reis, Hartung, Pruski, Rahm: Semi-automatic adaptation of mappings between life science ontologies. Proc. 9th Intl. Conference on Data Integration in the Life Sciences (DILS), 2013.
26
ADAPTATION APPROACHES
𝑶𝑴𝑶𝟏, 𝑶𝟏′
𝑶𝟏
𝑶𝟐
𝑂𝑀𝑂1,𝑂2
compose
𝒅𝒊𝒇𝒇𝑶𝟏, 𝑶𝟏′𝑶𝟏
𝑶𝟐
DiffAdapt
𝑂𝑀𝑂1,𝑂2
Composition-basedAdaptation (CA)
Diff-basedAdaptation (DA)
𝑶𝟏’𝑶𝟏’
27
ADAPTATION APPROACHES
𝑶𝑴𝑶𝟏, 𝑶𝟏′
𝑶𝑴𝑶𝟐,𝑶𝟐′
𝑶𝟏
𝑶𝟐
𝑂𝑀𝑂1,𝑂2
𝒅𝒊𝒇𝒇𝑶𝟏, 𝑶𝟏′
𝒅𝒊𝒇𝒇𝑶𝟐,𝑶𝟐′
𝑶𝟏
𝑶𝟐
𝑂𝑀𝑂1,𝑂2
Composition-basedAdaptation (CA)
Diff-basedAdaptation (DA)
𝑶𝟏’
𝑶𝟐’𝑶𝟐’
𝑶𝟏’
28
ADAPTATION APPROACHES
compose
𝑶𝑴𝑶𝟏, 𝑶𝟏′
𝑶𝑴𝑶𝟐,𝑶𝟐′
𝑶𝟏
𝑶𝟐
𝑂𝑀𝑂1,𝑂2 𝑶𝑴𝑶𝟏′ ,𝑶𝟐′
𝒅𝒊𝒇𝒇𝑶𝟏, 𝑶𝟏′
𝒅𝒊𝒇𝒇𝑶𝟐,𝑶𝟐′
𝑶𝟏
𝑶𝟐
𝑶𝑴𝑶𝟏′ ,𝑶𝟐′
DiffAdapt
𝑂𝑀𝑂1,𝑂2
Composition-basedAdaptation (CA)
Diff-basedAdaptation (DA)
𝑶𝟏’
𝑶𝟐’𝑶𝟐’
𝑶𝟏’
29
• COnto-Diff: Diff Evolution Mapping 𝑑𝑖𝑓𝑓(𝑂𝑜𝑙𝑑 , 𝑂𝑛𝑒𝑤)
• Based on match mapping between two ontology versions 𝑂𝑜𝑙𝑑 and 𝑂𝑛𝑒𝑤
• Set of basic and complex change operations
addC, addR, …
delC, delR, toObsolete, …
split, merge, substitute, …
• GENERIC ONTOLOGY MATCHING AND MAPPING MANAGEMENT
• Generic infrastructure to manage and analyze evolution of ontologies and mappings
GOMMA
30
• Combine ‘old‘ ontology mapping with ontology evolution mapping (between old and new version): compose-Operator
• Reuse and adapt existing correspondences
COMPOSITION-BASED ADAPTATION
• Semantic correspondence types?
+ Matching added concepts (𝑂1’\𝑂1, 𝑂2’ \𝑂2)
tail
headneck
limbslower extremities limb segments
limbs
upper extremities
body
neck
body
𝑶𝟏 𝑶𝟐
trunk
limbs
head and neck
body
𝑶𝟐‘
lower limbsupper limbs
==
=
===
=
<<
>
>
<<
tail
head
𝑶𝑴𝑶𝟏,𝑶𝟐 𝑶𝑴𝑶𝟐,𝑶𝟐′
trunk
semType:
= equivalent
< less general
> more general
31
𝑶𝑴𝑶𝟏,𝑶𝟐′
• Combine ‘old‘ ontology mapping with ontology evolution mapping (between old and new version): compose-Operator
• Reuse and adapt existing correspondences
COMPOSITION-BASED ADAPTATION
• Semantic correspondence types?
+ Matching added concepts (𝑂1’\𝑂1, 𝑂2’ \𝑂2)
lower extremitieslimbs
upper extremities
body
neck
𝑶𝟏
trunk
limbs
head and neck
body
𝑶𝟐‘
lower limbsupper limbs
tail
head
trunk
semType:
= equivalent
< less general
> more general
?
32
<<neck head and neckhead
compose
ℎ𝑎𝑛𝑑𝑙𝑒𝑑
headneckneck
head and neckhead
𝑶𝟏 𝑶𝟐 𝑶𝟐‘
COMBINATION OF SEMANTIC CORRESPONDENCES
• Correspondence (𝑐1, 𝑐2), 𝑐1 ∈ 𝑂1, 𝑐2 ∈ 𝑂2
• 𝑠𝑒𝑚𝑇𝑦𝑝𝑒 ∈ =,<,>, ≈
• 𝑠𝑡𝑎𝑡𝑢𝑠 ∈ ℎ𝑎𝑛𝑑𝑙𝑒𝑑, 𝑡𝑜𝑉𝑒𝑟𝑖𝑓𝑦
= < > ≈
= = < > ≈
< < < ≈ ≈
> > ≈ > ≈
≈ ≈ ≈ ≈ ≈
semType1
semType2
==
<<
semType1 semType2
• Semantic type: ≈• Status: 𝑡𝑜𝑉𝑒𝑟𝑖𝑓𝑦• compose → 4 correspondences
lower extremitieslimb segments
upper extremities
lower limbs
upper limbs
>>
<<
𝑶𝟏 𝑶𝟐 𝑶𝟐‘
33
• Modular, flexible adaptation approach
• Individual migration for different change operations using Change Handler 𝐶𝐻
• Reuse and adaptation of existing correspondences
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
34
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
tail
headneck
limbslower extremities limb segments
limbs
upper extremities
body
neck
body
𝑶𝟏 𝑶𝟐
trunk
limbs
head and neck
body
𝑶𝟐‘
lower limbsupper limbs
trunk
=
>
=
=
===
=
<<
>
<<
tail
head
𝑶𝑴𝑶𝟏,𝑶𝟐 𝑶𝑴𝑶𝟐,𝑶𝟐′
35
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
tail
headneck
limbslower extremities limb segments
limbs
upper extremities
body
neck
body
𝑶𝟏 𝑶𝟐
trunk
limbs
head and neck
body
𝑶𝟐‘
lower limbsupper limbs
trunk
=
>
=
=
===
=
<<
>
<<
tail
head
𝑶𝑴𝑶𝟏,𝑶𝟐 𝑶𝑴𝑶𝟐,𝑶𝟐′
merge({head, neck}, head and neck)
addC(trunk)delC(tail)
𝒅𝒊𝒇𝒇𝑶𝟐,𝑶𝟐′
split (limb segments, {lower limbs, upper limbs})
36
DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS
DiffAdapt 𝑶𝑴𝑶𝟐,𝑶𝟏, 𝒅𝒊𝒇𝒇𝑶𝟐,𝑶𝟐′ , 𝑶𝟐,𝑶𝟐′, 𝑶𝟏, 𝑪𝑯
1. Determination of affected correspondences 𝑶𝑴𝒊𝒏𝒇𝒍 using 𝒅𝒊𝒇𝒇𝑶𝟐,𝑶𝟐′
2. Reuse of unaffected mapping part: 𝑂𝑀𝑂2′,𝑂1← 𝑂𝑀𝑂2,𝑂1\ 𝑂𝑀𝑖𝑛𝑓𝑙
3. For each 𝑐ℎ ∈ 𝐶𝐻
• Adaptation of 𝑂𝑀𝑖𝑛𝑓𝑙 using a change hander strategy (𝒅𝒊𝒇𝒇𝑶𝟐,𝑶𝟐′, 𝑶𝟐,𝑶𝟐′, 𝑶𝟏)
4. Union of 𝑂𝑀𝑖𝑛𝑓𝑙 with unaffected mapping part:
𝑂𝑀𝑂2′,𝑂1← 𝑂𝑀𝑂2′,𝑂1 ∪ 𝑂𝑀𝑖𝑛𝑓𝑙
tail
headneck
limbslower extremities limb segments
limbs
upper extremities
body
neck
body
𝑶𝟏 𝑶𝟐
trunk
limbs
head and neck
body
𝑶𝟐‘
lower limbsupper limbs
trunk
=
>
=
=
===
=
<<
>
<<
tail
head
𝑶𝑴𝑶𝟏,𝑶𝟐 𝑶𝑴𝑶𝟐,𝑶𝟐′
𝑶𝑴𝒊𝒏𝒇𝒍
Unaffected
37
𝑚𝑒𝑟𝑔𝑒 𝒉𝒆𝒂𝒅, 𝑛𝑒𝑐𝑘 , 𝒉𝒆𝒂𝒅 𝒂𝒏𝒅 𝒏𝒆𝒄𝒌
EXAMPLES
MergeHandler
= <neckneckhead and neck= headhead
𝑶𝟏 𝑶𝟐 𝑶𝟐‘
upper extremities <lower limbsupper limbs
lower extremitieslimb segments
<
𝑠𝑝𝑙𝑖𝑡(𝒍𝒊𝒎𝒃 𝒔𝒆𝒈𝒎𝒆𝒏𝒕𝒔, {𝒍𝒐𝒘𝒆𝒓 𝒍𝒊𝒎𝒃𝒔, 𝒖𝒑𝒑𝒆𝒓 𝒍𝒊𝒎𝒃𝒔})
SplitHandler - “take best”
≈lower extremities lower limbs 𝒕𝒐𝑽𝒆𝒓𝒊𝒇𝒚
upper extremities upper limbs≈ 𝒕𝒐𝑽𝒆𝒓𝒊𝒇𝒚
< head and neckhead 𝒉𝒂𝒏𝒅𝒍𝒆𝒅
<neck 𝒉𝒂𝒏𝒅𝒍𝒆𝒅head and neck
<
>>
𝑚𝑒𝑟𝑔𝑒({ℎ𝑒𝑎𝑑, 𝒏𝒆𝒄𝒌}, 𝒉𝒆𝒂𝒅 𝒂𝒏𝒅 𝒏𝒆𝒄𝒌)
38
• UMLS Mapping versions: „silver standard“
• Adaptation of 2009 version, reference mapping: 2012 version
EVALUATION
Ontology size Mapping size
1
10
100
1.000
10.000
100.000
# c
han
ges
NCIT SCT
FMA NCIT SCT
#Concepts2009 62,285 63,655 310,121
#Concepts2012 62,285 84,132 318,502
SCT-NCIT
#Corr2009 19,971
#Corr2012 22,732
• merge, split, …
• Many concept additions and toObsolete changes
• Mapping changes
• 8% delCorr
• 19% addCorr
Ontology changes
39
70
75
80
85
90
95
100
Unaff CA CA+m DA DA+m
MAPPING QUALITY SCT-NCIT
• Unaffected correspondences only (Unaff ): good results
• CA: Precision ↓
• CA+m: Recall ↑ , F-Measure ≈ 90%
• Diff-based approaches: increased quality, especially Precision ↑
• DA+m: best quality, F-Measure ≈ 94%
RecallUnaff
F-MeasureUnaff
Precision Recall F-Measure
Composition Diff
40
Adaptation Strategy
1) Automatic detection of consistent mappings w.r.t. new ontology version
2) Recommendations for new correspondences→ Aim: complete mapping
3) Expert validation of correspondence (𝑡𝑜𝑉𝑒𝑟𝑖𝑓𝑦 status)
SEMI-AUTOMATIC MAPPING ADAPTATION
High mapping quality Consistent mappingNew correspondences for new concepts Reduction of manual effort Consider mapping semantics
41
• Ontology matching and entity linking• Integration of larger sets of heterogeneous sources:
holistic matching and reuse of clustered entities
• Semantic enrichment with concepts of ontologies
• Interactive tools for link verification
• Mapping semantics• Use of semantic relationships (is-a, part-of, …) in
mappings and Diff
• Evolution and adaptation of ontology-based annotations
OUTLOOK