reuse of ontology mappings

1 REUSE OF ONTOLOGY MAPPINGS Anika Groß, Database Group, Universität Leipzig Canberra, March 2016


Page 1: Reuse of Ontology Mappings

1

REUSE OF ONTOLOGY MAPPINGS

Anika Groß, Database Group, Universität Leipzig

Canberra, March 2016

Page 2: Reuse of Ontology Mappings

2

• Structured representation of knowledge

• Used for annotation as standardized semantic description of object properties

• Very large ontologies in the life sciences

ONTOLOGIES

[Figure: ontology domains (Anatomy, Molecular biology, Chemistry, Medicine) and an excerpt of an anatomy ontology with the concepts Anatomic Structure, System, or Substance; Tissue; Organ; Lung, Skin, Kidney, …]

Page 3: Reuse of Ontology Mappings

3

[Figure: overlapping ontologies MeSH, GALEN, SNOMED CT, NCI Thesaurus, Uberon, Mouse Anatomy, FMA]

• Overlapping ontologies → creation of mappings/alignments
• Useful for data integration, analysis across sources …

• Ontology mapping: set of semantic correspondences (links) between concepts of different ontologies

ONTOLOGY MAPPINGS

Page 4: Reuse of Ontology Mappings

4


ONTOLOGY MAPPINGS

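[Figure: example mapping $OM_{O_1,O_2}$ between two anatomy ontologies. $O_1$: body, head, neck, trunk, limbs, upper extremities, lower extremities, tail. $O_2$: body, head, neck, limbs, limb segments, tail. Correspondences are labeled = or <.]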

• Manual or semi-automatic identification (matching)

Page 5: Reuse of Ontology Mappings

5

• Ontologies are not static!

• Research, new knowledge → continuous changes

• Release of new versions

• Ontology changes

→ Impact on dependent mappings and applications?

EVOLUTION OF ONTOLOGIES AND MAPPINGS

[Figure: ontologies $O_1$ and $O_2$ connected by the mapping $OM_{O_1,O_2}$]

Page 6: Reuse of Ontology Mappings

6

REUSE EXISTING MAPPINGS TO …

→ create new ontology mappings
• “Indirect” matching: combine existing mappings to create new mappings between so far unconnected sources

→ create up-to-date ontology mappings
• Migration of outdated mappings to currently valid ontology versions

1) Ontologies, ontology mappings, ontology evolution
2) Composition-based ontology matching
3) Adaptation of ontology mappings
4) Outlook

Page 7: Reuse of Ontology Mappings

7

ONTOLOGY MATCHING WORKFLOW

• Manual creation of mappings between very large ontologies is too labor-intensive

• Semi-automatic generation of semantic correspondences: linguistic, structural, instance-based matching techniques

[Figure: match workflow. Inputs O1, O2 and further input (e.g. instances, dictionary) → Pre-processing → Matching → Post-processing → Mapping, e.g. sim(O1.a, O2.b) = 0.8, sim(O1.a, O2.c) = 0.5, sim(O1.c, O2.c) = 1.0]

Page 8: Reuse of Ontology Mappings

8

MAPPING COMPOSITION

• Indirect composition-based matching
• Via intermediate ontology (IO): important hub ontology, synonym dictionary, …
• Reuse existing mappings to
  • Find new correspondences via composition
  • Increase match quality & save computation time

Example:
• MA_0001421: Name: cervical vertebra 1
• UBERON:0001092: Name: atlas; Synonym: cervical vertebra 1; Synonym: C1 vertebra
• NCI_C32239: Name: C1 Vertebra; Synonym: Atlas

[Figure: O1 and O2 are each mapped to IO; the direct mapping between O1 and O2 is unknown (?)]

Groß, Hartung, Kirsten, Rahm: Mapping Composition for Matching Large Life Science Ontologies. 2nd International Conference on Biomedical Ontology (ICBO), 2011

Page 9: Reuse of Ontology Mappings

9

• Use mappings to intermediate ontologies IO1, …, IOk to indirectly match O1 and O2

• Reduce matching effort by reusing mappings to IO → very fast composition

INDIRECT MATCHING

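[Figure: left, O1 and O2 connected via intermediate ontologies IO1, IO2, …, IOk; right, ontologies O1, O2, …, On and a new ontology Onew connected to a hub ontology HO]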

→ IO should have a significant overlap with O1 and O2

→ IO1, …, IOk may complement each other

→ Centralized hub HO

→ many mappings to other ontologies

→ Onew aligned with any Oi via HO

Page 10: Reuse of Ontology Mappings

10

• (Binary) compose operator
• Composes two mappings $M_{O_1,IO}$ and $M_{IO,O_2}$ to create a new mapping $M_{O_1,O_2}$

COMPOSE OPERATOR
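
A minimal sketch of a binary compose operator, assuming a mapping is a set of (source concept, target concept) pairs and that two correspondences are chained whenever they share the same intermediate concept. The operator's formula is not preserved in this transcript, so the relational composition below is an assumption; the concept IDs are the ones from the earlier MA, Uberon, NCIT example.

```python
from collections import defaultdict


def compose(m_o1_io, m_io_o2):
    """Relational composition of two mappings given as sets of (source, target) pairs."""
    # Index the second mapping by its intermediate concept for fast lookup.
    by_intermediate = defaultdict(set)
    for io_concept, o2_concept in m_io_o2:
        by_intermediate[io_concept].add(o2_concept)
    # Chain every correspondence of the first mapping through the shared concept.
    return {
        (o1_concept, o2_concept)
        for o1_concept, io_concept in m_o1_io
        for o2_concept in by_intermediate.get(io_concept, ())
    }


# Illustrative input: one correspondence per mapping, sharing the Uberon concept.
m_ma_uberon = {("MA_0001421", "UBERON:0001092")}
m_uberon_ncit = {("UBERON:0001092", "NCI_C32239")}
composed = compose(m_ma_uberon, m_uberon_ncit)
print(composed)
```

Similarity values and semantic types are left out here on purpose; a fuller operator would also have to decide how to combine them.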

Page 11: Reuse of Ontology Mappings

11

COMPOSEMATCH

Input: Two ontologies O1 and O2, list of intermediate ontologies IO1, …, IOk, occurrence count occ
Output: Composed mapping $CM_{O_1,O_2}$

MapList ← empty
for each IOi ∈ IO do
    $M_{O_1,IO_i}$ ← getMapping(O1, IOi)
    $M_{IO_i,O_2}$ ← getMapping(IOi, O2)
    MapList.add(compose($M_{O_1,IO_i}$, $M_{IO_i,O_2}$))
end for
return merge(MapList, occ)

[Figure: example with O1, O2 and two intermediate ontologies IO1, IO2 over the concepts a to i]

Example: MapList = [ {(a,a), (b,b)}, {(c,c), (a,a)} ]
occ = 1: $CM_{O_1,O_2}$ = {(a,a),(b,b),(c,c)}
occ = 2: $CM_{O_1,O_2}$ = {(a,a)}
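
The composeMatch procedure can be sketched in a few lines. This is a hedged illustration: mappings are plain sets of pairs, `get_mapping` is a hypothetical lookup into precomputed mappings, and the toy mappings are chosen only so that the two composed mappings equal the MapList of the slide example.

```python
from collections import Counter


def compose(m1, m2):
    """Chain two mappings (sets of pairs) over their shared intermediate concept."""
    return {(a, c) for a, b in m1 for b2, c in m2 if b == b2}


def merge(map_list, occ):
    """Keep correspondences that occur in at least `occ` composed mappings."""
    counts = Counter(pair for mapping in map_list for pair in mapping)
    return {pair for pair, n in counts.items() if n >= occ}


def compose_match(o1, o2, intermediates, get_mapping, occ):
    map_list = []
    for io in intermediates:
        m_o1_io = get_mapping(o1, io)
        m_io_o2 = get_mapping(io, o2)
        map_list.append(compose(m_o1_io, m_io_o2))
    return merge(map_list, occ)


# Toy precomputed mappings (assumed data, not taken from the slide figure).
precomputed = {
    ("O1", "IO1"): {("a", "a"), ("b", "b")},
    ("IO1", "O2"): {("a", "a"), ("b", "b")},
    ("O1", "IO2"): {("a", "a"), ("c", "c")},
    ("IO2", "O2"): {("a", "a"), ("c", "c")},
}


def get_mapping(source, target):
    return precomputed[(source, target)]


union_result = compose_match("O1", "O2", ["IO1", "IO2"], get_mapping, occ=1)
strict_result = compose_match("O1", "O2", ["IO1", "IO2"], get_mapping, occ=2)
print(sorted(union_result), sorted(strict_result))
```

With occ = 1 the merge behaves like a union of all composed mappings; raising occ keeps only correspondences confirmed via several intermediate ontologies.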

Page 12: Reuse of Ontology Mappings

12

EVALUATION SETUP

• Match problem
  • Adult Mouse Anatomy (MA)
  • NCI Thesaurus Anatomy part (NCIT)
• Gold standard ~1500 correspondences
• Precompute mappings using a match strategy

[Figure: MA and NCIT connected via the intermediate ontologies Uberon, UMLS, RadLex, FMA]

#concepts: MA ~2,700; NCIT ~3,300; Uberon ~5,000; UMLS ~88,000; RadLex ~30,800; FMA ~81,000

Page 13: Reuse of Ontology Mappings

13

EVALUATION SETUP

• Match strategy used to precompute the mappings: Preprocessing (Normalization) → Linguistic Matcher (name, synonyms; trigram, t = 0.8) → Selection & Postprocessing

[Figure build: same setup as on the previous slide (MA, NCIT and the intermediate ontologies Uberon, UMLS, RadLex, FMA with their #concepts)]
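
A hedged sketch of a trigram-based linguistic matcher with threshold t = 0.8. The slides only name "Trigram" and the threshold; the Dice coefficient over character trigram sets, the lowercase normalization, and the function names below are assumptions. The labels reuse the names and synonyms of the earlier cervical vertebra example.

```python
def normalize(text):
    # Lowercase and collapse whitespace (a simple stand-in for the preprocessing step).
    return " ".join(text.lower().split())


def trigrams(text):
    normalized = normalize(text)
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_similarity(a, b):
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        # Strings shorter than three characters have no trigrams: compare exactly.
        return 1.0 if normalize(a) == normalize(b) else 0.0
    # Dice coefficient on the two trigram sets.
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def linguistic_match(labels1, labels2, t=0.8):
    """labels: dict mapping a concept id to its list of names and synonyms."""
    mapping = {}
    for c1, names1 in labels1.items():
        for c2, names2 in labels2.items():
            best = max(
                (trigram_similarity(n1, n2) for n1 in names1 for n2 in names2),
                default=0.0,
            )
            if best >= t:
                mapping[(c1, c2)] = best
    return mapping


ma = {"MA_0001421": ["cervical vertebra 1"]}
uberon = {"UBERON:0001092": ["atlas", "cervical vertebra 1", "C1 vertebra"]}
ncit = {"NCI_C32239": ["C1 Vertebra", "Atlas"]}

ma_uberon = linguistic_match(ma, uberon)
uberon_ncit = linguistic_match(uberon, ncit)
ma_ncit = linguistic_match(ma, ncit)
print(ma_uberon, uberon_ncit, ma_ncit)
```

On these labels the direct MA to NCIT comparison stays below the threshold, while both mappings to Uberon are found, which is the situation that composition via an intermediate ontology is meant to exploit.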

Page 14: Reuse of Ontology Mappings

14

• Direct match result compared to composeMatch via each IO

• Additional matching of unmatched parts (extendMatch)

RESULTS

[Figure: match quality per intermediate ontology IO; highlighted values: 88.2%, 86%]

• Uberon & UMLS → best evaluated intermediate ontologies

Page 15: Reuse of Ontology Mappings

15

• Combination of four composed mappings

• Correspondences have to occur in at least 1, …, 4 mappings

RESULTS

union (occ = 1): Precision 92.7, Recall 87.8, F-Measure 90.2

Higher occurrence → Recall ↓

extendMatch → Recall ↑
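
The reported values for union (occ = 1) can be checked with a short calculation, assuming F-Measure denotes the balanced harmonic mean of precision and recall (the slides do not state the formula).

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for union (occ = 1), in percent.
f = f_measure(92.7, 87.8)
print(round(f, 1))  # 90.2
```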

Page 16: Reuse of Ontology Mappings

16


RESULTS

http://oaei.ontologymatching.org/[year]/anatomy

Top Results OAEI

Other systems later adopted similar techniques to make use of domain-specific background knowledge (e.g. including Uberon, UMLS)

Page 17: Reuse of Ontology Mappings

17

COMPOSITION VIA SEVERAL SOURCES

• Many “mapping path” alternatives…

• Which intermediate source(s) should be used?

[Figure: source S and target T connected via alternative intermediate sources A, B, C. Example with GeoNames and LinkedGeoData; candidate intermediate source PubMed: “Wrong domain”]

Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013

Page 18: Reuse of Ontology Mappings

18

COMPOSITION VIA SEVERAL SOURCES

[Figure build: further candidate intermediate source WorldFactBook: “Too special”]

Page 19: Reuse of Ontology Mappings

19

COMPOSITION VIA SEVERAL SOURCES

[Figure build: further candidate intermediate source DBpedia: “Ok, universal knowledge source”]

Page 20: Reuse of Ontology Mappings

20

COMPOSE OPERATOR

Page 21: Reuse of Ontology Mappings

21

EFFECTIVENESS OF MAPPINGS FOR COMPOSITION

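[Figure: source S, intermediate I and target T connected by the mappings $M_{S,I}$ and $M_{I,T}$, with domain($M_{S,I}$) in S, range($M_{S,I}$) and domain($M_{I,T}$) in I, and range($M_{I,T}$) in T]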

Binary:

n-ary:

1. Mapping coverage in S and T should be high

2. Overlap of entities in I should be high

Page 22: Reuse of Ontology Mappings

22

Mapping-based

• Take all mapping paths between S and T
• Different path filtering methods
  1) Effectiveness: k most effective mapping paths (selEff)
  2) Complement: k best complementing mapping paths w.r.t. S and T (selComp)

Link-based

• Select best routes in a graph of links between entities/concepts (not on “mapping level”)
• Graph-based approach
  • Transformation of S, T and mappings in M into a weighted, directed graph
  • Application of Shortest-Path algorithm to solve mapping composition problem
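
The link-based strategy can be illustrated with a small shortest-path search. This is a sketch under stated assumptions, not the published method: each link carries a similarity in (0, 1], the quality of a route is taken to be the product of its link similarities, and the edge cost is therefore -log(similarity). The function name `best_route` and the toy graph are hypothetical.

```python
import heapq
import math


def best_route(links, source, targets):
    """Dijkstra search from a source entity to the best reachable target entity.

    links: dict mapping a node to a list of (neighbor, similarity) pairs,
    with every similarity in (0, 1].
    Returns (target, route confidence) or (None, 0.0) if no target is reachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, math.inf):
            continue  # stale heap entry
        if node in targets:
            return node, math.exp(-cost)
        for neighbor, sim in links.get(node, ()):
            new_cost = cost - math.log(sim)
            if new_cost < dist.get(neighbor, math.inf):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None, 0.0


# Toy link graph: s1 is a source entity, a1 and b1 are intermediate entities,
# t1 and t2 are candidate target entities.
links = {
    "s1": [("a1", 0.9), ("b1", 0.6)],
    "a1": [("t1", 0.9)],
    "b1": [("t2", 1.0)],
}
target, confidence = best_route(links, "s1", {"t1", "t2"})
print(target, round(confidence, 2))
```

Working on individual links rather than whole mappings lets each source entity take a different route through the intermediate sources.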

DIFFERENT COMPOSITION STRATEGIES

Hartung, Groß, Rahm: Composition Methods for Link Discovery. Proc. of 15. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2013

Page 23: Reuse of Ontology Mappings

23

Reuse of mappings and composition strategies → very useful to create new correspondences/links

EVALUATION

[Figure: F-measure (axis 60 to 100) for the tasks NYT-DBp, NYT-FB, NYT-GeoN (Geography, Instance Matching track) and MA-NCIT (Anatomy track), comparing the strategies all, selEff, selComp, link]

• selEff, selComp, link always better than naïve (all) approach
• Selection strategies better for Anatomy
• Link strategy slightly better for Geography
+ Best Compose approach always better than direct match

Page 24: Reuse of Ontology Mappings

24

REUSE EXISTING MAPPINGS TO …

→ create new ontology mappings
• “Indirect” matching: combine existing mappings to create new mappings between so far unconnected sources

→ create up-to-date ontology mappings
• Migration of outdated mappings to currently valid ontology versions

1) Ontologies, ontology mappings, ontology evolution
2) Composition-based ontology matching
3) Adaptation of ontology mappings
4) Outlook

Page 25: Reuse of Ontology Mappings

25

[Figure: ontology versions $O_1 \to O_1'$ and $O_2 \to O_2'$; existing mapping $OM_{O_1,O_2}$, adapted mapping $OM_{O_1',O_2'}$ = ?]

• Mappings can become invalid → need to be updated
• Reuse existing mappings (avoid full re-determination)

Requirements
• High mapping quality
• Mapping consistency
• Include new concepts
• Reduction of manual effort, involve user feedback
• Support of semantic mappings

MAPPING ADAPTATION PROBLEM

Groß: Evolution von ontologiebasierten Mappings in den Lebenswissenschaften, Dissertation, Universität Leipzig, 2014.

Groß, Dos Reis, Hartung, Pruski, Rahm: Semi-automatic adaptation of mappings between life science ontologies. Proc. 9th Intl. Conference on Data Integration in the Life Sciences (DILS), 2013.

Page 26: Reuse of Ontology Mappings

26

ADAPTATION APPROACHES

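[Figure: two approaches to adapt $OM_{O_1,O_2}$ when $O_1$ evolves to $O_1'$.
Composition-based Adaptation (CA): compose $OM_{O_1,O_2}$ with the evolution mapping $OM_{O_1,O_1'}$.
Diff-based Adaptation (DA): DiffAdapt uses $diff_{O_1,O_1'}$ to adapt $OM_{O_1,O_2}$.]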

Page 27: Reuse of Ontology Mappings

27

ADAPTATION APPROACHES

[Figure build: the same applies when $O_2$ evolves to $O_2'$: CA uses the evolution mapping $OM_{O_2,O_2'}$, DA uses $diff_{O_2,O_2'}$.]

Page 28: Reuse of Ontology Mappings

28

ADAPTATION APPROACHES

[Figure build: both CA (compose) and DA (DiffAdapt) produce the adapted mapping $OM_{O_1',O_2'}$.]

Page 29: Reuse of Ontology Mappings

29

• COnto-Diff: diff evolution mapping $diff(O_{old}, O_{new})$
  • Based on match mapping between two ontology versions $O_{old}$ and $O_{new}$
  • Set of basic and complex change operations
    • addC, addR, …
    • delC, delR, toObsolete, …
    • split, merge, substitute, …

• Generic Ontology Matching and Mapping Management
  • Generic infrastructure to manage and analyze evolution of ontologies and mappings

GOMMA

Page 30: Reuse of Ontology Mappings

30

COMPOSITION-BASED ADAPTATION

• Combine “old” ontology mapping with ontology evolution mapping (between old and new version): compose operator
• Reuse and adapt existing correspondences
• Semantic correspondence types?
+ Matching added concepts ($O_1' \setminus O_1$, $O_2' \setminus O_2$)

[Figure: running example with the mappings $OM_{O_1,O_2}$ and $OM_{O_2,O_2'}$. $O_1$: body, head, neck, trunk, limbs, upper extremities, lower extremities, tail. $O_2$: body, head, neck, limbs, limb segments, tail. $O_2'$: body, head and neck, trunk, limbs, lower limbs, upper limbs. Correspondences are labeled with =, < or >.]

semType:
= equivalent
< less general
> more general

Page 31: Reuse of Ontology Mappings

31

COMPOSITION-BASED ADAPTATION

[Figure build: composed mapping $OM_{O_1,O_2'}$ between $O_1$ and $O_2'$; the semantic types of the composed correspondences are still open (?)]

Page 32: Reuse of Ontology Mappings

32

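COMBINATION OF SEMANTIC CORRESPONDENCES

• Correspondence $(c_1, c_2)$, $c_1 \in O_1$, $c_2 \in O_2$
• $semType \in \{=, <, >, \approx\}$
• $status \in \{handled, toVerify\}$

Combination of semType1 (rows) with semType2 (columns):

      =   <   >   ≈
  =   =   <   >   ≈
  <   <   <   ≈   ≈
  >   >   ≈   >   ≈
  ≈   ≈   ≈   ≈   ≈

Example 1 ($O_1$, $O_2$, $O_2'$): head = head and neck = neck, combined with head < head and neck and neck < head and neck
• compose → head < head and neck, neck < head and neck
• Status: handled

Example 2 ($O_1$, $O_2$, $O_2'$): lower extremities < limb segments and upper extremities < limb segments, combined with limb segments > lower limbs and limb segments > upper limbs
• Semantic type: ≈
• Status: toVerify
• compose → 4 correspondences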
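
The combination table can be written down as a small lookup. This is a sketch: the table entries follow the slide, while the rule that a correspondence gets status toVerify exactly when its combined type is ≈ is an assumption that matches the two examples shown.

```python
APPROX = "≈"

# Combinations that keep a precise semantic type; everything else becomes ≈.
COMBINE = {
    ("=", "="): "=", ("=", "<"): "<", ("=", ">"): ">",
    ("<", "="): "<", ("<", "<"): "<",
    (">", "="): ">", (">", ">"): ">",
}


def combine_sem_type(sem_type1, sem_type2):
    return COMBINE.get((sem_type1, sem_type2), APPROX)


def compose_semantic(om_o1_o2, om_o2_o2new):
    """Compose two mappings given as sets of (source, target, semType) triples."""
    composed = set()
    for c1, c2, t1 in om_o1_o2:
        for c2_old, c2_new, t2 in om_o2_o2new:
            if c2 == c2_old:
                sem_type = combine_sem_type(t1, t2)
                # Assumption: only imprecise results need expert verification.
                status = "toVerify" if sem_type == APPROX else "handled"
                composed.add((c1, c2_new, sem_type, status))
    return composed


om_o1_o2 = {
    ("head", "head", "="),
    ("neck", "neck", "="),
    ("lower extremities", "limb segments", "<"),
    ("upper extremities", "limb segments", "<"),
}
om_o2_o2new = {
    ("head", "head and neck", "<"),
    ("neck", "head and neck", "<"),
    ("limb segments", "lower limbs", ">"),
    ("limb segments", "upper limbs", ">"),
}
result = compose_semantic(om_o1_o2, om_o2_o2new)
to_verify = {corr for corr in result if corr[3] == "toVerify"}
print(len(result), len(to_verify))
```

On the running example this yields two handled correspondences to head and neck and four ≈ correspondences between the extremities and the split limb concepts.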

Page 33: Reuse of Ontology Mappings

33

• Modular, flexible adaptation approach

• Individual migration for different change operations using Change Handler 𝐶𝐻

• Reuse and adaptation of existing correspondences

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

Page 34: Reuse of Ontology Mappings

34

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

[Figure: running example with $O_1$, $O_2$, $O_2'$ and the mappings $OM_{O_1,O_2}$ and $OM_{O_2,O_2'}$]

Page 35: Reuse of Ontology Mappings

35

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

[Figure: running example with $O_1$, $O_2$, $O_2'$ and the mappings $OM_{O_1,O_2}$ and $OM_{O_2,O_2'}$]

$diff_{O_2,O_2'}$:
• merge({head, neck}, head and neck)
• split(limb segments, {lower limbs, upper limbs})
• addC(trunk)
• delC(tail)

Page 36: Reuse of Ontology Mappings

36

DIFF-BASED ADAPTATION OF ONTOLOGY MAPPINGS

DiffAdapt($OM_{O_2,O_1}$, $diff_{O_2,O_2'}$, $O_2$, $O_2'$, $O_1$, $CH$)

1. Determination of affected correspondences $OM_{infl}$ using $diff_{O_2,O_2'}$
2. Reuse of unaffected mapping part: $OM_{O_2',O_1} \leftarrow OM_{O_2,O_1} \setminus OM_{infl}$
3. For each $ch \in CH$
   • Adaptation of $OM_{infl}$ using a change handler strategy ($diff_{O_2,O_2'}$, $O_2$, $O_2'$, $O_1$)
4. Union of $OM_{infl}$ with unaffected mapping part: $OM_{O_2',O_1} \leftarrow OM_{O_2',O_1} \cup OM_{infl}$

[Figure: running example with the correspondences divided into the affected part $OM_{infl}$ and the unaffected part]
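
The four DiffAdapt steps can be sketched as follows. Everything here is illustrative: correspondences are stored as (O1 concept, O2 concept, semType, status) tuples, changes as (operation, old concepts, new concepts) tuples, and the three handlers (including the string similarity used for "take best") are assumed stand-ins rather than the actual change handlers.

```python
from difflib import SequenceMatcher

APPROX = "≈"
COMBINE = {("=", "="): "=", ("=", "<"): "<", ("=", ">"): ">",
           ("<", "="): "<", ("<", "<"): "<",
           (">", "="): ">", (">", ">"): ">"}


def correspondence(c1, c2, sem_type):
    status = "toVerify" if sem_type == APPROX else "handled"
    return (c1, c2, sem_type, status)


def merge_handler(change, om_infl):
    # Each merged concept is less general than the concept it was merged into.
    _, sources, targets = change
    (target,) = targets
    return {correspondence(c1, target, COMBINE.get((t, "<"), APPROX))
            for c1, c2, t, _ in om_infl if c2 in sources}


def split_handler(change, om_infl):
    # "Take best": keep only the most similar split result (by name, assumed).
    _, sources, targets = change
    adapted = set()
    for c1, c2, t, _ in om_infl:
        if c2 in sources:
            best = max(sorted(targets),
                       key=lambda c: SequenceMatcher(None, c1, c).ratio())
            adapted.add(correspondence(c1, best, COMBINE.get((t, ">"), APPROX)))
    return adapted


def delete_handler(change, om_infl):
    return set()  # correspondences to deleted concepts are dropped


def diff_adapt(om, diff, handlers):
    # 1. Affected correspondences: those pointing to a changed O2 concept.
    changed = {c for op, sources, _ in diff if op in handlers for c in sources}
    om_infl = {corr for corr in om if corr[1] in changed}
    # 2. Reuse the unaffected part.
    adapted = om - om_infl
    # 3. Let each change handler adapt the affected part.
    new_infl = set()
    for op, handler in handlers.items():
        for change in diff:
            if change[0] == op:
                new_infl |= handler(change, om_infl)
    # 4. Union with the unaffected part.
    return adapted | new_infl


om = {correspondence(c, c, "=") for c in ("body", "head", "neck", "limbs", "tail")}
om |= {correspondence("lower extremities", "limb segments", "<"),
       correspondence("upper extremities", "limb segments", "<")}
diff = [("merge", {"head", "neck"}, {"head and neck"}),
        ("split", {"limb segments"}, {"lower limbs", "upper limbs"}),
        ("addC", set(), {"trunk"}),
        ("delC", {"tail"}, set())]
handlers = {"merge": merge_handler, "split": split_handler, "delC": delete_handler}
adapted_om = diff_adapt(om, diff, handlers)
for corr in sorted(adapted_om):
    print(corr)
```

Added concepts such as trunk are not handled here; they would be covered by a separate matching step for new concepts.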

Page 37: Reuse of Ontology Mappings

37

EXAMPLES

MergeHandler: merge({head, neck}, head and neck)
• Input ($O_1$, $O_2$, $O_2'$): head = head, neck = neck; head < head and neck, neck < head and neck
• Result: head < head and neck (handled), neck < head and neck (handled)

SplitHandler, “take best”: split(limb segments, {lower limbs, upper limbs})
• Input ($O_1$, $O_2$, $O_2'$): lower extremities < limb segments, upper extremities < limb segments; limb segments > lower limbs, limb segments > upper limbs
• Result: lower extremities ≈ lower limbs (toVerify), upper extremities ≈ upper limbs (toVerify)

Page 38: Reuse of Ontology Mappings

38

• UMLS Mapping versions: “silver standard”

• Adaptation of 2009 version, reference mapping: 2012 version

EVALUATION

Ontology size

                 FMA      NCIT     SCT
#Concepts2009    62,285   63,655   310,121
#Concepts2012    62,285   84,132   318,502

Mapping size

             SCT-NCIT
#Corr2009    19,971
#Corr2012    22,732

Ontology changes
• Many concept additions and toObsolete changes
• merge, split, …

[Figure: number of changes (log scale, 1 to 100,000) for NCIT and SCT]

Mapping changes
• 8% delCorr
• 19% addCorr

Page 39: Reuse of Ontology Mappings

39

MAPPING QUALITY SCT-NCIT

[Figure: Precision, Recall and F-Measure (axis 70 to 100) for Unaff, the composition-based approaches CA and CA+m, and the diff-based approaches DA and DA+m; Recall and F-Measure of Unaff are marked for comparison]

• Unaffected correspondences only (Unaff): good results
• CA: Precision ↓
• CA+m: Recall ↑, F-Measure ≈ 90%
• Diff-based approaches: increased quality, especially Precision ↑
• DA+m: best quality, F-Measure ≈ 94%

Page 40: Reuse of Ontology Mappings

40

Adaptation Strategy

1) Automatic detection of consistent mappings w.r.t. new ontology version

2) Recommendations for new correspondences → Aim: complete mapping

3) Expert validation of correspondence (𝑡𝑜𝑉𝑒𝑟𝑖𝑓𝑦 status)

SEMI-AUTOMATIC MAPPING ADAPTATION

• High mapping quality
• Consistent mapping
• New correspondences for new concepts
• Reduction of manual effort
• Consider mapping semantics

Page 41: Reuse of Ontology Mappings

41

• Ontology matching and entity linking
  • Integration of larger sets of heterogeneous sources: holistic matching and reuse of clustered entities
  • Semantic enrichment with concepts of ontologies
  • Interactive tools for link verification

• Mapping semantics
  • Use of semantic relationships (is-a, part-of, …) in mappings and Diff

• Evolution and adaptation of ontology-based annotations

OUTLOOK