
First Workshop of the GI-Working Group

"Ontologien in Biomedizin und Lebenswissenschaften" (OBML)

25-26 November 2009

Leipzig


Workshop des GI-Arbeitskreises Ontologien in Biomedizin und Lebenswissenschaften (OBML)

[Workshop of the Working Group "Ontologies in Biomedicine and the Life Sciences"]

WEDNESDAY, 25th November

13h00 Welcome remarks

Session 1 Metrics for ontology evaluation (Chair: Janet Kelso)

13h20-13h40 Hartung Evolution of Life Science Ontologies

13h40-14h00 Gross Quality of Functional Annotations in Life Science Data Sources

14h00-14h20 Kirsten Matching large Life Science Ontologies

14h20-14h40 Brochhausen Applying Corpus-Based Ontology Evaluation – The Case of the ACGT Master Ontology

14h40-15h00 Auer Linked Data for the Life Sciences

15h00-15h20 Dietzold Collaborative Editing and Publishing of Linked Data with OntoWiki

15h20-15h40 COFFEE

Session 2 Terminologies for clinical diagnostic support (Chair: Stefan Schulz)

15h40-16h00 Niggemann Snomed CT, Description Logic and Decision Support

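16h00-16h20 Hellmann Einsatz der Terminologien SNOMED CT und ID MACS am Beispiel der elektronischen Organspendeerklärung (eOSE) [Use of the terminologies SNOMED CT and ID MACS, illustrated by the electronic organ donor declaration (eOSE)]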

16h20-16h40 Robinson Clinical diagnostics in human genetics with semantic similarity searches in ontologies

16h40-17h00 Straub Dynamic Typing and Non-monotonic Reasoning - Principles for a Semantic Interpreter

Session 3.1 Theoretical principles (Chair: Heinrich Herre)

17h00-17h20 Baader How should parthood relations be expressed in SNOMED CT?

17h20-17h40 Jansen Constitution relations in biological cells

19h00 DINNER at the Restaurant "Madrid" Klostergasse 3-5. http://www.cafe-madrid.de/


THURSDAY, 26th November

Session 3.2 Theoretical principles

09h00-09h20 Straub Reality and Abstraction: A Look at Different Models

09h20-09h40 Loebe Ontological semantics

Session 4 Ontology engineering and use (Chair: Robert Hoehndorf)

09h40-10h00 Freitas Neglected tropical diseases - A challenge to biomedical ontology engineering

10h00-10h20 Ngonga The application of an ontology design pattern for functional abnormalities to phenotype ontologies and the extraction of an ontology of anatomical functions

10h20-10h40 Hastings The ChEBI ontology

10h40-11h00 Waechter An ontology-generation plug-in for OBO-Edit

11h00-11h20 COFFEE

11h20-11h40 Schober Concurrent ontology building with Collaborative Protégé

11h40-12h10 Plake Mining ontology concepts from literature for automated gene annotation.

12h10 LUNCH

Session 5 Modelling and Causality (Chair: Frank Loebe)

13h10-13h30 Neumuth A Four-Level Translational Approach to Modeling Surgical Processes

13h30-13h50 Mudunuri

13h50-14h10 Hege Knowledge Representation via Digital Brain Atlases

14h10-14h30 Knuepfer Beyond Structure: KiSAO and TEDDY – Two Ontologies Addressing Pragmatical and Dynamical Aspects of Computational Models in Systems Biology

14h30-14h50 Michalek A theory of causality

15h00-16h00 Open discussion: OBML Working Group


Participant List: Workshop of the OBML Working Group

Sören Auer (Universität Leipzig)(*)

Franz Baader (TU Dresden)(*)

Annette Bals (Fresenius-Netcare)

Felix Balzer (Charité Universitätsmedizin Berlin)

Mathias Brochhausen (IFOMIS, Saarbrücken)(*)

Martin Boeker (Universität Freiburg)

Heiko Dietze (TU Dresden)

Sebastian Dietzold (Universität Leipzig)(*)

Fred Freitas (UFPE - Brazil)(*)

Dayana Goldstein (ICCAS, Universität Leipzig)

Niels Grewe (Universität Rostock)

Anika Groß (Universität Leipzig)(*)

Janna Hastings (European Bioinformatics Institute)(*)

Michael Hartung (Universität Leipzig)(*)

Hans-Christian Hege (Zuse-Institut Berlin (ZIB))(*)

G. Hellmann (HellmannConsult)(*)

Heinrich Herre (Universität Leipzig, IMISE)

Robert Hoehndorf (MPI EVA, IMISE, Universität Leipzig)(*)

Ludger Jansen (Universität Rostock)(*)

Janet Kelso (MPI EvA)

Toralf Kirsten (IZBI und IMISE, Universität Leipzig)(*)

Axel Klarmann (zwonull media)

Christian Knüpfer (Universität Jena)(*)

Anja Kuß (Konrad Zuse Institut, Berlin)

Frank Loebe (Universität Leipzig)


Hendrik Mehlhorn (IPK Gatersleben)

Christian Meißner (Universität Leipzig, ICCAS)

Hannes Michalek (Onto-Med, IMISE)(*)

Raj Mudunuri (Universität Leipzig, ICCAS)(*)

Bardo Nelgen (SemaWorx)

Thomas Neumuth (Universität Leipzig, ICCAS)(*)

Axel Ngonga (Universität Leipzig)

Jörg Niggemann (CompuGroup Software GmbH)(*)

Conrad Plake (BioTec TU-Dresden)(*)

Djamila Raufie (Universität Freiburg)

Peter Robinson (Institut für Medizinische Genetik, Universitätsklinikum Charité, Berlin)(*)

Andreas Schierwagen (Universität Leipzig)

Daniel Schober (Universität Freiburg)(*)

Stefan Schulz (Universität Freiburg)(*)

Holger Stenzhorn (Universitätsklinikum Saarland)

Hans Rudolf Straub (Semfinder AG)(*)

Stephan Vollrath (Universität Leipzig, MPI)

Thomas Wächter (BioTec TU-Dresden)(*)

Rainer Winnenburg (BioTec TU-Dresden)

Nico Wüstneck (Universität Leipzig)


Matching large Life Science Ontologies

Toralf Kirsten

Interdisciplinary Centre for Bioinformatics, University of Leipzig

Institute for Medical Informatics, Statistics, and Epidemiology, University of Leipzig


Ontologies are becoming increasingly important in life science application domains. Many have been developed recently and are frequently used to semantically describe specific properties of biological objects. For instance, molecular-biological objects, such as genes and proteins, are described (annotated) with information on the functions and processes they are involved in, whereas a disease ontology can be used to describe the findings of a patient's check-up. Ontologies provide controlled vocabularies for the uniform naming of concepts (and thus the description of object properties), which helps to reduce variation in terminology. A very popular ontology is the Gene Ontology (GO), consisting of three sub-ontologies on molecular functions, biological processes and cellular components [GOC08]. While ontology concepts are increasingly associated with objects (collected in so-called annotation mappings), there are only few connections between the life science ontologies themselves that reflect their semantic relations.

Ontology matching addresses the problem of finding semantic relations between ontologies. Each ontology relation, also called an ontology mapping or alignment, comprises a set of correspondences showing which ontology concepts are semantically related. While ontology matching in other domains focuses primarily on finding semantically equivalent concepts, e.g., to overcome the heterogeneity of two product catalogues in e-business, there are also ontology alignments with domain-specific semantics in the life sciences. For instance, a molecular function "is involved in" a biological process or "acts in" a cellular component (all can be concepts of the equally named GO sub-ontologies), as described in [MTML06]. Because of the huge and rapidly increasing number of life science ontologies and their large number of concepts, creating these semantic relations manually is very time-consuming and nearly impossible. Hence, many approaches have been proposed in recent years that automatically generate candidate sets of concept correspondences, which can then be validated by human experts.
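The idea of automatically generating candidate correspondences for later expert validation can be illustrated with a minimal sketch. It assumes a purely lexical matcher based on string similarity of concept labels; the concept identifiers, labels and the threshold are hypothetical, and this is not the matching method of any system named in this abstract.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a concept label and collapse repeated whitespace."""
    return " ".join(label.lower().split())


def name_similarity(a, b):
    """String similarity of two labels in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match(source, target, threshold=0.8):
    """Return candidate correspondences (source_id, target_id, score).

    Both ontologies are given as {concept_id: label} dicts. Every pair
    whose label similarity reaches the threshold becomes a candidate
    that a human expert would still have to validate.
    """
    candidates = []
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            score = name_similarity(s_label, t_label)
            if score >= threshold:
                candidates.append((s_id, t_id, round(score, 2)))
    # Best-scoring candidates first
    return sorted(candidates, key=lambda c: -c[2])


# Hypothetical toy ontologies (identifiers and labels are made up)
source = {"A:1": "Heart valve", "A:2": "Kidney"}
target = {"B:7": "heart  valve", "B:8": "Liver", "B:9": "Kidneys"}
candidates = match(source, target)
```

The nested loop compares every pair of concepts, which is only workable for small inputs; large ontologies need blocking or indexing, and purely lexical scores miss the domain-specific relations mentioned above.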

The talk first gives an overview of match approaches that have been used in the life sciences as well as current developments. Secondly, we introduce our research prototype GOMMA, the Generic Ontology Matching and Mapping Analysis system. GOMMA includes both a highly scalable and space-efficient approach to managing many versions of different ontologies [KHGR09] and a comprehensive set of matchers and similarity functions that can be used to align versions of these ontologies. Using these matchers and similarity functions, GOMMA makes it possible not only to create ontology mappings but also to refine initially generated alignments by combining the matchers in a workflow-like manner. Finally, we outline next steps and open research topics.

References:

[GOC08] Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research (36 Database):D440-D444, 2008.

[MTML06] S. Myre, H. Tveit, T. Mollestad, A. Laegreid: Additional Gene Ontology Structure for improved biological reasoning. Bioinformatics, 22(16): 2020-2027, 2006.

[KHGR09] T. Kirsten, M. Hartung, A. Groß, E. Rahm: Efficient Management of Biomedical Ontology Versions. Proc. of Intl. Workshop on Ontology Content, 2009.


Evolution of Life Science Ontologies

Michael Hartung

Interdisciplinary Centre for Bioinformatics, University of Leipzig

Ontologies have become increasingly important in life sciences. They consist of a set of concepts denoted by terms describing and structuring a domain of interest. Concepts are interconnected by different relationship types such as is_a and part_of relationships. Typical kinds of ontology application are the annotation of molecular-biological objects, data exchange in heterogeneous environments as well as the usage in analysis algorithms. For instance, the well-known Gene Ontology (GO) [1] is utilized for the consistent annotation of proteins, in particular the molecular functions and the biological processes in which they are involved, or the cellular components where they act. Using a common ontology for annotation ensures collaborative work on complex topics and allows for the exchange of results between different research groups and organizations.

Life science ontologies are not static: they are modified when new or revised knowledge becomes available or when initial design errors need to be corrected. As a result of this continuous evolution, ontology providers usually release a new version whenever a revision has been finished. Hence, an ontology version is valid as long as no newer version is provided. For instance, owing to its highly dynamic nature, GO releases a new version every day. Other ontologies, such as the NCI Thesaurus [4] or OBO ontologies [5], are released less frequently, e.g., on a monthly or half-yearly basis. The release of a new ontology version comes with numerous problems, in particular for users who rely on the ontology in their analysis routines or for annotation purposes. For instance, the deletion of a concept in the newer version may leave annotations outdated or may change analysis results. Manual adaptation of data to a newer ontology version is error-prone and time-consuming, so the task should be supported in a semi-automatic manner. It is therefore useful to know how intensively an ontology has been modified and which changes occurred during its evolution, especially when evolution information is not available to users. Furthermore, it is important to know whether an ontology is currently under heavy revision (i.e., is unstable) or receives only marginal refinements (i.e., is nearly stable).

The presentation is twofold. The first part introduces a framework for analyzing the evolution of ontologies [2]. In particular, an ontology model and measures for quantifying ontology evolution are presented. Selected evaluation results depict the evolution of 16 life science ontologies between 2004 and 2008, including GO, the NCI Thesaurus and several OBO ontologies. The second part focuses on OnEX (Ontology Evolution Explorer) [3], a system based on the introduced framework. It provides web-based access to evolution information about ontologies. In particular, users can inspect the evolution of whole ontologies as well as detailed information about changes to ontology concepts (e.g., attribute modifications). Furthermore, the tool supports the migration of outdated annotations to newer versions of an ontology. An outlook discusses next steps and open research topics.
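To make the notion of changes between two ontology versions concrete, the following minimal sketch compares two versions and flags annotations that point to deleted concepts. It assumes each version is a plain dictionary from concept identifier to attributes; the data, function names and change categories are illustrative and are not the measures or the data model of the framework presented in the talk.

```python
def diff_versions(old, new):
    """Compare two ontology versions given as {concept_id: attributes}.

    Returns three sets of concept ids: added, deleted, and changed
    (present in both versions but with different attributes).
    """
    old_ids, new_ids = set(old), set(new)
    added = new_ids - old_ids
    deleted = old_ids - new_ids
    changed = {c for c in old_ids & new_ids if old[c] != new[c]}
    return added, deleted, changed


def stale_annotations(annotations, deleted):
    """Return the (object, concept_id) annotations whose concept was deleted."""
    return [(obj, c) for obj, c in annotations if c in deleted]


# Hypothetical toy versions of one ontology
v1 = {
    "C1": {"name": "cell"},
    "C2": {"name": "nucleus"},
    "C3": {"name": "obsolete thing"},
}
v2 = {
    "C1": {"name": "cell"},
    "C2": {"name": "cell nucleus"},  # attribute modification
    "C4": {"name": "membrane"},      # newly added concept
}

added, deleted, changed = diff_versions(v1, v2)
annotations = [("geneA", "C1"), ("geneB", "C3")]
stale = stale_annotations(annotations, deleted)
```

Counting the three sets per release gives a rough impression of whether an ontology is under heavy revision or nearly stable.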

[1] Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research (36 Database):D440-D444, 2008

[2] Hartung, M.; Kirsten, T.; Rahm, E.: Analyzing the Evolution of Life Science Ontologies and Mappings. In Proceedings of Intl. Workshop on Data Integration in the Life Sciences (DILS), 2008

[3] Hartung, M.; Kirsten, T.; Gross, A.; Rahm, E.: OnEX – Exploring changes in life science ontologies. BMC Bioinformatics 10:250, 2009

[4] Sioutos, N.; de Coronado, S.; Haber, M.W.: NCI Thesaurus: A semantic model integrating cancer-related clinical and molecular information. Journal of Biomedical Informatics, 40:30-43, 2007

[5] Smith, B. et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25(11):1251-1255, 2007


Applying Corpus-Based Ontology Evaluation – The Case of the ACGT Master Ontology

Mathias BROCHHAUSEN (a), Gintarė GRIGONYTĖ (b)

(a) Institute for Formal Ontology and Medical Information Science, Saarland University, Germany

(b) Department for Applied Linguistics, Translation and Interpreting, Saarland University, Germany

Introduction

Brewster et al. [1] proposed techniques to evaluate ontologies by comparing them with terminology as used in domain-specific natural language texts. This approach is very appealing in the field of medical ontologies, since huge collections of abstracts are freely available on the internet. In this study we show results of using corpus-based term extraction as a reference against which to evaluate the ACGT Master Ontology (MO). Our aim is to explore the opportunities and restrictions of cooperation between NLP-created thesauri and reality-oriented ontology development.

Material

The ACGT project (Advancing Clinico-Genomic Trials on Cancer, FP6-IST-026996) aims to address two key obstacles to bridging the gap between molecular research and clinical practice:

- the flood of multilevel datasets (from the molecular to the organ to the individual level),

- the lack of a common infrastructure for clinical research institutions and the creators of molecular data.

As a result of this situation, very few cross-site studies and multi-centric clinical trials are performed, and in most cases it is not possible to seamlessly integrate multi-level data. ACGT aims to overcome these obstacles by setting up a semantic grid infrastructure in support of multi-centric, post-genomic clinical trials. Semantic integration in ACGT is based on the ACGT MO as global schema in a Local-As-View (LAV) strategy [2].

The ACGT Master Ontology (ACGT MO) is implemented in OWL-DL, the description-logic-based sublanguage of the Web Ontology Language (OWL) [3], and can be freely downloaded from http://www.ifomis.org/acgt. The initial development (beta) version of the ACGT MO was published in June 2007, and it has been expanded continuously to integrate the needs of users, both clinical and technical. The developers are working towards version 1.0. At the moment the ontology contains 1667 classes, 288 object properties, 15 data properties and 61 individuals.

Methods

Evaluation of ontologies is becoming a key issue in ontology-driven computing, but the development of common standards for evaluating ontologies seems to be rather slow. A central distinction is traditionally drawn between two evaluation strategies: "glass box" or "component" evaluation and "black box" or "task-based" evaluation. This distinction also applies to evaluation processes for ontologies and ontology-driven systems [4, 5]. The two strategies must be seen as complementary, each testing for different kinds of qualities. In the present study we focus on the sub-task of evaluating domain coverage, which is part of glass box evaluation.

In order to acquire a list of terms that are actively used in the domain and are specific to it, we applied the methodology for terminology extraction described in [6]. It encompasses the main stages of terminology extraction: morphological analysis, shallow parsing, rule-based detection of noun phrases (NPs), term candidate extraction, termhood assessment and building a term list hierarchy.

We used the extracted thesaurus to establish mappings with the classes of the ACGT MO.
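Mapping extracted terms to ontology classes and deriving a coverage figure from the result can be sketched as follows. The sketch assumes exact matching after a simple normalization; the term and label lists are made-up examples rather than data from the study, and the extraction pipeline of [6] is not reproduced here.

```python
def normalize(term):
    """Lower-case, treat hyphens as spaces, collapse whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def coverage(corpus_terms, class_labels):
    """Match corpus terms against ontology class labels.

    Returns (matched terms, unmatched terms, share of corpus terms
    that have a matching class label).
    """
    labels = {normalize(label) for label in class_labels}
    terms = {normalize(term) for term in corpus_terms}
    matched = terms & labels
    ratio = len(matched) / len(terms) if terms else 0.0
    return matched, terms - labels, ratio


# Hypothetical inputs: terms from a corpus and labels from an ontology
corpus_terms = [
    "Nephroblastoma",
    "rhabdoid tumour",
    "tumour-suppressor gene",
    "chemotherapy",
]
class_labels = ["nephroblastoma", "Rhabdoid Tumour", "Chemotherapy", "Patient"]

matched, unmatched, ratio = coverage(corpus_terms, class_labels)
```

The unmatched terms are the interesting output in practice: each one is either a candidate for a new class or label, or a term outside the intended scope of the ontology.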

For the ACGT project we started by collecting 3334 domain-specific abstracts of scientific publications. In general, the domain of the ACGT MO is cancer research and management; owing to the focus of the project, the ontology concentrates on three types of cancer: mammary carcinoma, nephroblastoma (Wilms' tumour) and rhabdoid tumour. The corpus of 3334 abstracts consists of three registers: 1500 abstracts each from PubMed [http://www.ncbi.nlm.nih.gov/pubmed/] on mammary carcinoma and on nephroblastoma, and 334 on rhabdoid tumour.

Results

The results we obtained using domain thesauri to validate the domain coverage of an ontology mark this as a highly promising strategy. In particular, checking an ontology for class names or labels that are actually used by domain experts is an important step, since it makes the ontology more accessible and usable for those experts. This is a crucial aspect in the development and maintenance of clinical ontologies: the domain experts are the only group that can effectively guide maintenance in a way that secures future usefulness in the actual clinical situation, with its specific points of view and restrictions. The results we achieved for the coverage of the ACGT MO clearly suggest that, in order to optimize the accuracy of the testing, the corpora should be enhanced with texts from patient documentation, since this is a key issue in the applications of the ACGT MO. First steps in this direction have been taken, but the results are not yet available.

References

[1] Brewster C, Alani H, Dasmahapatra S, Wilks Y. Data-driven Ontology Evaluation. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon.

[2] Tsiknakis M, Brochhausen M, Nabrzyski J, Pucaski J, Potamias G, Desmedt C, et al. A semantic grid infrastructure enabling integrated access and analysis of multilevel biomedical data in support of post-genomic clinical trials on Cancer. IEEE Transactions on Information Technology in Biomedicine, Special issue on Bio-Grids. 2008; 12 (2):205-217.

[3] OWL Web Ontology Language: Semantics and Abstract Syntax. Available from http://www.w3.org/TR/owl-semantics/; last visited: 10-28-2009.

[4] Hartmann J, Spyns P, Giboin A, Maynard D, Cuel R, Suárez-Figueroa MC, et al. Methods for ontology evaluation. Knowledge Web Deliverable D1.2.3, 2004.

[5] Gangemi A, Catenacci C, Ciaramita M, Lehmann J. Modelling Ontology Evaluation. In Proceedings of the Third European Semantic Web Conference. Berlin, Springer, 2006, pp. 140-154.

[6] Avizienis A, Grigonyte G, Haller J, von Henke F, Liebig T, Noppens O. Organizing Knowledge as an Ontology of the Domain of Resilient Computing by Means of Natural Language Processing - An Experience Report. Artificial Intelligence Research Society Conference, 2009, Florida.


Quality of Functional Annotations in Life Science Data Sources

Anika Groß

Department of Computer Science, University of Leipzig

Interdisciplinary Centre for Bioinformatics, University of Leipzig

Ontologies and their application have become increasingly important especially in the life sciences. Typically, they associate objects, such as genes and proteins, with well-defined ontology concepts to semantically and uniformly describe the properties of these biological objects. The association between an object and a concept of an ontology is often denoted as (functional) annotation. The set of all associations between a biological data source and an ontology forms a so-called annotation mapping. For instance, the genes and proteins of Ensembl [1] and Swiss-Prot [2] are associated with concepts of the popular Gene Ontology (GO) [3] to specify the molecular functions and biological processes in which the proteins are involved.

These GO annotations are used in analysis scenarios and applications such as functional profiling of large datasets (e.g., [4]) or instance-based ontology matching [5]. The results computed by such applications depend significantly on the quality of the underlying annotations. One important quality aspect is the stability of annotations, since major changes in annotation mappings may substantially influence or even invalidate earlier findings. This is a major issue because annotation mappings change frequently: new research findings lead to modifications of the underlying ontologies, objects and annotation associations. Moreover, the quality of an annotation is influenced by its creation method, i.e., the method used to generate the annotation (experimentally verified, based on author statements, or generated by automatic algorithms). The creation method affects how biologically well-founded or reliable an annotation is. Its relevance is underlined by the increasing use of so-called evidence codes (EC) to classify functional annotations based on the GO. Users may use ECs to focus on specific annotation sets in their analyses and applications, e.g., only manually curated or only automatically generated annotations.
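Restricting an analysis to annotations with particular evidence codes can be sketched in a few lines. The sketch assumes annotations are simple (object, GO term, evidence code) tuples and treats the GO evidence code IEA (Inferred from Electronic Annotation) as the only automatic one; the protein identifiers and their assignments are hypothetical.

```python
# Evidence codes treated as automatically generated in this sketch.
# IEA = Inferred from Electronic Annotation (GO evidence code).
AUTOMATIC_CODES = {"IEA"}


def manually_curated(annotations):
    """Keep only annotations whose evidence code is not automatic."""
    return [a for a in annotations if a[2] not in AUTOMATIC_CODES]


def automatic_share(annotations):
    """Fraction of annotations that were generated automatically."""
    if not annotations:
        return 0.0
    auto = sum(1 for a in annotations if a[2] in AUTOMATIC_CODES)
    return auto / len(annotations)


# Hypothetical annotations: (protein, GO term, evidence code)
annotations = [
    ("P1", "GO:0005634", "IDA"),  # Inferred from Direct Assay
    ("P1", "GO:0003677", "IEA"),  # electronic annotation
    ("P2", "GO:0006915", "TAS"),  # Traceable Author Statement
]

curated = manually_curated(annotations)
share = automatic_share(annotations)
```

Comparing analysis results on the curated subset with results on the full set is one simple way to see how much a finding depends on automatically generated annotations.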

The presentation will focus on functional annotations with respect to their quality and their possible influence on "annotation-dependent" applications. The talk gives insights into the varying provenance of annotations that results from different annotation creation methods. Moreover, the results of a quantitative evaluation of annotation evolution emphasize the need to assess annotation stability [6]. The presentation highlights how our findings and the proposed assessment method for annotations can be valuable for users and applications of life science annotations. Future algorithms may use information on annotation history and quality to derive more reliable results, as we investigated in a first approach to producing more robust ontology mappings [7].

[1] Hubbard, T.J.; Aken, B.L.; Ayling, S.; et al.: Ensembl 2009. Nucleic Acids Research 37 (Database issue), D690–D697, 2009

[2] Boutet, E.; Lieberherr, D.; Tognolli, M.: UniProtKB/Swiss-Prot. Methods in Molecular Biology 406, 89–112, 2007

[3] Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research (36 Database):D440-D444, 2008

[4] Prüfer, K.; Muetzel, B.; Do, H. et al.: FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 8(1), 41, 2007

[5] Kirsten, T.; Thor, A.; Rahm, E.: Instance-based matching of large life science ontologies. In: Cohen-Boulakia, S., Tannen, V. (eds.) DILS 2007. LNCS (LNBI), vol. 4544, pp. 172–187. Springer, Heidelberg, 2007

[6] Groß, A.; Hartung, M.; Kirsten, T.; Rahm, E.: Estimating the Quality of Ontology-based Annotations by Considering Evolutionary Changes. Proc. of 6th Int. Workshop on Data Integration in the Life Sciences (DILS), Springer LNCS 5647, 2009

[7] Thor, A.; Hartung, M.; Groß, A.; Kirsten, T.; Rahm, E.: An Evolution-based Approach for Assessing Ontology Mappings - A Case Study in the Life Sciences. Proc. of 13. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2009


Linked Data for the Life Sciences

Sören Auer

Institute for Computer Science,

University of Leipzig

Research Group Agile Knowledge Engineering and Semantic Web (AKSW)

Over the past 3 years, the semantic web activity has gained momentum with the widespread publishing of structured data as RDF and in particular Linked Data. The central idea of Linked Data is to extend the Web with a data commons by creating typed links between data from different data sources [1,2]. Technically, the term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web in such a way that the data is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and it can in turn be linked to from external data sets. The data links that connect data sources take the form of RDF triples, where the subject of the triple is a URI reference in the namespace of one data set, while the object is a URI reference in the other [3].
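The shape of such an RDF link can be shown with a small sketch that represents triples as plain tuples of strings. It approximates "namespace of a data set" by the host part of the URI, which is an assumption made only for this illustration; the `example.org` URI is a placeholder and not a real data set.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def is_uri(value):
    """True if the value looks like an http(s) URI rather than a literal."""
    return urlparse(value).scheme in ("http", "https")


def is_rdf_link(triple):
    """True if subject and object are URIs from different hosts.

    Using the host as a stand-in for the data set namespace is a
    simplification made for this sketch.
    """
    subject, _predicate, obj = triple
    if not (is_uri(subject) and is_uri(obj)):
        return False
    return urlparse(subject).netloc != urlparse(obj).netloc


triples = [
    # Link between two data sets (the object URI is a placeholder)
    ("http://dbpedia.org/resource/Aspirin", OWL_SAME_AS,
     "http://example.org/drugs/aspirin"),
    # Literal-valued triple inside one data set: not an RDF link
    ("http://dbpedia.org/resource/Aspirin", RDFS_LABEL, "Aspirin"),
]

links = [t for t in triples if is_rdf_link(t)]
```

In a real setting an RDF library would be used to parse and store the triples; the tuples here only show which part of a triple lives in which data set.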

The most visible example of adoption and application of Linked Data has been the Linking Open Data (LOD) project [4], a grassroots community effort to bootstrap the Web of Data by interlinking open-license data sets. Out of the more than 8 billion RDF triples that are served as of July 2009 by participants of the project, approximately 148 million are RDF links between data sets.

A central interlinking hub of the emerging Data Web is DBpedia [5], which aims to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. It already contains rich semantic descriptions of more than 2 million concepts. The life sciences are probably among the best-represented domains on the Data Web, with a large number of life science data sets, such as DrugBank, Linked Open Drug Data, Linked Clinical Trials, Gene Ontology and many more. In this talk, we will present the concepts and techniques of Linked Data and show their application perspectives for the life sciences.

[1] Berners-Lee, T.: Linked Data - Design Issues. http://www.w3.org/DesignIssues/LinkedData.html

[2] C. Bizer, T. Heath, T. Berners-Lee: Linked Data - The Story So Far. In: International Journal on Semantic Web & Information Systems, Vol. 5, Issue 3, Pages 1-22, 2009.

[3] Bizer, C., Cyganiak, R., Heath, T.: How to publish Linked Data on the Web. http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial

[4] http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

[5] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, S. Hellmann: DBpedia – A Crystallization Point for the Web of Data. Journal of Web Semantics (JWS).


Collaborative Editing and Publishing of Linked Data with OntoWiki

Sebastian Dietzold

In this talk, we will introduce the Semantic Data Wiki OntoWiki [1] and the underlying Semantic Web Application Framework [2]. We will show its capabilities for collaboratively creating, maintaining and working with semantic data in a single OntoWiki instance and, in addition, how to work together in a network of wiki systems.

In particular, we will describe and show the following features:

- Working with Linked Data

- Using semantic search engines

- Tagging

- Facet-based browsing

- Using arbitrary hierarchies and navigation structures

In addition to that, we will describe the underlying architecture and the extension capabilities.

[1] http://ontowiki.net

[2] Heino, N.; Dietzold, S.; Martin, M. & Auer, S.: Developing Semantic Web Applications with the OntoWiki Framework. In: Pellegrini, T.; Auer, S.; Tochtermann, K. & Schaffert, S. (eds.): Networked Knowledge - Networked Media. Springer, 2009, 221, 61-77

--

Sebastian Dietzold - Department of Computer Science; University of Leipzig

Tel/Fax: +49 341 97 323-66/-29 http://bis.uni-leipzig.de/SebastianDietzold


Snomed CT, Description Logic and Decision Support

JÖRG NIGGEMANN, COMPUGROUP SOFTWARE GMBH, MARTINSRIED

Abstract

Snomed CT is based on Description Logic (DL). Building it that way was a huge effort. Before undertaking such an effort, a company would have defined a concrete and measurable goal, estimated the costs, and explored whether the goal could be reached by the proposed means and whether it could be reached more cheaply by other means. No such reasoning has been published with respect to DL in Snomed CT, neither by the College of American Pathologists nor by the IHTSDO. The common understanding is that

a) logic is needed in order to automatically prove the consistency of the corpus of definitions,

b) restriction to a subset of First Order Logic (FOL), namely Description Logic, and within it further restriction to a quite limited dialect of DL, is necessary to make that consistency proof computationally tractable.

Currently the IHTSDO and its scientific contributors are deliberating which new DL dialect should be used in the future, which would again require a huge effort to migrate from the current form. This is the time to pose the above questions in earnest and to ask whether that migration is reasonable. In the scientific community around the IHTSDO the use of DL seems to be set in stone, but there are voices from industry saying that the restriction to DL is counterproductive and that OWL Full should be used. One of these voices is mine.

In this contribution, I will argument that

1. Numerous current errors in Snomed CT show that computational consistency has nothing to do with medical correctness

2. The use of logic did not only not prevent those errors, but has generated new ones (e.g. the "Amputation Problem": Amputation of toe is-a amputation of foot)

3. Some of that is not the fault of the logic itself but of its incorrect use, which is even worse because it is harder to repair and prevent

4. Because of the above, more or different logic will never make Snomed CT better than it is now.

5. If you want a good model, place it in the hand of good modelers. You can never replace them by restrictive formalisms.

The use of Description Logic also makes Snomed CT harder to use for decision support (DS). Clinical DS is used where things are not yet completely known. If a patient is discharged with a successfully treated "intercostal neuralgia", it is fine that we can code that in Snomed CT. However, he initially presents with "chest pain". That can be coded and will trigger DS to suggest heart diagnostics. If we then know that the heart is OK, we have "chest pain of non-cardiac origin". That contains a negation, which is forbidden in the currently used Ontylog DL dialect and may also be forbidden in future versions. To document the clinical process of arriving at a diagnosis and to trigger the right DS rules, however, we have to be able to document our current knowledge about what the patient does not have.

At this point comes a second restriction that is propagated by the proponents of DL. For the future they want to exclude such statements as "suspicion of" or "exclusion of" from Snomed CT. They say:


Snomed CT is for what is in the patient. A suspicion is in the head of the physician, so Snomed should not deal with that (this is called a "realist" approach).

I have three replies to that:

1. As soon as an entry in the Electronic Health Record goes beyond the recording of direct measurements, we have no access to the "reality" of the patient. Statements such as "high blood pressure" and, even more, diagnoses like "high blood pressure disease of renal origin" are the result of a deliberate cognitive act of a physician.

2. In the daily "physician's notes" in the EHR, the physician explicitly wants to write down his thoughts about the patient. Any suspicion, but also any risk assessment or prognosis, is of that kind. If Snomed CT is not meant for that, what is it meant for?

3. Decision Support is explicitly meant to help the physician make decisions – a mental act. Therefore the input for DS Systems must be recordings of the current state of the physician's reasoning. A Snomed CT that would exclude everything that is in the physician's head would be unusable for Decision Support.
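The chest pain example can be made concrete with a minimal sketch in which each coded finding carries the physician's current epistemic status. Everything here is an assumption for illustration: the `Status` values, the `Assertion` record, the finding labels and the two rules are hypothetical and are not part of Snomed CT or of any decision support product.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical epistemic status of a finding in the physician's reasoning."""
    CONFIRMED = "confirmed"
    SUSPECTED = "suspected"
    EXCLUDED = "excluded"


@dataclass(frozen=True)
class Assertion:
    finding: str      # free-text label standing in for a coded concept
    status: Status


def suggest(assertions):
    """Return suggested next steps for the current state of knowledge."""
    state = {a.finding: a.status for a in assertions}
    suggestions = []
    if state.get("chest pain") is Status.CONFIRMED:
        cardiac = state.get("cardiac origin")
        if cardiac is None:
            # Nothing is known yet about the heart: suggest checking it.
            suggestions.append("heart diagnostics")
        elif cardiac is Status.EXCLUDED:
            # "Chest pain of non-cardiac origin": the rule needs the negation.
            suggestions.append("work-up of non-cardiac causes")
    return suggestions


first_visit = [Assertion("chest pain", Status.CONFIRMED)]
after_ecg = first_visit + [Assertion("cardiac origin", Status.EXCLUDED)]
print(suggest(first_visit))
print(suggest(after_ecg))
```

The second rule can only fire because the record is able to state what has been excluded, which is the point made in the text about negation and "exclusion of" statements.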

Summary:

The use of Description Logic apparently does not make Snomed better. Those who propose the huge effort of migrating to a new DL dialect should be required to prove that it is worth the cost.

On the contrary, the use of DL makes Snomed CT harder or even impossible to use for decision support, especially if it is coupled with a "realist" ideology.

There are "cognitive" and "conceptual" approaches to Ontology, which make "things in the heads of physicians" well accessible for well-founded and logical reasoning.

Therefore: Free Snomed CT from the restrictions of Description Logic and "realist" ideology!


Use of the Terminologies SNOMED CT and ID MACS, Illustrated by the Electronic Organ Donation Declaration (eOSE)

Gunther Hellmann1, Kai Heitmann2, Frank Oemig3
1 HellmannConsult, Erlangen
2 HL7-Benutzergruppe in Deutschland e.V., Cologne
3 Agfa Health Care AG, Bonn

Introduction

The Deutsche Stiftung Organtransplantation (German Organ Transplantation Foundation) increasingly reports a decline in the willingness to donate organs. The Federal Ministry of Health has responded and arranged for the organ donation declaration to be integrated into the emergency data set of the electronic health card (eGK). HL7 Germany has initiated a project group that analyses the existing organ donor card in order to develop a methodical approach for future related topics, to provide input for the corresponding (eGK) application, and to promote standardization by means of HL7.

Material and Methods

The working basis was the Transplantation Act [1] and the current organ donor card as provided by the Federal Centre for Health Education (BZgA). The paper card is available in German and Turkish. Supplements to the card exist in ten further European languages (e.g. Bulgarian) with translations of the organ terms. The organ terms were translated and identified for SNOMED CT [4] and determined for ID MACS [3] by means of a terminology server. TOGAF [2] was chosen as the working method.

Results

In several steps, eight application scenarios were identified and described, the information objects of the paper form were captured conceptually, and the necessary electronic objects were derived from them with regard to syntax, semantics and default values. Furthermore, the actors and roles were identified and, in particular, the organ terms were mapped. The analysis identified several questions that still have to be discussed in the appropriate committees, e.g. how processing systems in hospitals ensure compliance with the legal requirements, the restriction of access rights by "physicians entitled to information", mapping problems in the translation of terms, and terminological vagueness, e.g. in "parts of the meninges" („Teile der Hirnhaut“). Using the terminologies revealed a further five categories of problems.

Discussion

The results are recorded in working version 08 [5] and require a more in-depth discussion, especially of the identified problems, some of which are fundamental in nature, i.e. they must also be resolved for other application scenarios. A mapping to HL7 V3 and CDA will be carried out as an example.


References

[1] Transplantationsgesetz (German Transplantation Act), September 2007.

[2] The Open Group: The Open Group Architecture Framework (TOGAF). http://www.opengroup.org/architecture/togaf8/downloads.htm#Non-Member, Version 8.1.1, Enterprise Edition, 2007.

[3] ID Gesellschaft für Information und Dokumentation im Gesundheitswesen GmbH & Co. KGaA: ID MACS® – Medical Semantic Network. Berlin, 2009.

[4] College of American Pathologists (CAP): SNOMED CT (Systematized Nomenclature of Medicine-Clinical Terms). http://www.cap.org/apps/cap.portal?_nfpb=true&_pageLabel=snomed_page, 2009.

[5] HL7-Benutzergruppe in Deutschland e.V.: Implementierungsleitfaden „elektronische Organspendeerklärung“ (eOSE). Version 08, Feb. 2009.

Intended for submission to the 1st Workshop of the GI Working Group Ontologies in Biomedicine and the Life Sciences (OBML), Leipzig (https://wiki.imise.uni-leipzig.de/Gruppen/OBML/Workshops/2009)

Keywords: organ donation, HL7, transplantation, DSO, BZgA, semantics, terminology, interoperability, eGK, standard, multilingual

Version 1.0, draft


Clinical diagnostics in human genetics with semantic similarity searches in ontologies

Köhler S (1,2), Schulz MH (3), Krawitz P (1,2), Bauer S (1), Dölken S (1,2), Ott CE (1), Mundlos C, Horn D (1), Mundlos S (1,2,3), Robinson PN (1,2,3)

1) Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.
2) Berlin-Brandenburg Center for Regenerative Therapies (BCRT)
3) Max Planck Institute for Molecular Genetics

The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail.

We have adapted semantic similarity metrics to measure phenotypic similarity between queries and hereditary diseases annotated with the use of the Human Phenotype Ontology (HPO) and have developed a statistical model to assign P-values to the resulting similarity scores, which can be used to rank the candidate diseases. We show that our approach outperforms simpler term-matching approaches that do not take the semantic interrelationships between terms into account. The advantage of our approach was greater for queries containing phenotypic noise or imprecise clinical descriptions. The semantic network defined by the HPO can be used to refine the differential diagnosis by suggesting clinical features that, if present, best differentiate among the candidate diagnoses. Thus, semantic similarity searches in ontologies represent a useful way of harnessing the semantic structure of human phenotypic abnormalities to help with the differential diagnosis. We have implemented our methods in a freely available web application for the field of human Mendelian disorders.

We will also discuss the exact computation of score distributions for similarity searches in ontologies that can be used for the clinical application described above. We introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik's definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the Human Phenotype Ontology.

The HPO is freely available at http://www.human-phenotype-ontology.org and the Phenomizer is available at http://compbio.charite.de/Phenomizer.
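To illustrate the kind of computation described above, here is a minimal sketch of a Resnik-based similarity search on a toy ontology. It is not the Phenomizer implementation: the term names, the disease annotations and the averaging of best matches are assumptions made for the example, and the P-value is estimated by simple random sampling rather than by the exact, subgraph-collapsing algorithm mentioned in the abstract.

```python
import math
import random

# Toy is-a hierarchy (child -> parents). Term names are invented, not HPO ids.
PARENTS = {
    "root": [],
    "skeletal": ["root"],
    "cardiac": ["root"],
    "limb": ["skeletal"],
    "finger": ["limb"],
    "arachnodactyly": ["finger"],
    "septal_defect": ["cardiac"],
}

# Hypothetical disease annotations.
DISEASES = {
    "disease_A": {"arachnodactyly", "septal_defect"},
    "disease_B": {"limb"},
    "disease_C": {"septal_defect"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -log(fraction of diseases annotated with t or a descendant)."""
    counts = {t: 0 for t in PARENTS}
    for terms in DISEASES.values():
        for t in set().union(*(ancestors(term) for term in terms)):
            counts[t] += 1
    n = len(DISEASES)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def score(query, disease_terms):
    """Average, over the query terms, of the best match in the disease."""
    return sum(max(resnik(q, d) for d in disease_terms) for q in query) / len(query)


def rank(query):
    """Diseases ordered from most to least similar to the query."""
    return sorted(DISEASES, key=lambda d: score(query, DISEASES[d]), reverse=True)


def empirical_p_value(query, disease, samples=1000, seed=0):
    """Monte Carlo stand-in for the exact score distribution: the fraction of
    random queries of the same size that score at least as high."""
    rng = random.Random(seed)
    terms = sorted(PARENTS)
    observed = score(query, DISEASES[disease])
    hits = sum(
        score(rng.sample(terms, len(query)), DISEASES[disease]) >= observed
        for _ in range(samples)
    )
    return hits / samples


print(rank(["finger"]))  # ['disease_A', 'disease_B', 'disease_C']
```

The query term "finger" is not annotated to any disease, so plain term matching would find nothing, while the semantic score still ranks disease_A first through the shared ancestor structure. This is the effect the abstract attributes to imprecise clinical descriptions.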

Reality and Abstraction: A Look at Different Models

H.R. Straub, Semfinder AG, Kreuzlingen, Switzerland

When we represent knowledge, how do the elements of our representation relate to elements of reality? Are, e.g., the relations between the representing elements equal to the relations between the represented elements? Different representations do answer such essential modelling questions differently, not without consequences to their precision, expressiveness and economy in maintenance.

To get a clearer view of facts and to ease the discussion, the author depicted a diagram of the scenery in which the reality-model discussion takes place. As the words for the debate ("reality" or "universal", e.g.) are often understood differently by the divergent positions, they can be given different places in the diagram and the divergencies in view become more evident and thus easier to be understood for the divergent positions. The diagram does (for a start) not show the personal view of the author about the reality-to-model relation, but is just an encompassing landscape where the diverse positions can be entered.

Figure 1: Scenery for the realism debate. [Diagram: a Space-Time Reality containing Object 1, Object 2 and Subjects, each Subject with a Mind; next to it, a Reality outside Space-Time.]

The scenery can be illuminated according to the different positions. Figure 2 shows, for example, the (extreme) position of solipsism and the one of the "platonian" realism:

Figure 2: Examples of the view of the scenery by specific positions. [Two panels. Extreme 1, Solipsism: "The world exists only in my mind. Reality is an illusion." "Platonian" Realism: "Matter objects are created by the ideas behind them (universals). Our Space-Time Reality is just a shadow of the more real ideal world."; in this panel the Universals (Ideas) are placed in the Reality outside Space-Time.]

Of course, more sensible positions than solipsism make more sense to discuss. As in figure 2, the following classic views are shown and compared with the help of the scenery diagram:

- solipsism
- naïve realism
- "platonian" realism
- "aristotelian" realism
- nominalism
- conceptualism

That the discussion is not outdated or trivial becomes obvious when we look at the ways different positions deal with universals.

Figure 3: Universals as seen by 3 different positions. [Three panels of the scenery diagram, with the Universals placed outside Space-Time, in the Objects, or in the Minds of the Subjects.]

Figure 4: Talking about objects. [Scenery diagram with a Model added between the Subjects. Caption in the figure: "The Semiotic Triangle links concepts in minds with symbols in models and with the represented objects." Legend: Where are Universals? Outside Space-Time: "Platonian". Part of the Objects: "Aristotelian". In the Mind of Humans: "Conceptualist".]

Figure 3 compares how "platonian" realism (right), "aristotelian" realism (middle) and conceptualism (left) understand universals. For the author it is obvious that these different views lead to differently built knowledge representations, with consequences regarding the usefulness of their implementation and results.

The fact that we talk about objects among different subjects can be added to the diagram (fig. 4) for further clarification.

Comparisons as in Figure 3 raise the questions of which consequences each position has, which one is superior (for a given purpose) and how we ourselves should handle universals for our own representations.

To clarify this point we look at different kinds of universals and ask what their position in the scenery is. Could it be that different kinds of universals could be placed and dealt with differently?

The following kinds of universals are placed in the diagram and compared:

- Numbers and objects of abstract logic
- Living systems, such as animal species
- Fictitious objects
- Non-material concepts (such as medical diagnoses)

The author presents his view with examples in the scenery diagram and claims that a distinction between the different universal groups makes sense. In doing so, elements of the "platonian", the "aristotelian" and the conceptualist position are combined to form an integrative view of modelling (interpretation of reality).


How should parthood relations be expressed in SNOMED CT?

Franz Baader,1 Stefan Schulz,2 Kent Spackman,3 and Boontawee Suntisrivaraporn4

1 TU Dresden, Germany
2 [...] University Hospital, Germany
3 International Health Terminology Standards Development Organisation, USA
4 Thammasat University, Thailand

The Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT)¹ is a clinical terminology with a broad coverage of health care, which has been developed with the help of a rather inexpressive description logic dialect known as $\mathcal{EL}$ [1]. The advantage of using a description logic (DL) for defining a medical ontology is that, instead of error-prone "hierarchy engineering," where each newly introduced concept needs to be manually positioned at the right place in the concept hierarchy, one adds a definition of the new concept to the knowledge base and the DL reasoner then automatically finds the right position of this concept in the concept hierarchy. The advantage of using an inexpressive DL is that classification (i.e., the computation of the concept hierarchy) is fast even for a very large ontology like SNOMED CT. Efficient reasoners for $\mathcal{EL}$, like Snorocket™,² which is based on the classification algorithm introduced in [2], can classify SNOMED CT in less than a minute.

The disadvantage of using an inexpressive DL is that not all relevant properties can be explicitly expressed. In particular, $\mathcal{EL}$ does not allow one to state that relations such as part-of are transitive, and consequently the reasoner cannot take transitivity into account during classification. In order to overcome such limitations in DLs without transitive relations, the SEP-triplet encoding was proposed in [3]. An SEP-triple for the concept $A$ is actually composed of three concepts: the structure $A_S$, the entity $A$, and the part $A_P$. Intuitively, the E-concept is supposed to be instantiated by entire anatomical objects (such as my hand), the P-concept by the proper parts of the referred objects (such as any part of my hand), and the S-concept by both entire objects and their parts. Fig. 1 gives an example of what a correct use of the SEP-triplet encoding should look like. It is easy to see that transitivity of the part-of relation can be simulated through the intra-triple part-of relationships and the intrinsic transitivity of (both intra- and inter-triple) subsumption relationships. In fact, in the example of Fig. 1, the DL reasoner is able to infer that the finger is part of the upper limb since we have $\mathsf{Finger} \sqsubseteq \mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_P \sqsubseteq \mathsf{Hand}_S \sqsubseteq \mathsf{UpperLimb}_P \sqsubseteq \exists\mathsf{part\text{-}of}.\mathsf{UpperLimb}$.

Since characteristics are inherited along the is-a hierarchy, the SEP-triplet encoding also allows us to simulate inheritance of characteristics along the part-of hierarchy. In our example, by connecting an injury via a location link to the S-concept, we can ensure that 'injury to finger' is classified as 'injury to hand' and 'injury to upper limb'. To suppress such inheritance along the part-of hierarchy (e.g., 'amputation of finger' should not be classified as 'amputation of hand' or 'amputation of upper limb'), one needs to connect via location to the E-concept.

There are, however, several problems with the SEP-triplet encoding. On the one hand, the SEP-triplet approach is error-prone since it works correctly only if it is employed with a very strict modelling discipline. For instance, incorrect links to the S-concept rather than the E-concept may result in unintended consequences like the classification of 'amputation of finger' as a subconcept of 'amputation of upper limb'. On the other hand, the approach introduces for every proper concept in the ontology two auxiliary concepts, which results in a drastic increase in the ontology size, and thus in the time needed for classification.
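The way SEP-triplets simulate transitivity through plain subsumption can be sketched as reachability in the is-a graph. The snippet below is only an illustration under assumptions: the edge list is a guess at the is-a links of the finger, hand and upper limb example, concept names carry an `S` or `P` suffix instead of a subscript, and `classified_under` is a hypothetical helper that compares two concepts sharing the same kind and differing only in their location filler.

```python
# is-a edges of the SEP-encoded toy anatomy (concept -> direct subsumers).
IS_A = {
    "Finger": ["FingerS"], "FingerP": ["FingerS"], "FingerS": ["HandP"],
    "Hand": ["HandS"], "HandP": ["HandS"], "HandS": ["UpperLimbP"],
    "UpperLimb": ["UpperLimbS"], "UpperLimbP": ["UpperLimbS"],
    "UpperLimbS": [],
}


def subsumers(concept):
    """All concepts reachable via is-a, including the concept itself."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(IS_A[c])
    return seen


def part_of(a, b):
    """a is a (proper) part of b iff a is subsumed by the P-concept of b."""
    return b + "P" in subsumers(a)


def classified_under(location_a, location_b):
    """A concept located at location_a is classified under the same kind of
    concept located at location_b iff location_a is subsumed by location_b."""
    return location_b in subsumers(location_a)


# Transitivity of part-of is simulated by the subsumption chain.
print(part_of("Finger", "UpperLimb"))

# Injury linked to S-concepts: inherited along part-of, as intended.
print(classified_under("FingerS", "HandS"))

# Amputation linked to E-concepts: not inherited, as intended.
print(classified_under("Finger", "Hand"))

# Amputation wrongly linked to S-concepts: the unintended classification.
print(classified_under("FingerS", "UpperLimbS"))
```

The last call shows why the encoding depends on modelling discipline: a single link to the S-concept instead of the E-concept is enough to produce the amputation misclassification.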

1 http://www.ihtsdo.org/snomed-ct/

2 http://aehrc.com/hie/snorocket.html


Fig. 1. Example of a correct use of the SEP-triplet encoding. The solid edges denote subsumption (IS-A), the dashed edges part-of, and the dotted edges has-location relationships. [Diagram nodes: the triples Finger / Finger_S / Finger_P, Hand / Hand_S / Hand_P and UpperLimb / UpperLimb_S / UpperLimb_P, together with AmputationOfFinger, InjuryToFinger, AmputationOfHand, InjuryToHand, AmputationOfUpperLimb and InjuryToUpperLimb.]

To avoid these problems, we have proposed in [4] to use the more expressive DL $\mathcal{EL}^{++}$ [5, 6], for which classification can still be done in polynomial time. The complex role inclusion axioms available in $\mathcal{EL}^{++}$ can be used to state reflexivity and transitivity of roles like part-of, subrole relationships (e.g., between proper-part-of and part-of), and right-identity rules (which can, e.g., be used to express the inheritance of characteristics along the part-of relation). To avoid unintended inheritance of characteristics (e.g., in the case of amputation), we use two distinct relations: has-location, which is inherited from a part to its whole, and has-exact-location, a sub-relation of has-location, which is not inherited that way. Fig. 2 shows the re-engineered ontology obtained this way from the knowledge base of Fig. 1.

This new modelling approach avoids the introduction of the two additional auxiliary concepts (the S-concept and the P-concept) for every anatomical concept. The experiments reported in [4] show that this actually speeds up the time needed for classification. However, for backward compatibility, it would be nice to be able to define the S-concept and/or the P-concept in case it is needed (e.g., since it is used directly in other parts of the ontology). According to the underlying intuition, this should be easy: these concepts can be pre-coordinated as fully defined concepts, as illustrated here for the concept hand: $\mathsf{Hand}_P \equiv \exists\mathsf{proper\text{-}part\text{-}of}.\mathsf{Hand}$ and $\mathsf{Hand}_S \equiv \exists\mathsf{part\text{-}of}.\mathsf{Hand}$.

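\begin{align*}
\mathsf{Finger} &\sqsubseteq \mathsf{BodyPart} \sqcap \exists\mathsf{proper\text{-}part\text{-}of}.\mathsf{Hand} \tag{1}\\
\mathsf{Hand} &\sqsubseteq \mathsf{BodyPart} \sqcap \exists\mathsf{proper\text{-}part\text{-}of}.\mathsf{UpperLimb} \tag{2}\\
\mathsf{UpperLimb} &\sqsubseteq \mathsf{BodyPart} \tag{3}\\
\mathsf{AmputationOfFinger} &\equiv \mathsf{Amputation} \sqcap \exists\mathsf{has\text{-}exact\text{-}location}.\mathsf{Finger} \tag{4}\\
\mathsf{AmputationOfHand} &\equiv \mathsf{Amputation} \sqcap \exists\mathsf{has\text{-}exact\text{-}location}.\mathsf{Hand} \tag{5}\\
\mathsf{AmputationOfUpperLimb} &\equiv \mathsf{Amputation} \sqcap \exists\mathsf{has\text{-}exact\text{-}location}.\mathsf{UpperLimb} \tag{6}\\
\mathsf{InjuryToFinger} &\equiv \mathsf{Injury} \sqcap \exists\mathsf{has\text{-}location}.\mathsf{Finger} \tag{7}\\
\mathsf{InjuryToHand} &\equiv \mathsf{Injury} \sqcap \exists\mathsf{has\text{-}location}.\mathsf{Hand} \tag{8}\\
\mathsf{InjuryToUpperLimb} &\equiv \mathsf{Injury} \sqcap \exists\mathsf{has\text{-}location}.\mathsf{UpperLimb} \tag{9}\\
\mathsf{proper\text{-}part\text{-}of} \circ \mathsf{proper\text{-}part\text{-}of} &\sqsubseteq \mathsf{proper\text{-}part\text{-}of} \tag{10}\\
\mathsf{proper\text{-}part\text{-}of} &\sqsubseteq \mathsf{part\text{-}of} \tag{11}\\
\mathsf{part\text{-}of} \circ \mathsf{part\text{-}of} &\sqsubseteq \mathsf{part\text{-}of} \tag{12}\\
\varepsilon &\sqsubseteq \mathsf{part\text{-}of} \tag{13}\\
\mathsf{has\text{-}exact\text{-}location} &\sqsubseteq \mathsf{has\text{-}location} \tag{14}\\
\mathsf{has\text{-}location} \circ \mathsf{proper\text{-}part\text{-}of} &\sqsubseteq \mathsf{has\text{-}location} \tag{15}
\end{align*}

Fig. 2. The re-engineered version of the knowledge base in Fig. 1, now without SEP-triplets.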
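The effect of the two location roles can be illustrated with a small hand-written check. This is a sketch under assumptions, not a general $\mathcal{EL}^{++}$ reasoner and not the CEL algorithm: it hard-codes the shape of the definitions in Fig. 2 (one kind plus one existential restriction), and the names `PROPER_PART_OF`, `DEFINITIONS`, `wholes` and `is_subsumed` are invented for the example.

```python
# Told proper-part-of links, taken from axioms (1) and (2) of Fig. 2.
PROPER_PART_OF = {"Finger": "Hand", "Hand": "UpperLimb"}

# Defined concepts: name -> (kind, role, filler), axioms (4) to (9).
DEFINITIONS = {
    "AmputationOfFinger": ("Amputation", "has-exact-location", "Finger"),
    "AmputationOfHand": ("Amputation", "has-exact-location", "Hand"),
    "AmputationOfUpperLimb": ("Amputation", "has-exact-location", "UpperLimb"),
    "InjuryToFinger": ("Injury", "has-location", "Finger"),
    "InjuryToHand": ("Injury", "has-location", "Hand"),
    "InjuryToUpperLimb": ("Injury", "has-location", "UpperLimb"),
}


def wholes(part):
    """Everything `part` is a proper part of, using transitivity (axiom 10)."""
    result = []
    while part in PROPER_PART_OF:
        part = PROPER_PART_OF[part]
        result.append(part)
    return result


def is_subsumed(c, d):
    """Check whether defined concept c is subsumed by defined concept d."""
    kind_c, role_c, loc_c = DEFINITIONS[c]
    kind_d, role_d, loc_d = DEFINITIONS[d]
    if kind_c != kind_d:
        return False
    if loc_c == loc_d:
        # Same filler: the role must match or be a sub-role (axiom 14).
        return role_c == role_d or role_d == "has-location"
    # Different filler: only has-location is inherited from part to whole
    # (axiom 15); has-exact-location is deliberately not inherited.
    return role_d == "has-location" and loc_d in wholes(loc_c)


print(is_subsumed("InjuryToFinger", "InjuryToUpperLimb"))
print(is_subsumed("AmputationOfFinger", "AmputationOfHand"))
```

Injuries propagate from finger to hand to upper limb, while amputations stay where they are stated, without any auxiliary S- or P-concepts.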


Unfortunately, this solution (which was already proposed in [4]) is not completely satisfactory since not all subsumption relationships for the auxiliary concepts that follow from the SEP-encoded version of the knowledge base (Fig. 1) follow from the re-engineered version (Fig. 2) extended by the definitions for the S- and P-concepts for Finger, Hand, and UpperLimb. For example, in Fig. 1 we have the (stated) subsumption relationship $\mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_P$. Using the complex role inclusion axioms in Fig. 2 together with the definitions for the auxiliary concepts, we can only conclude $\exists\mathsf{part\text{-}of}.\mathsf{Finger} \sqsubseteq \exists\mathsf{part\text{-}of}.\mathsf{Hand}$ (i.e., $\mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_S$), but not $\exists\mathsf{part\text{-}of}.\mathsf{Finger} \sqsubseteq \exists\mathsf{proper\text{-}part\text{-}of}.\mathsf{Hand}$ (i.e., not $\mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_P$). In order to obtain the second subsumption, we would need to add the complex role inclusion

$$\mathsf{part\text{-}of} \circ \mathsf{proper\text{-}part\text{-}of} \sqsubseteq \mathsf{proper\text{-}part\text{-}of}.$$

Interestingly, this left-identity rule, together with $\mathsf{proper\text{-}part\text{-}of} \sqsubseteq \mathsf{part\text{-}of}$, creates a so-called cycle over role inclusions, which is not allowed in the DL $\mathcal{SROIQ}$ underlying the new version of the Web Ontology Language, OWL 2. Consequently, OWL 2 compliant reasoners (like FaCT++ and Pellet) would not accept this extended knowledge base as an input. Fortunately, such a cyclic dependency is allowed in $\mathcal{EL}^{++}$ and can be processed by our reasoner CEL.³ Recently, Kazakov [7] was able to design a decidable extension of $\mathcal{SROIQ}$ that can also express the extended knowledge base.
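For readers who want the two derivations spelled out, the following worked steps may help. They use only axiom (1) of Fig. 2 (dropping the BodyPart conjunct), the role inclusions (11) and (12), and the definitions of the auxiliary concepts given earlier; the arrangement of the steps is ours, not taken from [4].

Without the left-identity rule:
\begin{align*}
\mathsf{Finger}_S \equiv \exists\mathsf{part\text{-}of}.\mathsf{Finger}
 &\sqsubseteq \exists\mathsf{part\text{-}of}.\exists\mathsf{proper\text{-}part\text{-}of}.\mathsf{Hand} && \text{by (1)}\\
 &\sqsubseteq \exists\mathsf{part\text{-}of}.\exists\mathsf{part\text{-}of}.\mathsf{Hand} && \text{by (11)}\\
 &\sqsubseteq \exists\mathsf{part\text{-}of}.\mathsf{Hand} \equiv \mathsf{Hand}_S && \text{by (12)}
\end{align*}

With the additional inclusion $\mathsf{part\text{-}of} \circ \mathsf{proper\text{-}part\text{-}of} \sqsubseteq \mathsf{proper\text{-}part\text{-}of}$:
\begin{align*}
\mathsf{Finger}_S \equiv \exists\mathsf{part\text{-}of}.\mathsf{Finger}
 &\sqsubseteq \exists\mathsf{part\text{-}of}.\exists\mathsf{proper\text{-}part\text{-}of}.\mathsf{Hand} && \text{by (1)}\\
 &\sqsubseteq \exists\mathsf{proper\text{-}part\text{-}of}.\mathsf{Hand} \equiv \mathsf{Hand}_P && \text{by the left-identity rule}
\end{align*}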

To sum up, we have recalled the re-engineering of SNOMED CT as proposed in [4], and have shown that a backward compatible version, which also contains definitions for the auxiliary S- and P-concepts, requires an additional complex role inclusion that destroys the acyclicity property of the set of complex role inclusions. For this reason, the backward compatible re-engineered version of SNOMED CT is not expressible in OWL 2, but it is expressible in $\mathcal{EL}^{++}$ and an appropriate extension of $\mathcal{SROIQ}$.

References

1. F. Baader. Terminological cycles in a Description Logic with existential restrictions. In Georg Gottlob and Toby Walsh, editors, Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 325–330. Morgan Kaufmann, 2003.

2. F. Baader, C. Lutz, and B. Suntisrivaraporn. Is tractable reasoning in extensions of the Description Logic EL useful in practice? In Proceedings of the 2005 International Workshop on Methods for Modalities (M4M-05), 2005.

3. S. Schulz, M. Romacker, and U. Hahn. Part-whole reasoning in medical ontologies revisited: Introducing SEP triplets into classification-based Description Logics. Journal of the American Medical Informatics Association (JAMIA), pages 830–834, 1998. Section VIII Standards and Policies - Issues in Knowledge Representation.

4. B. Suntisrivaraporn, F. Baader, S. Schulz, and K. Spackman. Replacing SEP-triplets in SNOMED CT using tractable Description Logic operators. In Riccardo Bellazzi, Ameen Abu-Hanna, and Jim Hunter, editors, Proceedings of the 11th Conference on Artificial Intelligence in Medicine (AIME'07), volume 4594 of Lecture Notes in Computer Science, pages 287–291. Springer-Verlag, 2007.

5. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, UK, 2005. Morgan-Kaufmann Publishers.

6. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope further. In Kendall Clark and Peter F. Patel-Schneider, editors, Proceedings of the OWLED 2008 DC Workshop on OWL: Experiences and Directions, 2008.

7. Y. Kazakov. An extension of regularity conditions for complex role inclusion axioms. In Proceedings of the 2009 International Workshop on Description Logics (DL'09), 2009.

3 http://lat.inf.tu-dresden.de/systems/cel/


Ludger Jansen, Institut für Philosophie, Universität Rostock

Constitution relations in biological cells

Abstract for the 1st Workshop of the GI Working Group OBML

Research about living cells faces many problems. Among these are the most prominent: the

complexity of cell systems, the plurality of interacting levels and the stochastic nature of

many cell processes (Wolkenhauer/Muir, forthcoming). These problems find their analogies

when it comes to representing our knowledge about biological cells. The complex and

dynamic cell system as well as intra- and intercellular communication have to find simplified

and static representations. Because of the huge amounts of data that are being collected in

genome sequencing, animal models and other studies, the knowledge gained can only be

stored and made available by computer-based means. Formal ontology is a promising method

to get a grip on problems occurring with the coding of facts and with information retrieval and

there are first implementations for the domain of cells. The largest project is certainly the

Gene Ontology (GO) with its three parts concerning molecule functions, biological processes

and cell components (http://www.geneontology.org). Other relevant projects are the Cell Type

Ontology (Bard, Rhee and Ashburner 2005), which, like GO, is a candidate ontology of the Open

Biomedical Ontologies Foundry (http://obofoundry.org; Smith et al. 2007), and, because of its

application area, the National Cancer Institute Thesaurus (NCIT).

To analyse a complex system into simpler parts, many ontological distinctions are relevant.

For computer-based representations of knowledge about cells we need a classification of

continuants like cells and cell parts (Jansen 2008) as well as a classification of occurrents like

interaction processes between cells or cell parts and event types like Cell_division (Hennig

2008, Schulz/Jansen 2009). The different levels of causal interaction in cells have to be

reflected by analysing several granular partitions of cells, like the molecular level, the level of

functional cell parts (organelles) and the level of the cell itself, as well as the combinations of

material and functional descriptions and the representation of mereological and topological

relations like Cell has_part Nucleus, Cell part_of Cell or Nucleus contains DNA.

A particular problem arises for cell ontology through the fact that some particulars seem to

belong to several of these levels: Unicellular organisms seem to be at once a single cell and an

organism, and strings of DNA seem to be at the same time molecules and functional cell

parts. These coincidences on the level of particulars threaten the generality necessary for

partonomic statements on the level of universals. This problem can be solved within standard

mereology with the help of the unity relation of material constitution (Baker 2000, 2007; cf.

also the papers in Rea 1997). While the unity relation of identity is an equivalence relation, the

constitution relation is irreflexive and asymmetric. (Baker’s definition at least implies the

transitivity of the constitution relation; cf. Zimmerman 2002.) The result is that particulars

belonging to different levels of partition – molecules, organelles, cells, organisms – are, first

impression notwithstanding, not identical, but constitute each other. Given this result, an

unambiguous assignment of particulars to these levels of partition is guaranteed. Secondly,

the generality of partonomic statements is no longer threatened by these special cases.

Thirdly, these levels of partition and the entities belonging to them can be characterized by


their causal and explanatory features, which manifest themselves especially in the ascriptions of functions that entities have in different contexts, be it within cells, within organisms, or within an ecosystem.
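The formal contrast drawn above between identity (an equivalence relation) and constitution (irreflexive and asymmetric) can be checked mechanically on a finite set of particulars. The sketch below is purely illustrative: the particulars and the pairs in `CONSTITUTES` are hypothetical stand-ins for the unicellular organism and DNA examples, not data from any ontology.

```python
# Hypothetical particulars and constitution facts (constituent, constituted).
PARTICULARS = {"cell_1", "organism_1", "dna_molecule_1", "cell_part_1"}
CONSTITUTES = {("cell_1", "organism_1"), ("dna_molecule_1", "cell_part_1")}
IDENTITY = {(x, x) for x in PARTICULARS}


def is_reflexive(rel, domain):
    return all((x, x) in rel for x in domain)


def is_irreflexive(rel):
    return all(a != b for a, b in rel)


def is_symmetric(rel):
    return all((b, a) in rel for a, b in rel)


def is_asymmetric(rel):
    return all((b, a) not in rel for a, b in rel)


def is_transitive(rel):
    return all((a, d) in rel for a, b in rel for c, d in rel if b == c)


def is_equivalence(rel, domain):
    return is_reflexive(rel, domain) and is_symmetric(rel) and is_transitive(rel)


print(is_equivalence(IDENTITY, PARTICULARS))
print(is_irreflexive(CONSTITUTES), is_asymmetric(CONSTITUTES))
```

Because the cell and the organism it constitutes are distinct relata of an asymmetric relation, each can be assigned to exactly one level of partition, which is the disambiguation the abstract relies on.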

Acknowledgements

Research for this paper has been conducted under the auspices of the research cluster „Transformation wissenschaftlichen Wissens in den Lebenswissenschaften: Das Verständnis der lebenden Zelle im Wandel“ and has been supported by a grant of the Exzellenz-Förderprogramm Mecklenburg Vorpommern (EFP M-V).

References

Baker, Lynne Rudder (2000), Persons and Bodies. A Constitution View, Cambridge: Cambridge University Press.

Baker, Lynne Rudder (2007), The Metaphysics of Everyday Life, Cambridge: Cambridge University Press.

Bard, Jonathan/Rhee, Seung Y./Ashburner, Michael (2005), An Ontology for Cell Types, in: Genome Biology 6 (2), R21, http://genomebiology.com/2005/6/2/R21.

Bittner, Thomas/Smith, Barry (2003), A Theory of Granular Partitions, in: Duckham/Goodchild/Worboys, Foundations of Geographic Information Science, London, 117-151.

Hennig, Boris (2008), Zeitliche Entitäten: Geschehnisse, in: Jansen/Smith 2008, 127-154; English translation: "Occurrents", forthcoming in: Munn/Smith 2008, 255-284.

Jansen, Ludger (2007), Tendencies and other Realizables in Medical Information Sciences, in: The Monist 90, 534-555.

Jansen, Ludger (2008), Klassifikationen, in: Jansen/Smith 2008, 67-83; English translation: Classifications, in: Munn/Smith 2008, 159-172.

Jansen, Ludger/Smith, Barry (eds.) (2008), Biomedizinische Ontologie. Wissen repräsentieren für den Informatik-Einsatz, Zürich: vdf.

Munn, Katherine/Smith, Barry (eds.), Applied Ontology, Frankfurt/Lancaster: Ontos, forthcoming 2008.

Rea, Michael C. (1997), Material Constitution. A Reader, Lanham et al.: Rowman & Littlefield.

Schulz, Stefan/Jansen, Ludger (2009), Molecular Interactions: On the Ambiguity of Ordinary Statements in Biomedical Literature, in: Applied Ontology 4, 21-34.

Smith, Barry et al. (2007), "The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration", in: Nature Biotechnology 25, 1251–1255.

Wolkenhauer, Olaf/Muir, Allan (forthcoming), The Complexity of Cell Biological Systems, in: Clifford Hooker, John Collier (eds.), Philosophy of Complexity, Chaos and Non-Linearity (= Handbook of the Philosophy of Science, ed. Dov Gabbay, Paul Thagard and John Woods, vol. 16), Amsterdam: Elsevier.

Zimmerman, Dean (2002), Persons and Bodies: Constitution Without Mereology?, in: Philosophy and Phenomenological Research 64, 599-606.


Dynamic typing and Non-monotonic reasoning - Principles for a Semantic Interpreter

H. R. Straub, M. Duelli
Semfinder AG, Kreuzlingen, Schweiz

1. Introduction

The Semfinder coding system automatically assigns ICD-10 codes to noun phrases representing medical diagnoses. For this purpose the noun phrases are semantically analysed and brought into an internal format. In this presentation we want to explain the basic principles of our system.

2. Philosophical and situational basis

We don't believe that models can ever be a copy of reality. A representation of a domain can only contain a minute fraction of the information the domain holds in reality (This is not a question of granularity: according to the perspective certain aspects of the domain are in the foreground and other in the background). No model is complete. Modelling is always heuristic and an ultimate model of a domain does not exist. Models are interpretations of reality. Different models can coexist.

The last statement is evident for our task: The logic of the ICD-10 is not coherent and can differ from the logic of the physicians. We must accept that logics differ and we must be able to adjust to various systems. We furthermore must be able to translate from one system to the other.

In the following we expose the characteristics of our system:

3. Concept architecture

- Composite concept representation (concept clusters)

- Supremacy of postcoordination → the composite cluster form of concepts always persists → no instantiations to atomic objects

- Pseudotrees: the composite clusters build pseudotrees, which allow easy reading by humans as well as by machines → performance

- Multidimensional and multifocal organisation of the semantic space

- Bifaciality (bifacial typing) with chain building → only one formal concept element

- Abandonment of named relators, restriction to two unnamed, merely formal relators

- The (formal) TLO is therefore restricted to 3 elements: 1 (named) concept and 2 (unnamed) relators

- The two relators span the semantic space: One links concepts inside the dimensions, the other knits the dimensions

4. Rules and inference process

- Rules and status are formed alike: composite, multifocal etc. (see above)

- Reduced set of operators

- No definitions, just rules (which can be overruled)

- Non-monotonic logic

- Dynamic typing

- Algorithms work with composite concepts (not atomic ones)

- Chronology control by input and by dynamic triggers in the status


Abstract

Formal Semantics and Ontologies

Frank Loebe

Department of Computer Science and

Institute of Medical Informatics, Statistics and Epidemiology

University of Leipzig

The distribution and adoption of ontologies is remarkably increasing in various areas. In

particular, this applies in the context of the Semantic Web, and it is also the case regarding

applications in biology, medicine, and life sciences. This development has led to a strong

emphasis on the formal representation of ontologies in dedicated ontology languages like the

Web Ontology Language (OWL). However, the same development runs the risk of neglecting

the original role of ontologies in computer and information sciences, namely to provide

systems of unambiguous semantic references. In this connection, ontologies share

motivational aspects with terminological systems and controlled vocabularies. It is this role of

ontologies that justifies their potential of enabling semantic interoperability.

In this regard, two interpretations of the term “semantics” should be clearly separated, but are

sometimes intermingled. Given an ontology (as a representation artifact), the first

interpretation of “semantics” refers to the conceptual or intensional semantics, i.e., to the very

categories / concepts (and their interdependences) that are represented by an ontology. If the

ontology is represented in a formal language, for instance, in a description logic language, the

resulting representation has an additional, formal semantics, due to and determined by the

formal semantics of the language. We argue that the practical use and view of ontologies in

biomedicine and life sciences is still more attuned to the first type of semantics, whereas in

the Semantic Web, ontology technology tends to become a general computational model that

is released from the original task of ontologies of offering conceptual foundation. This

difference and the kind of problems resulting from it can be demonstrated by recent attempts

to ground bio-ontologies or related formats like the one of the Open Biomedical Ontologies

(OBO) in Semantic Web languages. In particular, the position is defended that ontology

representation in a formal language results in an encoding of ontological relationships into

formal semantic entities that is to some extent arbitrary, but that is documented only in rare

cases and thus hinders justified ontology translation among different languages and formats.

Beyond the biomedical context and prior to improving the situation just described, we see a

need for advancing the theoretical basis of ontologies in computer and information sciences in

order to account for, at the same time, a formal semantic approach and the actual ontological

claims / commitments that are to be captured in and provided for by ontologies. For instance,

it can be argued that the current notion of ontology-based semantic integration fails for

ontologies themselves. In this connection, we outline a novel kind of semantics, called

ontological semantics. It is constructed in analogy to the well-known Tarski-style model

theory of first order logic (and description logics), but tries to avoid ontological commitments,

e.g., to a particular set theory, in its general form. Nevertheless, adopting a basic ontology is

beneficial for a more formal definition of a specific semantics of this kind. The main purpose

of ontological semantics is to serve as a background theory for a revised notion of intensional

semantic equivalence. For practical purposes, we advocate approximations in terms of well-

established formal languages, in order to leverage existing tools and technology.


Neglected tropical diseases - A challenge to biomedical ontology

engineering

Fred Freitas

Informatics Center - Federal University of Pernambuco, Brazil

Knowledge Representation and Knowledge Management Research Group

University of Mannheim, Germany


Stefan Schulz

Institut für Medizinische Biometrie und Medizinische Informatik, Universitätsklinikum Freiburg, Germany

The huge amount of data being collected in biomedical research, health care and public health requires

increasingly sophisticated representational formalisms. In the past, data had been structured by

classification systems and thesauri (e.g., the ICD and the MeSH). More recently, the need for a real

semantic interoperability has been formulated and is now being addressed by ontologies or

ontologically oriented terminology systems such as the OBO (Open biomedical ontologies) and

SNOMED CT, capitalizing on Semantic Web technologies.

We are developing a prototype ontology for the field of neglected tropical diseases, starting with

leishmaniasis, a disease with considerable public health impact in many tropical countries. This is an

area that has largely been bypassed by past terminology and ontology endeavours although it

encompasses interesting challenges.

The project has novel aspects both from a domain and a computer science perspective. The domain

representation perspective focuses on the linkage among population-specific, clinical, lab data, and

scientific publications. The conceptual space of a global account of many tropical diseases includes

the following aspects:

- individual disease vs. affected populations

- treatment and prevention both related to individuals and populations

- biological organisms that play different roles: hosts, vectors, pathogens

- complicated organisms with different relevant lifecycles

- broad spectrum of disease manifestations

- importance of the natural division of geographic environments

- socioeconomic factors, housing, mobility

- public health authorities

- administrative divisions of geographic space

The computer science perspective addresses the linkage of existing ontologies with large repositories of

organisms and geographic entities on the one hand, and with existing ontologies (SNOMED, OBI, CO,

GO, …) on the other hand. Especially the integration of geographic data in an ontology gives rise to new

research questions, as it integrates large amounts of instance (A-Box) data with class (T-Box) data.

Development aspects encompass the assembly of the ontology based on diverse existing sources, the

linkage of different modules as well as import routines from external repositories. They include the

creation of a Web-based ontology browser and a Web-based retrieval interface, re-using existing

applications. One difficulty will lie in the fact that a language gap must be bridged: The ontology labels

are mostly in English (only partially extended by Spanish and Portuguese labels available from

multilingual terminologies), whereas many resources are only available in Portuguese. The solution is

here to re-use and adapt an existing cross-language indexing system.

An important research aspect is to provide evidence that a useful domain ontology can be created with

limited resources largely based on existing material, using Semantic Web standards. To this end, a set of

test queries is under elaboration and a gold standard is being produced by manual relevance judgments.


Thus several document and fact retrieval scenarios can be evaluated and the usefulness of the ontology

can be assessed.

We have devised three use cases to be developed in the medium term:

• Decision support systems (DSS) for neglected diseases – this was defined as the first goal of the

project and its main integration point with OTICSSS, an emerging health information integration

initiative in Brazil. The main objective here is to develop an ontology-based information

integration system in order to query heterogeneous neglected disease-related databases from

different governmental sources (county, state and country). Such a cooperation will certainly

allow both projects to contribute to each other, particularly if both take alternative research

directions to tackle the problem.

• A search engine for biomedical documents. The Freiburg University spin-off Averbis GmbH

(www.averbis.de) is developing tools and resources that can be useful for building a new,

specific, cross-lingual retrieval environment, by mediating between documents and ontology

descriptions using morphosemantic abstraction performed by these tools.

• Intelligent agents that assist diagnosis and prognosis of diseases under scrutiny in potential and

actual patients. This can be a breakthrough in the project, and a good usage testbed for the

ontologies to be constructed.


The application of an ontology design pattern for functional abnormalities to phenotype ontologies and the extraction of an ontology of anatomical functions

Robert Hoehndorf <leechuck@acm.org>

Axel-Cyrille Ngonga Ngomo <ngonga@informatik.uni-leipzig.de>

Janet Kelso <kelso@eva.mpg.de>

Abstract

Functions play an important role throughout biology. Although molecular functions are covered in the Gene Ontology, there is currently no publicly available ontology of anatomical functions. Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can automatically extract a skeleton for such an ontology of anatomical functions by using a combination of process, phenotype and anatomy ontologies.

We provide an ontological analysis of the nature of functions and functional abnormalities that is compatible with the most current views on the ontological definition of functions. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies using a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. We apply our approach to the Human Phenotype Ontology and the Mouse Phenotype Ontology by using the Foundational Model of Anatomy and the Mouse Anatomy as background ontologies and present preliminary results.

Furthermore, we introduce a new relation to relate material objects to processes that realize the function of the object to avoid a needless duplication of processes already present in the Gene Ontology in a new ontology of anatomical functions. We discuss several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and functional abnormalities.


The ChEBI ontology

Nico Adams, Paula de Matos, Adriano Dekker, Marcus Ennis, Janna Hastings, Kenneth Haug,

Duncan Hull, Zara Josephs, Steve Turner and Christoph Steinbeck.

The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK

Chemical Entities of Biological Interest (ChEBI) is a freely available database and

ontology of molecular entities and chemical concepts, which is manually annotated to

a high standard of quality, and is non-redundant, differentiating it from other publicly

available chemistry resources. It focuses specifically on those chemical entities which

are of interest to the life sciences community, including metabolites and drugs.

The ChEBI ontology includes a structure-based and a role-based classification. The

relationships that are used to link between entities in the ontology are

• is a, which is used to indicate subsumption, such as L-alanine residue

(CHEBI:46217) is a alanine residue (CHEBI:32441);

• has part, which is used to indicate composition, such as diclofenac sodium

(CHEBI:4509) has part diclofenac(1-) (CHEBI:48311);

• has role, which is used to link a molecular entity to a role which it may perform,

such as kanamycin A sulfate (CHEBI:6109) has role antibacterial drug

(CHEBI:36047).

• chemistry-specific relationships: is enantiomer of and is tautomer of; is conjugate

[base / acid] of; is substituent group from; has parent hydride and has functional

parent.
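The relationship statements listed above can be pictured as subject-relation-object triples. The following is only a minimal sketch using the three examples quoted in the text; the helper names `index_by_relation` and `objects_of` are hypothetical and do not reflect how ChEBI stores or serves its ontology.

```python
from collections import defaultdict

# Example statements copied from the text: (subject, relation, object).
TRIPLES = [
    ("CHEBI:46217", "is a", "CHEBI:32441"),     # L-alanine residue is a alanine residue
    ("CHEBI:4509", "has part", "CHEBI:48311"),  # diclofenac sodium has part diclofenac(1-)
    ("CHEBI:6109", "has role", "CHEBI:36047"),  # kanamycin A sulfate has role antibacterial drug
]


def index_by_relation(triples):
    """Group (subject, object) pairs under their relation name (hypothetical helper)."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[relation].append((subject, obj))
    return dict(index)


def objects_of(triples, subject, relation):
    """Return all objects linked to `subject` by `relation` (hypothetical helper)."""
    return [o for s, r, o in triples if s == subject and r == relation]
```

Keeping the relation name as plain data makes it easy to add the chemistry-specific relationships without changing the lookup code.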

Recently, ChEBI has incorporated the structures, synonyms and citations of ~440000

bioactive compounds from the ChEMBL (http://www.ebi.ac.uk/chembldb/) drug-

discovery dataset. However, these entities have not yet been classified into the

ontology. The size of the dataset implies that such classification will never be feasible

with the manual approach used heretofore in ChEBI. Thus, we are currently exploring

approaches for structure-based automatic classification which include

• the extraction of features from the chemical structure (using standard

cheminformatics techniques) and subsequent logic-based classification into

classes defined using those features;

• the analysis of the structures for similarity on the basis of cheminformatics

structural keys and the association of particular structural keys with classes.
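As an illustration of the second approach, structural keys can be compared with a set-overlap measure. The sketch below assumes the Tanimoto coefficient, a common cheminformatics choice; the abstract does not say which measure is used, and the key names, the `threshold` value and the function `suggest_class` are hypothetical.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto coefficient of two collections of structural keys."""
    a, b = set(keys_a), set(keys_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_class(query_keys, class_keys, threshold=0.5):
    """Suggest the class whose key set is most similar to the query.

    `class_keys` maps a class name to the structural keys associated with it.
    Returns None when no class reaches the (assumed) similarity threshold.
    """
    best_class, best_score = None, 0.0
    for class_name, keys in class_keys.items():
        score = tanimoto(query_keys, keys)
        if score > best_score:
            best_class, best_score = class_name, score
    return best_class if best_score >= threshold else None
```

A suggestion of this kind would still need review by a curator before an entity is placed in the ontology.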

A further challenge is the automatic association of bioactivity roles to the compounds

from this dataset. Bioactivity is annotated within the source database in textual

format, necessitating a text-mining approach for extraction.

References

de Matos, P., Alcántara, R., Dekker, A., Ennis, M., Hastings, J., Haug, K., Spiteri, I., Turner, S., and

Steinbeck, C. (2009). Chemical entities of biological interest: an update. Nucleic Acids Res. pages

gkp886+.

Degtyarenko, K., de Matos, P., Ennis, M., Hastings, J., Zbinden, M., McNaught, A., Alcántara, R.,

Darsow, M., Guedj, M. and Ashburner, M. (2008) ChEBI: a database and ontology for chemical

entities of biological interest. Nucleic Acids Res. 36, D344–D350.


AN ONTOLOGY GENERATION PLUGIN FOR OBO-Edit

Wächter T. and M. Schroeder

Technische Universität Dresden Biotechnologisches Zentrum, Tatzberg 47-51,

01307 Dresden, Germany, Telephone: +49 (351) 463 40068

Introduction:

Developing ontologies is a labor-intensive process. Comfortable

editors such as OBO-Edit, which is developed and maintained by the Gene

Ontology Consortium, are a prerequisite for a good ontology design.

Results:

To speed up the manual process we developed an OBO-Edit plug-in for

semi-automated ontology generation. The ontology generation suggests terms,

definitions and parent-child relationships based on text mining and natural

language processing (NLP) techniques. State of the art NLP is used to rank the

relevance of terms for the domain to be modeled. Previous studies showed that

term generation can improve the completeness of an ontology by suggesting up

to 89% good candidate terms in the top 50 ranked terms. Public resources such

as Wikipedia, full text articles and web sites are incorporated to generate

definitions, which follow the well defined structure “A is a B with property C” if

available. Based on the generation of good definitions it is possible to suggest

the likely parent of A, namely B.
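The step from a definition of the form “A is a B with property C” to a suggested parent B can be sketched with a regular expression. This is an illustrative assumption, not the plug-in's actual NLP pipeline; the pattern `DEFINITION` and the function `suggest_parent` are hypothetical.

```python
import re

# Hypothetical pattern for definitions shaped like "A is a B with property C".
DEFINITION = re.compile(
    r"^(?:An?\s+|The\s+)?(?P<term>.+?)\s+is\s+an?\s+(?P<parent>.+?)"
    r"(?:\s+(?:that|which|with)\s+(?P<property>.+?))?\.?$",
    re.IGNORECASE,
)


def suggest_parent(definition):
    """Return (term, parent) if the definition fits the pattern, else None."""
    match = DEFINITION.match(definition.strip())
    if match is None:
        return None
    return match.group("term"), match.group("parent")
```

For example, `suggest_parent("An enzyme is a protein that catalyses chemical reactions.")` yields the pair `("enzyme", "protein")`, so "protein" would be proposed as the parent term.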

Methods:

In a three step procedure the plug-in supports the creation of new

ontology terms from text. As textual resources free text or a query for PubMed

abstracts can be submitted. In a first step the terminology mentioned in the text is

retrieved and ranked according to its importance to the domain. Abbreviations

and lexical variants are recognized and terms similar to existing OBO terms are

indicated as displayed. The list of candidate terms can be searched and filtered

with regular expression patterns to focus on specific lexical aspects of interest. In

a second step definitions for terms are generated and presented to the curator.

The defined candidate terms, which are enriched with synonyms, are in the final

step inserted into the ontology. We addressed the difficulty of finding the correct

position in a tree structure by using all information available to the plug-in. All

potential parent terms, e.g. all terms of the Gene Ontology, are displayed as a list

and are ranked higher if they (a) are selected in OBO-Edit, (b) have a certain

lexical overlap with the new candidate concept, (c) are contained in the specified

definition, or (d) are supported by evidence for a relationship in any of the OBO

listed ontologies.
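The four ranking criteria (a) to (d) can be combined into a simple additive score. The sketch below is an assumption for illustration only: the equal weights, the token-based overlap measure and the names `Candidate`, `score` and `rank_parents` are hypothetical and are not taken from the plug-in.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    selected_in_editor: bool = False   # criterion (a)
    known_relationship: bool = False   # criterion (d)


def lexical_overlap(a, b):
    """Share of word tokens that two term names have in common (Jaccard index)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def score(candidate, new_term, definition):
    """Additive score over criteria (a) to (d); the equal weights are assumed."""
    total = 0.0
    if candidate.selected_in_editor:                      # (a)
        total += 1.0
    total += lexical_overlap(candidate.name, new_term)    # (b)
    if candidate.name.lower() in definition.lower():      # (c)
        total += 1.0
    if candidate.known_relationship:                      # (d)
        total += 1.0
    return total


def rank_parents(candidates, new_term, definition):
    """Sort candidate parents so that the best-supported ones come first."""
    return sorted(candidates, key=lambda c: score(c, new_term, definition), reverse=True)
```

An additive score keeps the ranking easy to explain to the curator, who makes the final choice of parent term.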

With the novel Ontology Generation plug-in for OBO-Edit 2, we contribute to the

community by increasing the tool support for the development and maintenance


of biomedical ontologies. We used this plug-in for the creation of the Go3R

Ontology. Go3R is the world's first search engine on alternative methods

building on new semantic technologies that use an expert-knowledge based

ontology to identify relevant documents.

Availability: The plug-in is available as part of the beta53 release of OBO-Edit.

References:

Sauer, U. G., Wächter, T., Grune, B., Doms, A., Alvers, M. R., Spielmann, H., and

Schroeder, M. (2009). Go3R - semantic internet search engine for alternative

methods to animal testing. ALTEX. (shared first author)

Winnenburg, R., Wächter, T., Plake, C., Andreas, D., and Schroeder, M. (2008).

Facts from text: Can text mining help to scale-up high-quality manual curation

of gene products with ontologies? Briefings in Bioinformatics.

Alexopoulou, D., Wächter, T., Pickersgill, L., Eyre, C., and Schroeder, M. (2007).

Terminologies for text-mining; an experiment in the lipoprotein metabolism

domain. BMC Bioinformatics.


Concurrent ontology building with Collaborative Protégé

Authors: Daniel Schober1,2, James Malone2, Robert Stevens3

Affiliations: 1Institute for Medical Biometry and Medical Informatics, University

Clinic, Freiburg, Germany, 2European Bioinformatics Institute, Cambridge, UK, 3Manchester School of Computer Sciences, Manchester, UK

Introduction

Although living in a world of greater international collaboration, the geographical

distribution of developers still makes collaborative development approaches for

ontology development difficult to realize. If collaboration is not built into tools,

reaching widespread community consensus and encoding according to some

shared plan is not easy to achieve. For this reason, tools are developed that

allow not only for distributed collaborative ontology creation and modification, but

for direct and topic-linked communication about all aspects of the engineering

process as well. To investigate this process and corresponding capabilities of the

new Collaborative Protégé 3 (CP) tool, the Ontology of Biomedical Investigations

(OBI) was enriched in an experiment run as part of an OntoGenesis network

meeting (website: http://ontogenesis.ontonet.org/moin/NetworkMeeting7) at the

European Bioinformatics Institute (EBI).

We investigated the ability of the CP plug-in [1] to:

o Facilitate multiple concurrent edits of a single OWL file from different

computers

o Track annotations associated with specific representational units

(RUs), e.g. on classes or properties

o Track annotations associated with actions of ontology change

(deletions, axiom edits and annotation edits)

o Support discussion threads and instant messaging

communication between ontology developers (real time chat).

In this talk we present our observations and recommendations for CP based

upon this experience.

Method

Our methodology involved the following set of tasks:

o Familiarization of users with Collaborative Protégé 3.4, its GUI and

collaborative features.

o Ad hoc additions of attendees' own lists of devices (with possibility

of duplication).

o Controlled additions of devices from a list as provided by the

metabolomics standards initiative (MSI).

o Deployment of an 'Agent Provocateur' to assess the transparency

of the changes occurring to others. These conflicting and

deliberately incorrect edits were made during a specified period of


time known only to the session organizer.

o Controlling/restricting communication channels (notes, discussion

threads and chat) to evaluate CP's ability to facilitate communication

in distributed, collaborative development.

The communication and interaction of the participants with each other, directly or

through the tool, were tracked and analyzed. The Obi.owl file was populated with

new "device" classes from the domains of the OntoGenesis members and as

taken from a list provided by the Metabolomics Standard Initiative (http://msi-

ontology.sourceforge.net/). Detailed statistics on numbers and kinds of

annotations made during the sessions with tables, diagrams and further

discussions are available in a spreadsheet from the OntoGenesis website.

Initially, development occurred in a single group but this was then divided into

subgroups. Ad hoc additions were made, followed by subgroups

adding classes from the provided MSI term list. The results were then reviewed

and commented on by the other subgroups adding annotations/notes. Subsequently,

other communication channels were tested: first chat only, then voice only,

and after that chat and voice together. During the latter stages of this session,

the Agent Provocateur user was deployed.

Results

A realistic collaborative ontology building session was set up to test CP's

collaborative editing and communication features. Collaborative ontology building

was relatively trouble free and the tool also copes with complicated setups and is

flexible enough to allow for corresponding adjustments.

We highlight some unfulfilled requirements:

Editing functionalities:

The lack of a RU and module locking mechanism meant that others could alter

classes that have a logical impact on the class under current definition by another

user. A roll back function would aid in conflict resolution and would lead to safer

editing. Subscription and Notification were requested, where users subscribe to

certain areas of interest within the ontology and are then notified of any changes

that occur in those areas.

Annotations on RUs with entity notes:

For minor and trivial annotations providing an annotation type, subject heading

and value in an overly granular manner was perceived as overkill. Also the

change track captured in the project-linked ChAO knowledge base is sometimes

presented in an overly granular manner. Users would like the changes to be

described in a high level abstraction, rather than at a detailed granular level.

Communication:


Chats were requested to be linked with specific RUs and axioms to aid a more

immediate and direct conflict resolution. A closed 'retreat room' was desired as

well as a filter function on user names to show only the chats of certain

people or on particular ontology modules. Integration of emoticons in text fields

would increase transmittance of pragmatic aspects of communications.

Planning:

Integrated voting functionalities would allow users to vote on change issues. A

mechanism that changes the ontology based on vote outcomes would speed up

development and could be implemented using ChAO information and

formalized voting outcomes. Issue tracker functions were requested, i.e. a

scratch pad or todo list that can be worked through and 'checked', e.g. indicating

a proposed plan and what has been already realized at a certain time point.

Conclusion

Although some caveats persist, it became clear that the CP tool is now in an

advanced state and can be used in practice with sufficient stability and much can

be done with configuration to further optimize it. Our practice-driven requirement

and fault analysis provoked much feedback to the tool developers, and will be

valuable for the CP version of P4, which is in preparation. A paper collating all

results has been accepted for the ICBO 2009 Conference and will be published

in their Proceedings. We will continue to investigate CP in further OntoGenesis

meetings and hence will gain further insights into the process of software guided

collaborative functionalities for ontology engineering, ensuring continuous

feedback to the CP developers.

References

[1] Noy N, Tudorache T, de Coronado S, Musen M, Developing biomedical

ontologies collaboratively. AMIA Annu Symp Proc. 2008, p. 520-4.

Mining ontology concepts from literature for automated gene annotation.

Conrad Plake and Rainer Winnenburg

Bioinformatics Group. Technische Universität Dresden. Tatzberg 47-51. 01307

Dresden, Germany

High-throughput screens such as microarrays and RNAi screens produce huge

amounts of data. They typically result in hundreds of genes, which are often

further explored and clustered via enriched GeneOntology terms. The strength of

such analyses is that they build on high-quality manual annotations provided with

the GeneOntology. However, the weakness is that annotations are restricted to

process, function and location and that they do not cover all known genes in

model organisms. GoGene addresses this weakness by complementing high-

quality manual annotation with high-throughput text mining extracting co-

occurrences of genes and ontology terms from literature. GoGene contains over

4 000 000 associations between genes and gene-related terms for 10 model

organisms extracted from more than 18 000 000 PubMed entries. It does not

cover only process, function and location of genes, but also biomedical

categories such as diseases, compounds, techniques and mutations. By bringing

it all together, GoGene provides the most recent and most complete facts about

genes and can rank them according to novelty and importance. GoGene accepts

keywords, gene lists, gene sequences and protein sequences as input and

supports search for genes in PubMed, EntrezGene and via BLAST. Since all

associations of genes to terms are supported by evidence in the literature, the

results are transparent and can be verified by the user.
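The idea of mining co-occurrences of genes and ontology terms can be sketched as follows. This is a deliberately naive illustration that uses case-insensitive substring matching; GoGene's actual gene recognition and term matching are more sophisticated, and the function name `count_cooccurrences` and its inputs are hypothetical.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(abstracts, genes, terms):
    """Count in how many abstracts each (gene, term) pair is mentioned together.

    Matching is a plain case-insensitive substring test, which is an
    assumption made to keep the sketch short.
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        present_genes = [g for g in genes if g.lower() in lowered]
        present_terms = [t for t in terms if t.lower() in lowered]
        counts.update(product(present_genes, present_terms))
    return counts
```

Because each count is tied to the abstracts that produced it, every association can be traced back to its supporting literature, which is the transparency property described above.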

Information gained from gene-mutation studies can reveal function and disease

implications. The automated retrieval and integration of information about protein

point mutations in combination with structure, domain and interaction data from

literature and databases promises to be a valuable approach to study structure-

function relationships in biomedical data sets. We developed a rule- and regular

expression-based protein point mutation retrieval pipeline for PubMed abstracts,

which shows an F-measure of 87% for the mutation retrieval task on a

benchmark dataset. In order to link mutations to their proteins, we utilize a named

entity recognition algorithm for the identification of gene names co-occurring in

the abstract, and establish links based on sequence checks. Conversely, we

show that gene recognition improves from 77% to 91% F-measure when

mutation information given in the text is considered. To demonstrate practical

relevance, we utilize mutation information from text to evaluate a novel solvation

energy based model for the prediction of stabilizing regions in membrane

proteins. For five G protein-coupled receptors we identified 35 relevant single

mutations and associated phenotypes, of which none had been annotated in the

UniProt or PDB database. In 71% of cases, the reported phenotypes agreed with

the model predictions, supporting a relation between mutations and stability

issues in membrane proteins. We present a reliable approach for the retrieval of

protein mutations from PubMed abstracts for any set of genes or proteins of

interest. We further demonstrate how amino acid substitution information from

text can be utilized for protein structure stability studies on the basis of a novel

energy model.
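The rule- and regular expression-based retrieval and its F-measure evaluation can be illustrated with a minimal sketch. The two patterns below ("A123T" and "Ala123Thr"), the function names, the sample sentence and the gold standard are assumptions made for illustration; the actual pipeline uses a richer rule set and links mutations to genes via sequence checks, which this sketch omits.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "".join(sorted(THREE_TO_ONE.values()))
THREE = "|".join(THREE_TO_ONE)

# Illustrative patterns only: "A123T" (one-letter) and "Ala123Thr" (three-letter).
MUTATION = re.compile(
    rf"\b(?:([{ONE}])([1-9]\d*)([{ONE}])|({THREE})([1-9]\d*)({THREE}))\b"
)


def find_point_mutations(text):
    """Return a set of (wild_type, position, mutant) tuples found in text."""
    found = set()
    for m in MUTATION.finditer(text):
        if m.group(1):
            wild, pos, mutant = m.group(1), m.group(2), m.group(3)
        else:
            wild = THREE_TO_ONE[m.group(4)]
            pos = m.group(5)
            mutant = THREE_TO_ONE[m.group(6)]
        if wild != mutant:  # a substitution must change the residue
            found.add((wild, int(pos), mutant))
    return found


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over two sets of mentions."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence and gold standard, not taken from the benchmark dataset.
abstract = "The A123T and Gly45Asp substitutions destabilize the receptor."
predicted = find_point_mutations(abstract)
gold = {("A", 123, "T"), ("G", 45, "D"), ("K", 7, "R")}
score = f_measure(predicted, gold)
```

Here both predicted mutations are correct but one gold mutation is missed, so precision is 1 and recall is 2/3.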

We provide online access to our automatically derived gene annotations with

ontology-aided browsing at: http://gopubmed.org/gogene.

References

[1] Plake et al.: GoGene: gene annotations in the fast lane. Nucleic Acids Res.,

37(Web Server issue), W300-4, 2009. Link:

http://nar.oxfordjournals.org/cgi/content/abstract/gkp429

[2] Winnenburg et al.: Improved mutation tagging with gene identifiers applied to

membrane protein stability prediction. BMC Bioinformatics 2009, 10(Suppl 8):S3.

Link: http://www.biomedcentral.com/1471-2105/10/S8/S3

A Four-Level Translational Approach to Model Surgical Processes

Goldstein D¹, Loebe F², Herre H³, Neumuth T¹

¹ Universität Leipzig, Innovation Center Computer Assisted Surgery (ICCAS), Leipzig, Germany

² Universität Leipzig, Department of Computer Science

³ Universität Leipzig, Institute of Medical Informatics, Statistics and Epidemiology (IMISE),

Research Group Ontologies in Medicine and Life Sciences (OntoMed)

To specify surgical interventions in a precise and formal way is an essential requirement

for many applications in the field of surgery, including the instruction of trainees, quality

assessment and evaluation, as well as computer-assisted surgery (CAS). Presently, there

are various approaches to modeling surgical procedures. However, these

approaches differ in focus and are thus characterized by varying degrees

of granularity. Furthermore, they lack a common and agreed-upon conceptual foundation.

This greatly hinders the interoperability, comparability, and uniform interpretation of

process data. For scientific purposes, a uniform foundation would be

beneficial, facilitating the acquisition and exchange of data, the interpretation and

transfer of study results, and the sharing and adaptation of tools and methods.

Therefore, we propose a generic, formal framework for the specification of surgical

processes. In this workshop, our method will be presented in combination with its design

methodology. The design follows a four-level, translational approach and encompasses

an ontological foundation for the formal level of our approach.

The expressive power and the unifying capacity of the presented framework will be

shown. To this end, the framework was applied to four existing models

of surgical procedures. We will show that the presented framework allows for a uniform

representation of process models arising from different techniques.

The four-level approach is designed to capture knowledge about the progression of

surgical interventions. It provides a consistent translation from natural language to a level that

is close to implementation, and it supports different research fields, such as the evaluation of

surgical assist systems, the optimization and re-engineering of surgeries, and the use of

workflow management systems in the operating room.

Knowledge Representation for Digital Brain Atlases

Hans-Christian Hege, Anja Kuß

Zuse Institute Berlin (ZIB)

Biomedical information systems are developed in order to navigate, search, structurally represent, and graphically display biological systems (1). By means of biomedical information systems researchers are able to integrate and combine a variety of biological data, analyze relationships between data, plan experiments, and furthermore model biological processes. They are also used in medical therapy planning and in education in biology and medicine. For example, there are information systems that support the exploration of biochemical connections and relations such as genome sequence assemblies (2) or differences of genomes of species (3). Another type of information system represents spatial and spatial-temporal relations, facilitating the analysis of morphology and sometimes also physiology. Here 3D and 4D digital atlases come into play.

Those atlases serve as common spatio-temporal reference systems in which data from different experiments, imaging modalities and scales are integrated. Furthermore, semantic information is linked to the atlas data. This semantic linking requires the registration of the data with an ontology or controlled vocabulary (4). Ontologies increase the potential use and reusability of digital atlases by forming a standardized approach to annotating, analyzing, and querying data (4) (5).

So far, digital atlas generation concentrates on modeling neural structures, such as brains or nerve cords of mammals and invertebrates (4). In order to support the understanding and analysis of structural and functional characteristics of brain structures, ontologies need to fulfil several requirements. An ideal brain ontology would include a complete set of structural parts and nerve types. It would further contain axonal projections between regions and nerve types and it would include morphological, connectional, and electrophysiological properties of neurons. An ideal ontology also would be species-specific (5). Ontologies that describe brain parts, nerves and neural connections have for example been developed for the rat brain (5), the mouse brain (6), and the fly brain (7).

We started to build an ontology of the structures of the honeybee brain in which parts of the bee brain are modeled from wholemounts down to synaptic swellings (boutons) of nerves. We linked our ontology to the surface reconstructions of the honeybee standard atlas (HSB) (8) by assigning the reconstruction's ID and file name to appropriate parts of the ontology. This step actually enables an ontology-based browsing of the atlas. In a first usage approach of the ontology-linked HSB we addressed the automatic creation of meaningful visualizations. Often the process of creating meaningful and expressive visualizations is time-consuming and requires good knowledge of the used visualization software. In our approach all the user has to do is to select a structure to be visualized and to select a predefined query, such as "Show overview". An algorithm then automatically creates a visualization that contains the selected structure highlighted as a focus object and further structures forming the context (9).

In the next steps we want to develop an ontology-based atlas browsing that includes not only reconstructions but also the original data the reconstructions are based on. Further we would like to provide a more visual approach for browsing the ontology. We want to integrate a graph-based ontology representation the user can interact with and which presents interesting information in an intuitive way.
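The linking of ontology parts to surface reconstructions, and a predefined query that returns a focus object plus context, might be sketched as follows. This is an illustration under stated assumptions, not the algorithm of (9): the class `Structure`, the function `show_overview`, the rule that context consists of sibling structures, and all IDs and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Structure:
    name: str
    part_of: Optional[str] = None            # parent structure in the ontology
    reconstruction_id: Optional[int] = None  # ID of the linked surface reconstruction
    file_name: Optional[str] = None          # file holding that reconstruction


def show_overview(ontology: Dict[str, Structure], focus: str) -> Dict[str, List[str]]:
    """Pick the files to render: the focus structure plus its siblings as context.

    Using siblings as context is an assumption made for this sketch.
    """
    target = ontology[focus]
    siblings = [
        s for s in ontology.values()
        if s.part_of == target.part_of and s.name != focus and s.file_name
    ]
    return {
        "focus": [target.file_name] if target.file_name else [],
        "context": sorted(s.file_name for s in siblings),
    }


# Hypothetical entries; IDs and file names are made up for illustration.
ontology = {
    s.name: s
    for s in [
        Structure("brain"),
        Structure("mushroom body", "brain", 1, "mushroom_body.surf"),
        Structure("antennal lobe", "brain", 2, "antennal_lobe.surf"),
        Structure("central body", "brain", 3, "central_body.surf"),
    ]
}
view = show_overview(ontology, "mushroom body")
```

Because every structure carries its reconstruction ID and file name, a query only has to walk the ontology to decide which files a renderer should load and which one to highlight.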

References

1. Trey Ideker, Timothy Galitski, Leroy Hood. A New Approach to Decoding Life: Systems Biology. Annual Review of Genomics and Human Genetics. 2001, Vol. 2, pp. 343–372.

2. Cydney B. Nielsen, Shaun D. Jackman, Inanc Birol, Steven J. M. Jones. ABySS-Explorer: Visualizing Genome Sequence Assemblies. IEEE Transactions on Visualization and Computer Graphics. 2009, Vol. 15, 6, pp. 881–888.

3. Miriah Meyer, Tamara Munzner, Hanspeter Pfister. MizBee: A Multiscale Synteny Browser. IEEE Transactions on Visualization and Computer Graphics. 2009, Vol. 15, 6, pp. 897–904.

4. Jyl Boline, Erh-Fang Lee, Arthur Toga. Digital atlases as a framework for data sharing. Frontiers in Neuroscience. 2008, Vol. 2, 1, pp. 100–106.

5. Mihail Bota, Larry W. Swanson. BAMS neuroanatomical ontology: design and implementation. Frontiers in Neuroinformatics. 2008, Vol. 2, 2.

6. Albert Burger, Duncan Davidson, Richard Baldock. Anatomy Ontologies for Bioinformatics. Principles and Practice. Springer, London, 2008.

7. The Open Biomedical Ontologies. [Online] 2009. http://www.obofoundry.org/.

8. Robert Brandt, Torsten Rohlfing, Jürgen Rybak, Sabine Krofczik, Alexander Maye, Malte Westerhoff, Hans-Christian Hege, Randolf Menzel. Three-Dimensional Average-Shape Atlas of the Honeybee Brain and Its Applications. The Journal of Comparative Neurology. 2005, Vol. 492, 1, pp. 1–19.

9. Anja Kuß, Steffen Prohaska, Björn Meyer, Jürgen Rybak, Hans-Christian Hege. Ontology-based visualization of hierarchical neuroanatomical structures. [ed.] Charles P. Botha, Gordon Kindlmann and Bernhard Preim. Proceedings of the Eurographics Workshop on Visual Computing for Biomedicine VCBM 2008, pp. 177–184, 2008.

Beyond Structure: KiSAO and TEDDY – Two Ontologies Addressing Pragmatical and Dynamical Aspects of Computational Models in Systems Biology

Christian Knüpfer¹, Dagmar Köhn², and Nicolas Le Novère³

¹ Institute of Computer Science, University of Jena, Germany

² Institute for Database and Information Systems, Rostock University, Germany

³ Computational Neurobiology Group, European Bioinformatics Institute, Cambridge, Great Britain

Motivation

Computational models are becoming more and more the central scientific paradigm for understanding the complexity of living systems. With the increasing number and size of these models there is a growing need for model reuse and exchange. Furthermore, detailed models are not manageable without computer support. There are efforts to formalise the mathematical structure of models (e.g. SBML) and to standardise the kinetic and biological meaning of model components (e.g. SBO, GO, UniProt). However, formalising only the structure of computational models is not sufficient to easily exchange and reuse models and to achieve full computer support for modelling. We also need to formalise the pragmatical and dynamical aspects of models.

For this purpose we propose two ontologies: the Kinetic Simulation Algorithm Ontology (KiSAO) and the TErminology for the Description of DYnamics (TEDDY). KiSAO covers algorithms used for simulation of computational models. The ontology classifies and puts into context existing simulation algorithms through the use of several characteristics, such as deterministic/stochastic or spatial/non-spatial. The aim of TEDDY is to provide terms for describing and characterising dynamical behaviours, observable dynamical phenomena, and control elements of biological models and biological systems in Systems Biology and Synthetic Biology.

KiSAO

KiSAO classifies simulation algorithms applicable to biological models using different categories and a hierarchy of algorithm versions. Each term contains information about synonyms, a definition and a publication reference.

Classification: Simulation algorithms are classified with respect to the following dimensions:

• deterministic/stochastic rules (e.g. Euler forward vs. Smoluchowski equation based method),

• spatial/non-spatial approaches (e.g. Green’s function reaction dynamics vs. Euler forward),

• discrete/continuous variables (e.g. Cellular automata vs. Livermore solver), and

• fixed/adaptive time-step approaches (e.g. Cellular automata vs. Green’s function reaction dynamics).
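The four classification dimensions above can be read as attributes attached to each algorithm. The following minimal sketch records only the characteristics that the examples in the list state explicitly; the dictionary layout, the dimension keys and the `select` helper are assumptions for illustration and do not reflect how KiSAO itself stores this information.

```python
# Characteristics stated in the text for each example algorithm. Dimensions the
# text does not mention for an algorithm are left out rather than guessed.
ALGORITHMS = {
    "Euler forward": {"rules": "deterministic", "space": "non-spatial"},
    "Smoluchowski equation based method": {"rules": "stochastic"},
    "Green's function reaction dynamics": {"space": "spatial", "time_step": "adaptive"},
    "Cellular automata": {"variables": "discrete", "time_step": "fixed"},
    "Livermore solver": {"variables": "continuous"},
}


def select(dimension, value):
    """Return the names of all algorithms classified with the given value."""
    return sorted(
        name
        for name, traits in ALGORITHMS.items()
        if traits.get(dimension) == value
    )


fixed_step = select("time_step", "fixed")
non_spatial = select("space", "non-spatial")
stochastic = select("rules", "stochastic")
```

A tool choosing a simulator for a model could filter candidates along these dimensions in the same way.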

Algorithm Hierarchy: The algorithms are arranged in a subclass hierarchy, e.g.

sub-volume stochastic reaction-diffusion algorithm (KiSAO:0000095)

is-a Gillespie-like stochastic simulation method (KiSAO:0000025)

is-a algorithm using stochastic rules (KiSAO:0000036)

KiSAO is encoded in OBO and developed using OBO-Edit. A transformation of KiSAO-OBO into OWL can be generated using Protégé. For details on KiSAO see the MIASE project page http://sourceforge.net/projects/miase.
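The subclass hierarchy shown above can be traversed with a simple child-to-parent map. The term IDs and names come from the example; the direction of the is-a links (most specific term at the bottom) is our reading of the figure, and the helper names `ancestors` and `is_a` are hypothetical.

```python
# Term IDs and names are taken from the example hierarchy in the text.
NAMES = {
    "KiSAO:0000036": "algorithm using stochastic rules",
    "KiSAO:0000025": "Gillespie-like stochastic simulation method",
    "KiSAO:0000095": "sub-volume stochastic reaction-diffusion algorithm",
}

# child -> parent, read as "child is-a parent" (assumed direction).
IS_A = {
    "KiSAO:0000095": "KiSAO:0000025",
    "KiSAO:0000025": "KiSAO:0000036",
}


def ancestors(term_id):
    """Return all superclasses of a term, nearest first."""
    chain = []
    while term_id in IS_A:
        term_id = IS_A[term_id]
        chain.append(term_id)
    return chain


def is_a(term_id, candidate):
    """True if term_id equals candidate or is a (transitive) subclass of it."""
    return candidate == term_id or candidate in ancestors(term_id)


lineage = [NAMES[t] for t in ancestors("KiSAO:0000095")]
```

Such a lookup lets software that only knows the general class "algorithm using stochastic rules" accept any of its more specific variants.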

TEDDY

In order to describe the dynamics of a model TEDDY comprises the following main categories:

Temporal Behaviour: terms for the actual (temporal) dynamical behaviour of models (e.g. Limit Cycle, Stable Fixed Point),

Behaviour Characteristic: terms for characterising concrete behaviours (e.g. Period) and for discriminating between types of behaviours (e.g. Stable vs. Unstable),

Behaviour Diversification: terms for the ability of systems to exhibit different behaviours dependent on parameters (e.g. Supercritical Hopf Bifurcation) and with respect to perturbations (e.g. Bi-Stability), and

Functional Motifs: terms for structural features of systems necessary for specific behaviours (e.g. Negative Feedback) and intended for specific functions (e.g. Integrator).

There are different types of relations between TEDDY classes:

• relations between nearby Temporal Behaviours (e.g. convergeTo),

• relations between Temporal Behaviours and Behaviour Characteristics (e.g. hasStability),

• relations between Functional Motifs and Temporal Behaviours (e.g. dependsOn), and

• relations between Behaviour Diversifications and Temporal Behaviours (e.g. hasSuperPart).
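The relation types listed above constrain which categories a relation may connect. A minimal type check could look like the sketch below. The example terms and relation names come from the text, but the subject/object order of each relation simply follows the order in which the text names the categories; it is an assumption and may not match the direction used in TEDDY itself. The function name `is_well_typed` is hypothetical.

```python
# Example terms from the text, mapped to their TEDDY main category.
CATEGORY = {
    "Limit Cycle": "Temporal Behaviour",
    "Stable Fixed Point": "Temporal Behaviour",
    "Period": "Behaviour Characteristic",
    "Stable": "Behaviour Characteristic",
    "Supercritical Hopf Bifurcation": "Behaviour Diversification",
    "Negative Feedback": "Functional Motif",
}

# relation -> (subject category, object category); directions are assumed.
SIGNATURE = {
    "convergeTo": ("Temporal Behaviour", "Temporal Behaviour"),
    "hasStability": ("Temporal Behaviour", "Behaviour Characteristic"),
    "dependsOn": ("Functional Motif", "Temporal Behaviour"),
    "hasSuperPart": ("Behaviour Diversification", "Temporal Behaviour"),
}


def is_well_typed(subject, relation, obj):
    """Check that a statement connects terms of the categories the relation expects."""
    expected = SIGNATURE.get(relation)
    if expected is None:
        return False
    return (CATEGORY.get(subject), CATEGORY.get(obj)) == expected


ok = is_well_typed("Limit Cycle", "hasStability", "Stable")
bad = is_well_typed("Period", "convergeTo", "Limit Cycle")
```

The check only verifies categories; whether a particular statement is true of a model is a separate question that the ontology's axioms and the annotator have to settle.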

TEDDY is encoded in OWL and developed using Protégé 4. The last release of the ontology contains 135 terms. For details on TEDDY see the project page http://sourceforge.net/projects/teddyontology.

An Epistemically Adequate Theory of Causality

Hannes Michalek, Leipzig

OBML Workshop, November 2009

The Empirical Method, understood as performing experiments (studies/trials), is the best approach that we as humans have developed to discover causal relations. Therefore, an “epistemically adequate” theory of the ontology of causality must be able to explain why the Empirical Method is successful, and in what ways it is not.

Keeping this aim in mind, we initially follow the ordinary route of ontological analysis:

Conceptual Analysis: What does “causality” mean?

• Limiting the scope: physical causality

• Causality relies on regularity

• Causality relies on counterfactual dependency

Ontological Model: What kinds of entities and relations play a role?

• Presentials as primary causal relata

• Extending the basic causal relation to cover processes as well

• Possible worlds as alternative situations (in this world)

• Regularity and counterfactual dependency

(Formalization is skipped here.)

Then we address the question of epistemic adequacy:

General epistemic considerations:

• In what sense and to what extent are the ontological elements of our theory epistemically accessible at all: presentials, coincidence pairs, universals, clusters of coincidence pairs, probabilities on these clusters, . . .

Reconstructing experiments/trials:

• How can we understand (or: reconstruct) the basic elements, the success, and the limitations of performing experiments in terms of our theory?

Upshot: Experiments (studies/trials) create tailor-made clusters of alternative situations with controlled (non-)existence of the alleged causes and strict detection of the (non-)existence of the expected effects. These alternative situations are the basis for determining the relations of regularity, counterfactual dependency and thus: causality.
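One simple way to picture the upshot is to treat each alternative situation as a pair (cause present?, effect detected?) and to summarise the cluster numerically. The sketch below is only an illustration: the frequency-based reading of regularity and counterfactual dependency, the function names and the trial data are assumptions, not the formalization of the theory, which is skipped here.

```python
def effect_rate(situations, cause_present):
    """Fraction of situations with the given cause status in which the effect occurred."""
    outcomes = [effect for cause, effect in situations if cause == cause_present]
    if not outcomes:
        raise ValueError("the cluster needs situations both with and without the cause")
    return sum(outcomes) / len(outcomes)


def causal_summary(situations):
    """Summarise a cluster of alternative situations (assumed statistical reading)."""
    with_cause = effect_rate(situations, True)
    without_cause = effect_rate(situations, False)
    return {
        # regularity: how reliably the effect follows when the cause is present
        "regularity": with_cause,
        # counterfactual contrast: how much less often the effect occurs without the cause
        "counterfactual_contrast": with_cause - without_cause,
    }


# Hypothetical trial: 10 situations with the alleged cause, 10 without it.
trial = (
    [(True, True)] * 8 + [(True, False)] * 2
    + [(False, True)] * 1 + [(False, False)] * 9
)
summary = causal_summary(trial)
```

A cluster that lacks situations without the cause gives no access to the counterfactual side, which is why the helper refuses to compute a summary in that case.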