First Workshop of the GI Working Group
"Ontologien in Biomedizin und Lebenswissenschaften" (OBML)
25-26 November 2009
Leipzig

Workshop des GI-Arbeitskreises Ontologien in Biomedizin und Lebenswissenschaften (OBML)
[Workshop of the Working Group "Ontologies in Biomedicine and the Life Sciences"]
WEDNESDAY, 25th November
13h00 Welcome remarks
Session 1 Metrics for ontology evaluation (Chair: Janet Kelso)
13h20-13h40 Hartung Evolution of Life Science Ontologies
13h40-14h00 Gross Quality of Functional Annotations in Life Science Data Sources
14h00-14h20 Kirsten Matching large Life Science Ontologies
14h20-14h40 Brochhausen Applying Corpus-Based Ontology Evaluation – The Case of the ACGT Master Ontology
14h40-15h00 Auer Linked Data for the Life Sciences
15h00-15h20 Dietzold Collaborative Editing and Publishing of Linked Data with OntoWiki
15h20-15h40 COFFEE
Session 2 Terminologies for clinical diagnostic support (Chair: Stefan Schulz)
15h40-16h00 Niggemann Snomed CT, Description Logic and Decision Support
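16h00-16h20 Hellmann Einsatz der Terminologien SNOMED CT und ID MACS am Beispiel der elektronischen
Organspendeerklärung (eOSE) [Use of the SNOMED CT and ID MACS terminologies, using the electronic organ donation declaration (eOSE) as an example]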
16h20-16h40 Robinson Clinical diagnostics in human genetics with semantic similarity searches in ontologies
16h40-17h00 Straub Dynamic Typing and Non-monotonic Reasoning - Principles for a Semantic Interpreter
Session 3.1 Theoretical principles (Chair: Heinrich Herre)
17h00-17h20 Baader How should parthood relations be expressed in SNOMED CT?
17h20-17h40 Jansen Constitution relations in biological cells
19h00 DINNER at the Restaurant "Madrid" Klostergasse 3-5. http://www.cafe-madrid.de/
THURSDAY, 26th November
Session 3.2 Theoretical principles
09h00-09h20 Straub Reality and Abstraction: A Look at Different Models
09h20-09h40 Loebe Ontological semantics
Session 4 Ontology engineering and use (Chair: Robert Hoehndorf)
09h40-10h00 Freitas Neglected tropical diseases - A challenge to biomedical ontology engineering
10h00-10h20 Ngonga The application of an ontology design pattern for functional abnormalities to phenotype ontologies
and the extraction of an ontology of anatomical functions
10h20-10h40 Hastings The ChEBI ontology
10h40-11h00 Waechter An ontology-generation plug-in for OBO-Edit
11h00-11h20 COFFEE
11h20-11h40 Schober Concurrent ontology building with Collaborative Protégé
11h40-12h10 Plake Mining ontology concepts from literature for automated gene annotation.
12h10 LUNCH
Session 5 Modelling and Causality (Chair: Frank Loebe)
13h10-13h30 Neumuth A Four-Level Translational Approach to Modeling Surgical Processes
13h30-13h50 Mudunuri
13h50-14h10 Hege Knowledge Representation via Digital Brain Atlases
14h10-14h30 Knuepfer Beyond Structure: KiSAO and TEDDY – Two Ontologies Addressing Pragmatical and Dynamical
Aspects of Computational Models in Systems Biology
14h30-14h50 Michalek A theory of causality
15h00-16h00 Open discussion: OBML Working Group
Participant List: Workshop des Arbeitskreises OBML
Sören Auer (Universität Leipzig)(*)
Franz Baader (TU Dresden)(*)
Annette Bals (Fresenius-Netcare)
Felix Balzer (Charité Universitätsmedizin Berlin)
Mathias Brochhausen (IFOMIS, Saarbrücken)(*)
Martin Boeker (Universität Freiburg)
Heiko Dietze (TU Dresden)
Sebastian Dietzold (Universität Leipzig)(*)
Fred Freitas (UFPE - Brazil)(*)
Dayana Goldstein (ICCAS, Universität Leipzig)
Niels Grewe (Universität Rostock)
Anika Groß (Universität Leipzig)(*)
Janna Hastings (European Bioinformatics Institute)(*)
Michael Hartung (Universität Leipzig)(*)
Hans-Christian Hege (Zuse-Institut Berlin (ZIB))(*)
G. Hellmann (HellmannConsult)(*)
Heinrich Herre (Universität Leipzig, IMISE)
Robert Hoehndorf (MPI EVA, IMISE, Universität Leipzig) (*)
Ludger Jansen (Universität Rostock)(*)
Janet Kelso (MPI EVA)
Toralf Kirsten (IZBI und IMISE, Universität Leipzig)(*)
Axel Klarmann (zwonull media)
Christian Knüpfer (Universität Jena)(*)
Anja Kuß (Konrad Zuse Institut, Berlin)
Frank Loebe (Universität Leipzig)
Hendrik Mehlhorn (IPK Gatersleben)
Christian Meißner (Universität Leipzig, ICCAS)
Hannes Michalek (Onto-Med, IMISE)(*)
Raj Mudunuri (Universität Leipzig, ICCAS)(*)
Bardo Nelgen (SemaWorx)
Thomas Neumuth (Universität Leipzig, ICCAS)(*)
Axel Ngonga (Universität Leipzig)
Jörg Niggemann (CompuGroup Software GmbH)(*)
Conrad Plake (BioTec TU-Dresden)(*)
Djamila Raufie (Universität Freiburg)
Peter Robinson (Institut für Medizinische Genetik, Universitätsklinikum Charite, Berlin)(*)
Andreas Schierwagen (Universität Leipzig)
Daniel Schober (Universität Freiburg)(*)
Stefan Schulz (Universität Freiburg)(*)
Holger Stenzhorn (Universitätsklinikum Saarland)
Hans Rudolf Straub (Semfinder AG)(*)
Stephan Vollrath (Universität Leipzig, MPI)
Thomas Wächter (BioTec TU-Dresden)(*)
Rainer Winnenburg (Biotec TU-Dresden)
Nico Wüstneck (Universität Leipzig)
Matching large Life Science Ontologies
Toralf Kirsten
Interdisciplinary Centre for Bioinformatics, University of Leipzig
Institute for Medical Informatics, Statistics, and Epidemiology, University of Leipzig
Ontologies are becoming increasingly important in life science application domains. Many have been developed recently and are frequently used to semantically describe specific properties of biological objects. For instance, molecular-biological objects, such as genes and proteins, are described (annotated) with information on the functions and processes they are involved in, whereas a disease ontology can be used to describe the findings of a patient's check-up. Ontologies provide controlled vocabularies for the uniform naming of concepts (and thus the description of object properties), which helps to reduce variations in terminology. A very popular ontology is the Gene Ontology (GO), which consists of three sub-ontologies on molecular functions, biological processes and cellular components [GOC08]. While ontology concepts are increasingly associated with objects (collected in so-called annotation mappings), there are only few connections between the life science ontologies themselves that reflect their semantic relations.
Ontology matching addresses the problem of finding semantic relations between ontologies. Each ontology relation, also called an ontology mapping or alignment, comprises a set of correspondences showing which of the ontology concepts are semantically related. In other domains, ontology matching focuses primarily on finding semantically equivalent concepts, e.g., to overcome the heterogeneity of two product catalogues in e-business. In the life sciences there are also ontology alignments with domain-specific semantics. For instance, a molecular function "is involved in" a biological process or "acts in" a cellular component (all of these can be concepts of the equally named GO sub-ontologies), as described in [MTML06]. Because the number of life science ontologies is huge and rapidly increasing, and because each contains a large number of concepts, creating these semantic relations manually is very time-consuming and nearly impossible. Hence, many approaches have been proposed in recent years that automatically generate candidate sets of concept correspondences, which can then be validated by human experts.
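The idea of automatically generating candidate correspondences for later expert validation can be sketched with a single name-based matcher. This is only an illustration under stated assumptions: it uses Python's standard `difflib` string similarity, the two toy ontologies and their identifiers are made up, and the threshold of 0.8 is arbitrary. It is not the matcher used in GOMMA or in any of the cited approaches.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] between two concept names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_candidates(source: dict, target: dict, threshold: float = 0.8) -> list:
    """Return candidate correspondences (source_id, target_id, score), best first."""
    candidates = []
    for s_id, s_name in source.items():
        for t_id, t_name in target.items():
            score = name_similarity(s_name, t_name)
            if score >= threshold:
                candidates.append((s_id, t_id, round(score, 2)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical toy ontologies; identifiers and names are invented for illustration.
ontology_a = {"A:1": "protein binding", "A:2": "cell nucleus", "A:3": "lipid transport"}
ontology_b = {"B:7": "Protein Binding", "B:8": "nucleus of cell", "B:9": "DNA repair"}

candidates = match_candidates(ontology_a, ontology_b)
for source_id, target_id, score in candidates:
    print(source_id, target_id, score)
```

In this toy run the identically named pair is proposed, while the reordered name ("cell nucleus" versus "nucleus of cell") falls below the threshold. A purely lexical matcher misses such pairs, which is one reason why several matchers and similarity functions are usually combined.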
The talk first gives an overview of match approaches that have been used in the life sciences, as well as of current developments. Secondly, we introduce our research prototype GOMMA, the Generic Ontology Matching and Mapping Analysis system. GOMMA includes both a highly scalable and space-efficient approach to managing many versions of different ontologies [KHGR09] and a comprehensive set of matchers and similarity functions that can be used to align versions of these ontologies. With these matchers and similarity functions, GOMMA makes it possible not only to create ontology mappings but also to refine initially generated alignments by combining the matchers in a workflow-like manner. Finally, we outline next steps and open research topics.
References:
[GOC08] Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research (36 Database):D440-D444, 2008.
[MTML06] S. Myhre, H. Tveit, T. Mollestad, A. Laegreid: Additional Gene Ontology Structure for improved biological reasoning. Bioinformatics, 22(16): 2020-2027, 2006.
[KHGR09] T. Kirsten, M. Hartung, A. Groß, E. Rahm: Efficient Management of Biomedical Ontology Versions. Proc. of Intl. Workshop on Ontology Content, 2009.
Evolution of Life Science Ontologies
Michael Hartung
Interdisciplinary Centre for Bioinformatics, University of Leipzig
Ontologies have become increasingly important in life sciences. They consist of a set of concepts denoted by terms describing and structuring a domain of interest. Concepts are interconnected by different relationship types such as is_a and part_of relationships. Typical kinds of ontology application are the annotation of molecular-biological objects, data exchange in heterogeneous environments as well as the usage in analysis algorithms. For instance, the well-known Gene Ontology (GO) [1] is utilized for the consistent annotation of proteins, in particular the molecular functions and the biological processes in which they are involved, or the cellular components where they act. Using a common ontology for annotation ensures collaborative work on complex topics and allows for the exchange of results between different research groups and organizations.
Life science ontologies are not static: they are modified when new or revised knowledge becomes available or when initial design errors need to be corrected. As a result of this continuous evolution, ontology providers usually release a new version of their ontology whenever a revision has been finished. Hence, an ontology version is valid as long as no newer version has been provided. For instance, because it changes so rapidly, GO releases a new version every day. Other ontologies, such as the NCI Thesaurus [4] or the OBO ontologies [5], are released less frequently, e.g., on a monthly or half-yearly basis. The release of a new ontology version brings numerous problems, in particular for users who employ the ontology in their analysis routines or for annotation purposes. For instance, the deletion of a concept in the newer version may leave annotations outdated or may change analysis results. Manually adapting data to a newer ontology version is error-prone and time-consuming, so the task should be supported semi-automatically. It is therefore useful to know how intensively an ontology has been modified and which changes occurred during its evolution, especially when evolution information is not available to the users. Furthermore, it is important to know whether an ontology is currently under heavy revision (i.e., is unstable) or receives only marginal refinements (i.e., is nearly stable).
The presentation is twofold. The first part introduces a framework for analyzing the evolution of ontologies [2]. In particular, an ontology model and measures for quantifying ontology evolution are presented. Selected evaluation results show the evolution of 16 life science ontologies between 2004 and 2008, including GO, the NCI Thesaurus and several OBO ontologies. The second part focuses on OnEX (Ontology Evolution Explorer) [3], a system based on the introduced framework. It provides web-based access to evolution information about ontologies. In particular, users can inspect the evolution of whole ontologies as well as detailed information about changes to ontology concepts (e.g., attribute modifications). Furthermore, the tool supports the migration of outdated annotations to newer versions of an ontology. An outlook discusses next steps and open research topics.
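As a rough illustration of what quantifying evolution between two ontology versions can look like, the following sketch compares two sets of concept identifiers and flags annotations that point to deleted concepts. The function names, the change-rate definition and all identifiers are assumptions made for this example; the measures actually defined in [2] and the functionality of OnEX [3] are not specified in this abstract.

```python
def diff_versions(old: set, new: set) -> dict:
    """Compare two ontology versions given as sets of concept identifiers."""
    added = new - old
    deleted = old - new
    return {
        "added": sorted(added),
        "deleted": sorted(deleted),
        # Illustrative measure: changed concepts relative to the old version size.
        "change_rate": (len(added) + len(deleted)) / len(old),
    }


def outdated_annotations(annotations: list, deleted: list) -> list:
    """Return annotations (object, concept) that refer to a deleted concept."""
    removed = set(deleted)
    return [a for a in annotations if a[1] in removed]


# Hypothetical versions and annotations, invented for illustration.
version_1 = {"C1", "C2", "C3", "C4"}
version_2 = {"C1", "C2", "C4", "C5", "C6"}
annotations = [("geneA", "C1"), ("geneB", "C3")]

changes = diff_versions(version_1, version_2)
stale = outdated_annotations(annotations, changes["deleted"])
print(changes)
print(stale)
```

A real system would also have to track changes to relationships and attributes, and would need a mapping from deleted concepts to suitable replacements before any annotation could be migrated.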
[1] Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research (36 Database):D440-D444, 2008
[2] Hartung, M.; Kirsten, T.; Rahm, E.: Analyzing the Evolution of Life Science Ontologies and Mappings. In Proceedings of Intl. Workshop on Data Integration in the Life Sciences (DILS), 2008
[3] Hartung, M.; Kirsten, T.; Gross, A.; Rahm, E.: OnEX – Exploring changes in life science ontologies. BMC Bioinformatics 10:250, 2009
[4] Sioutos, N.; de Coronado, S.; Haber, M.W.: NCI Thesaurus: A semantic model integrating cancer-related clinical and molecular information. Journal of Biomedical Informatics, 40:30-43, 2007
[5] Smith, B. et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25(11):1251-1255, 2007
Applying Corpus-Based Ontology Evaluation – The Case of
the ACGT Master Ontology
Mathias BROCHHAUSEN a, Gintarė GRIGONYTĖ b
a Institute for Formal Ontology and Medical Information Science, Saarland University, Germany
b Department for Applied Linguistics, Translation and Interpreting, Saarland University, Germany
Introduction
Brewster et al. [1] proposed techniques to evaluate ontologies by comparing them with
terminology as used in domain specific natural language texts. This approach seems to
be very appealing in the field of medical ontologies, since huge collections of abstracts
are freely available from the internet. In this study we present results that use
corpus-based term extraction as a reference against which to evaluate the ACGT Master
Ontology (MO). Our aim is to explore the opportunities and limits of cooperation between
NLP-created thesauri and reality-oriented ontology development.
Material
The ACGT project (Advancing Clinico-Genomic Trials on Cancer, FP6-IST-026996)
aims to address two key obstacles for bridging the gap between molecular research and
clinical practice:
- the flood of multilevel datasets (from the molecular to the organ to the
  individual level),
- the lack of a common infrastructure for clinical research institutions and the
  creators of molecular data.
As a result of this situation, very few cross-site studies and multi-centric clinical trials
are performed, and in most cases it is not possible to seamlessly integrate multi-level
data. ACGT aims to overcome these obstacles by setting up a semantic grid
infrastructure in support of multi-centric, post-genomic clinical trials. Semantic
integration in ACGT is done based on the ACGT MO as global schema in a Local-As-
View (LAV) strategy [2].
The ACGT Master Ontology (ACGT MO) is implemented in OWL-DL, the
description-logics based subtype of the Web Ontology Language (OWL) [3] and can be
freely downloaded from http://www.ifomis.org/acgt. The initial development/beta
version of the ACGT MO was published in June 2007, and it has been expanded
continuously to reflect the needs of both clinical and technical users. The developers are
working towards version 1.0. At the moment the ontology contains 1667 classes, 288
object properties, 15 data properties and 61 individuals.
Methods
Evaluation of ontologies is becoming a key issue in ontology-driven computing, but the
development of common standards on how to evaluate ontologies seems to be rather
slow. A distinction is traditionally drawn between two evaluation strategies:
“glass box” or “component” evaluation and “black box” or “task-based” evaluation.
This distinction also applies to the evaluation of ontologies and ontology-driven
systems [4, 5]. The two strategies are complementary, each testing for different
kinds of qualities. In the present study we focus on the evaluation of domain
coverage, which is a sub-task of glass box evaluation.
In order to acquire a list of terms that are actively used in the domain and are
specific for it, we have applied the methodology for terminology extraction described
in [6]. It comprises the main stages of terminology extraction: morphological analysis,
shallow parsing, rule-based detection of noun phrases (NPs), term candidate extraction,
termhood assessment and the building of a term list hierarchy.
We then used the extracted thesaurus to establish mappings to the classes of
the ACGT MO.
For the ACGT project we started by collecting 3334 domain-specific
abstracts of scientific publications. In general, the domain of the ACGT MO is cancer
research and management. Owing to the focus of the project, the ontology concentrates
on three types of cancer: mammary carcinoma, nephroblastoma (Wilms’ tumour) and
rhabdoid tumour. The corpus of 3334 abstracts consists of three registers: 1500
abstracts each from PubMed [http://www.ncbi.nlm.nih.gov/pubmed/] for
mammary carcinoma and for nephroblastoma, and 334 for rhabdoid tumour.
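The mapping step described above can be pictured as a simple coverage check: which extracted terms have a counterpart among the ontology's class labels? The sketch below is a minimal illustration under stated assumptions. It matches by exact comparison after a basic normalization, and the term list and class labels are invented; it does not reproduce the extraction methodology of [6] or the actual content of the ACGT MO.

```python
import re


def normalize(label: str) -> str:
    """Split CamelCase and underscores, collapse whitespace, and lower-case."""
    label = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    label = label.replace("_", " ")
    return re.sub(r"\s+", " ", label).strip().lower()


def coverage(terms: list, class_labels: list) -> tuple:
    """Return (matched terms, unmatched terms, share of terms matched)."""
    labels = {normalize(label) for label in class_labels}
    matched = [t for t in terms if normalize(t) in labels]
    unmatched = [t for t in terms if normalize(t) not in labels]
    share = len(matched) / len(terms) if terms else 0.0
    return matched, unmatched, share


# Hypothetical extracted terms and ontology class labels, for illustration only.
extracted_terms = ["mammary carcinoma", "nephroblastoma", "tumour size", "chemotherapy"]
ontology_labels = ["MammaryCarcinoma", "Nephroblastoma", "Chemotherapy", "ClinicalTrial"]

matched, unmatched, share = coverage(extracted_terms, ontology_labels)
print(f"covered: {share:.0%}, missing: {unmatched}")
```

The unmatched terms are the interesting output: they are candidates for new classes or labels, or hints that the corpus and the ontology use different vocabulary for the same thing.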
Results
The results we obtained by using domain thesauri to validate the domain coverage of an
ontology mark this as a highly promising strategy. In particular, checking ontologies
for class names or labels that are actually used by domain experts is an important
step, since it fosters the accessibility and usability of the ontology for those experts.
This is a crucial aspect of the development and maintenance of clinical ontologies.
The domain experts are the only group that can effectively guide the maintenance in a
way that secures future usefulness for the actual clinical situation, with its specific
points of view and restrictions. The results we achieved for the coverage of the ACGT MO
clearly indicate that, in order to optimize the accuracy of the testing, the corpora
should be extended with texts from patient documentation, since this is a key issue in
the applications of the ACGT MO. First steps in this direction have been taken, but the
results are not yet available.
References
[1] Brewster C, Alani H, Dasmahapatra S, Wilks Y. Data-driven Ontology Evaluation. Proceedings
of the 4th International Conference on Language Resources and Evaluation (LREC 2004),
Lisbon.
[2] Tsiknakis M, Brochhausen M, Nabrzyski J, Pucacki J, Potamias G, Desmedt C, et al. A
semantic grid infrastructure enabling integrated access and analysis of multilevel biomedical data
in support of post-genomic clinical trials on Cancer. IEEE Transactions on Information
Technology in Biomedicine, Special issue on Bio-Grids. 2008; 12 (2):205-217.
[3] OWL Web Ontology Language: Semantics and Abstract Syntax. Available from
http://www.w3.org/TR/owl-semantics/; last visited: 10-28-2009.
[4] Hartmann J, Spyns P, Giboin A, Maynard D, Cuel R, Suárez-Figueroa MC, et al. Methods for
ontology evaluation. Knowledge Web Deliverable D1.2.3, 2004.
[5] Gangemi A, Catenacci C, Ciaramita M, Lehmann J. Modelling Ontology Evaluation. In
Proceedings of the Third European Semantic Web Conference. Berlin, Springer, 2006, pp. 140-154.
[6] Avizienis A, Grigonyte G, Haller J, von Henke F, Liebig T, Noppens O. Organizing Knowledge
as an Ontology of the Domain of Resilient Computing by Means of Natural Language Processing
- An Experience Report. Artificial Intelligence Research Society Conference, 2009, Florida.
Quality of Functional Annotations in Life Science Data Sources
Anika Groß
Department of Computer Science, University of Leipzig
Interdisciplinary Centre for Bioinformatics, University of Leipzig
Ontologies and their application have become increasingly important especially in the life sciences. Typically, they associate objects, such as genes and proteins, with well-defined ontology concepts to semantically and uniformly describe the properties of these biological objects. The association between an object and a concept of an ontology is often denoted as (functional) annotation. The set of all associations between a biological data source and an ontology forms a so-called annotation mapping. For instance, the genes and proteins of Ensembl [1] and Swiss-Prot [2] are associated with concepts of the popular Gene Ontology (GO) [3] to specify the molecular functions and biological processes in which the proteins are involved.
These GO annotations are used in analysis scenarios and applications such as the functional profiling of large datasets (e.g., [4]) or instance-based ontology matching [5]. The results computed by such applications depend significantly on the quality of the underlying annotations. One important quality aspect is the stability of annotations, since major changes in annotation mappings may substantially influence or even invalidate earlier findings. This is a major issue because annotation mappings change frequently: new research findings lead to modifications of the underlying ontologies, objects and annotation associations. Moreover, the quality of an annotation is influenced by its creation method, i.e., the method that was used to generate the annotation (experimentally verified, based on author statements, or generated by automatic algorithms). The creation method affects how biologically well-founded or reliable an annotation is. Its relevance is underlined by the increasing use of so-called evidence codes (EC) to classify functional annotations based on the GO. Users may use EC to focus on specific annotation sets in their analyses and applications, e.g., only manually curated or only automatically generated annotations.
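Selecting annotations by creation method can be sketched as a lookup on the evidence code. The grouping below uses a few standard GO evidence codes (for example IDA, inferred from direct assay; TAS, traceable author statement; IEA, inferred from electronic annotation), but the grouping itself is simplified, and the proteins and concept identifiers are placeholders invented for this example.

```python
# Simplified grouping of some GO evidence codes by creation method.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
AUTHOR_STATEMENT = {"TAS", "NAS"}
AUTOMATIC = {"IEA"}


def creation_method(evidence_code: str) -> str:
    """Map an evidence code to a coarse creation method."""
    if evidence_code in EXPERIMENTAL:
        return "experimental"
    if evidence_code in AUTHOR_STATEMENT:
        return "author statement"
    if evidence_code in AUTOMATIC:
        return "automatic"
    return "other"


def select(annotations: list, methods: set) -> list:
    """Keep annotations (object, concept, evidence code) with a wanted creation method."""
    return [a for a in annotations if creation_method(a[2]) in methods]


# Placeholder annotations; objects and concept identifiers are invented.
annotations = [
    ("protein1", "concept_a", "IDA"),
    ("protein1", "concept_b", "IEA"),
    ("protein2", "concept_a", "TAS"),
    ("protein3", "concept_c", "IEA"),
]

curated = select(annotations, {"experimental", "author statement"})
automatic = select(annotations, {"automatic"})
print(len(curated), "curated,", len(automatic), "automatic")
```

An analysis can then be run once on the curated subset and once on the full set to see how strongly its results depend on automatically generated annotations.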
The presentation will focus on functional annotations with respect to their quality and their possible influence on “annotation-dependent” applications. The talk gives insights into the varying provenance of annotations that results from different annotation creation methods. Moreover, the results of a quantitative evaluation of annotation evolution emphasize the need to assess annotation stability [6]. The presentation highlights how our findings and the proposed assessment method for annotations can be valuable for users and applications of life science annotations. Future algorithms may use information about annotation history and quality to derive more reliable results, as we investigated in a first approach to producing more robust ontology mappings [7].
[1] Hubbard, T.J.; Aken, B.L.; Ayling, S.; et al.: Ensembl 2009. Nucleic Acids Research 37 (Database issue), D690–D697, 2009
[2] Boutet, E.; Lieberherr, D.; Tognolli, M.: UniProtKB/Swiss-Prot. Methods in Molecular Biology 406, 89–112, 2007
[3] Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research (36 Database):D440-D444, 2008
[4] Prüfer, K.; Muetzel, B.; Do, H. et al.: FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 8(1), 41, 2007
[5] Kirsten, T.; Thor, A.; Rahm, E.: Instance-based matching of large life science ontologies. In: Cohen-Boulakia, S., Tannen, V. (eds.) DILS 2007. LNCS (LNBI), vol. 4544, pp. 172–187. Springer, Heidelberg, 2007
[6] Groß, A.; Hartung, M.; Kirsten, T.; Rahm, E.: Estimating the Quality of Ontology-based Annotations by Considering Evolutionary Changes. Proc. of 6th Int. Workshop on Data Integration in the Life Sciences (DILS), Springer LNCS 5647, 2009
[7] Thor, A.; Hartung, M.; Groß, A.; Kirsten, T.; Rahm, E.: An Evolution-based Approach for Assessing Ontology Mappings - A Case Study in the Life Sciences. Proc. of 13. GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW), 2009
Linked Data for the Life Sciences
Sören Auer
Institute for Computer Science,
University of Leipzig
Research Group Agile Knowledge Engineering and Semantic Web (AKSW)
Over the past 3 years, the semantic web activity has gained momentum with the
widespread publishing of structured data as RDF and in particular Linked Data. The
central idea of Linked Data is to extend the Web with a data commons by creating typed
links between data from different data sources [1,2]. Technically, the term Linked Data
refers to a set of best practices for publishing and connecting structured data on the Web
in a way that data is machine-readable, its meaning is explicitly defined, it is linked to
other external data sets, and can in turn be linked to from external data sets. The data
links that connect data sources take the form of RDF triples, where the subject of the
triple is a URI reference in the namespace of one data set, while the object is a URI
reference in the other [3].
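The notion of an RDF link, a triple whose subject and object are URI references in different data sets, can be made concrete with plain tuples. This is a deliberately simplified sketch: triples are modelled as Python tuples rather than with an RDF library, the host part of a URI stands in for the data set namespace, and the two `example.org` data sets are hypothetical. Only the `owl:sameAs` and `rdfs:label` property URIs are real.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def is_uri(value: str) -> bool:
    """Treat http(s) strings as URI references; everything else counts as a literal."""
    return value.startswith(("http://", "https://"))


def dataset_of(uri: str) -> str:
    """Simplification: use scheme and host as the namespace of a data set."""
    parts = urlparse(uri)
    return f"{parts.scheme}://{parts.netloc}"


def is_rdf_link(triple: tuple) -> bool:
    """True if subject and object are URIs that belong to different data sets."""
    subject, _, obj = triple
    return is_uri(obj) and dataset_of(subject) != dataset_of(obj)


# Hypothetical data sets under example.org, invented for illustration.
triples = [
    ("http://drugs.example.org/resource/aspirin", OWL_SAME_AS,
     "http://encyclopedia.example.org/resource/Aspirin"),
    ("http://drugs.example.org/resource/aspirin", RDFS_LABEL, "Aspirin"),
    ("http://drugs.example.org/resource/aspirin",
     "http://drugs.example.org/vocab/interactsWith",
     "http://drugs.example.org/resource/warfarin"),
]

links = [t for t in triples if is_rdf_link(t)]
print(len(links), "of", len(triples), "triples link to another data set")
```

Only the first triple counts as an RDF link here: the second has a literal as its object, and the third stays within one data set.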
The most visible example of adoption and application of Linked Data has been the
Linking Open Data (LOD) project [4], a grassroots community effort to bootstrap the
Web of Data by interlinking open-license data sets. Out of the more than 8 billion RDF
triples that are served as of July 2009 by participants of the project, approximately 148
million are RDF links between data sets.
A central interlinking hub of the emerging Data Web is DBpedia [5], which aims to
extract structured information from Wikipedia and to make this information available on
the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link
other data sets on the Web to Wikipedia data. It already contains rich semantic
descriptions of more than 2 million concepts. The life sciences are probably one of the
best represented domains on the Data Web, with a large number of life-science data
sets, such as DrugBank, Linked Open Drug Data, Linked Clinical Trials, Gene Ontology
and many more. In this talk, we will present the concepts and techniques of
Linked Data and show their application perspectives for the life sciences.
[1] Berners-Lee, T.: Linked Data - Design Issues.
http://www.w3.org/DesignIssues/LinkedData.html
[2] C. Bizer, T. Heath, T. Berners-Lee: Linked Data - The Story So Far. In: International
Journal on Semantic Web & Information Systems, Vol. 5, Issue 3, Pages 1-22, 2009.
[3] Bizer, C., Cyganiak, R., Heath, T.: How to publish Linked Data on the Web.
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial
[4] http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
[5] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, S. Hellmann:
DBpedia – A Crystallization Point for the Web of Data. Journal of Web
Semantics (JWS).
Collaborative Editing and Publishing of Linked Data with OntoWiki
Sebastian Dietzold
In this talk, we will introduce the Semantic Data Wiki OntoWiki [1] and the underlying Semantic
Web Application Framework [2]. We will show its capabilities in collaboratively creating,
maintaining and working with semantic data in a single OntoWiki instance and in addition to that,
how to work together in a network of wiki systems.
In particular, we will describe and show the following features:
- Working with Linked Data
- Using Semantic Search Engines
- Tagging
- Facet-based browsing
- Using arbitrary hierarchies and navigation structures
In addition to that, we will describe the underlying architecture and the extension capabilities.
[1] http://ontowiki.net
[2] Heino, N.; Dietzold, S.; Martin, M. & Auer, S.: Developing Semantic Web Applications with the
OntoWiki Framework. In: Pellegrini, T.; Auer, S.; Tochtermann, K. & Schaffert, S. (ed.) -
Networked Knowledge - Networked Media. Springer, 2009, 221, 61-77
--
Sebastian Dietzold - Department of Computer Science; University of Leipzig
Tel/Fax: +49 341 97 323-66/-29 http://bis.uni-leipzig.de/SebastianDietzold
Snomed CT, Description Logic and Decision Support
JÖRG NIGGEMANN, COMPUGROUP SOFTWARE GMBH, MARTINSRIED
Abstract
Snomed CT is based on Description Logic (DL). Building it on DL was a huge effort. Before undertaking such an effort, a company would have defined a concrete and measurable goal, estimated the costs, and explored whether the goal could be reached by the proposed means and whether it could not also be reached by cheaper means. No such reasoning has been published with respect to DL in Snomed CT, neither by the College of American Pathologists nor by the IHTSDO. The common understanding is that
a) logic is needed in order to automatically prove consistency of the corpus of definitions
b) restriction to a subset of First Order Logic (FOL), namely Description Logic, and there again further restriction to a quite limited dialect of DL, is necessary to make that consistency proof computationally tractable.
Currently, the IHTSDO and its scientific contributors are deliberating which new dialect of DL should be used in the future, which would again require a huge effort to migrate from the current form. This is the time to pose the above questions in earnest and to ask whether that migration is reasonable. In the scientific community around the IHTSDO the use of DL seems to be set in stone, but there are voices from industry saying that the restriction to DL is counterproductive and that Full OWL should be used. One of these voices is mine.
In this contribution, I will argue that
1. Numerous current errors in Snomed CT show that computational consistency has nothing to do with medical correctness
2. The use of logic did not only not prevent those errors, but has generated new ones (e.g. the "Amputation Problem": Amputation of toe is-a amputation of foot)
3. Some of that is not the fault of the logic itself but of its false use – but that is even worse because it is harder to repair and prevent
4. Because of the above, more or different logic will never make Snomed CT better than it is now.
5. If you want a good model, place it in the hand of good modelers. You can never replace them by restrictive formalisms.
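The "Amputation Problem" from point 2 can be illustrated with a toy subsumption check. The abstract does not spell out the mechanism, so the sketch rests on an assumption: that procedure sites are modelled as anatomical "structure" concepts that include all parts of the body part, which is one commonly discussed explanation. The part-of table, the function names and the subset test standing in for a DL classifier are all hypothetical simplifications.

```python
# Hypothetical anatomy fragment: part -> whole.
PART_OF = {"toe": "foot", "foot": "leg"}


def structure(entity: str) -> set:
    """'Structure' reading: the entity together with all of its transitive parts."""
    result = {entity}
    changed = True
    while changed:
        changed = False
        for part, whole in PART_OF.items():
            if whole in result and part not in result:
                result.add(part)
                changed = True
    return result


def entire(entity: str) -> set:
    """'Entire' reading: only the entity itself."""
    return {entity}


def amputation_is_a(specific_site: str, general_site: str, reading) -> bool:
    """Stand-in for classification: sites of the specific procedure fall within the general one."""
    return reading(specific_site) <= reading(general_site)


# With the 'structure' reading the unwanted inference appears.
toe_is_foot_amputation = amputation_is_a("toe", "foot", structure)
# With the 'entire' reading it does not.
toe_is_entire_foot_amputation = amputation_is_a("toe", "foot", entire)
print(toe_is_foot_amputation, toe_is_entire_foot_amputation)
```

The inference is logically consistent in both cases. Whether it is medically acceptable depends on the modelling decision, not on the reasoner, which is the point made in items 1 to 3.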
The use of Description Logic also makes Snomed CT harder to use for decision support (DS). Clinical DS is used where things are not yet completely known. If a patient is discharged with a successfully treated "intercostal neuralgia", it is fine that we can code this in Snomed CT. However, the patient first presents with "chest pain". That can be coded, and it will trigger DS to suggest heart diagnostics. Once we know that the heart is OK, we have "chest pain of non-cardiac origin". This contains a negation, which is forbidden in the currently used Ontylog DL dialect and may also be forbidden in future versions. To document the process of arriving at a clinical diagnosis and to trigger the right DS rules, however, we must be able to document what we currently know the patient does not have.
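The chest pain example can be sketched as a tiny rule set that needs explicitly recorded exclusions. Everything here is hypothetical: the `Finding` record, the status values and the two rules are invented to illustrate the argument and do not correspond to Snomed CT content or to any existing decision support system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    concept: str
    status: str  # "present", "suspected" or "excluded"


def suggest(findings: list) -> list:
    """Return decision support suggestions for the current state of knowledge."""
    present = {f.concept for f in findings if f.status == "present"}
    excluded = {f.concept for f in findings if f.status == "excluded"}
    suggestions = []
    if "chest pain" in present:
        if "cardiac origin" in excluded:
            # The rule can only fire if the exclusion itself was documented.
            suggestions.append("look for non-cardiac causes")
        else:
            suggestions.append("heart diagnostics")
    return suggestions


# First contact: only the symptom is known.
at_admission = suggest([Finding("chest pain", "present")])
# Later: cardiac origin has been ruled out and recorded as such.
after_workup = suggest([
    Finding("chest pain", "present"),
    Finding("cardiac origin", "excluded"),
])
print(at_admission, after_workup)
```

Without a way to record the "excluded" status, the second call would be indistinguishable from the first, and the system would keep suggesting heart diagnostics.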
At this point comes a second restriction that is propagated by the proponents of DL. For the future they want to exclude such statements as "suspicion of" or "exclusion of" from Snomed CT. They say:
Snomed CT is for what is in the patient. A suspicion is in the head of the physician, so Snomed should not deal with that (this is called a "realist" approach).
My reply to that:
1. As soon as an entry in the Electronic Health Record goes beyond the recording of direct measurements, we have no access to the "reality" of the patient. Statements such as "high blood pressure" and, even more, diagnoses like "high blood pressure disease of renal origin" are the result of a deliberate cognitive act of a physician.
2. In the daily "physician's notes" in the EHR, the physician explicitly wants to write down his thoughts about the patient. Any suspicion, but also any risk assessment or prognosis, is of that kind. If Snomed CT is not meant for that, what is it meant for?
3. Decision Support is explicitly meant to help the physician make decisions – a mental act. Therefore the input for DS Systems must be recordings of the current state of the physician's reasoning. A Snomed CT that would exclude everything that is in the physician's head would be unusable for Decision Support.
Summary: The use of Description Logic apparently does not make Snomed CT better. Those who propose to make a huge effort of migrating to a new DL dialect should be required to prove that it is worth the costs.
On the contrary, the use of DL makes Snomed CT harder or even impossible to use for decision support, especially if it is coupled with a "realist" ideology.
There are "cognitive" and "conceptual" approaches to Ontology, which make "things in the heads of physicians" well accessible for well-founded and logical reasoning.
Therefore: Free Snomed CT from the restrictions of Description Logic and"realist" ideology!
Use of the Terminologies SNOMED CT and ID MACS, Illustrated by the Electronic Organ Donation Declaration (eOSE)
Gunther Hellmann (1), Kai Heitmann (2), Frank Oemig (3)
(1) HellmannConsult, Erlangen
(2) HL7-Benutzergruppe in Deutschland e.V., Köln
(3) Agfa Health Care AG, Bonn
Introduction
The Deutsche Stiftung Organtransplantation increasingly reports a decline in the willingness to donate organs. The Federal Ministry of Health (Bundesgesundheitsministerium) has responded and arranged for the organ donation declaration to be integrated into the emergency data set of the electronic health card (eGK). HL7 Germany has initiated a project group that analyses the existing organ donor card in order to develop a methodical procedure for similar future topics, to provide input for the corresponding (eGK) application, and to promote standardisation by means of HL7.
Material and Methods
The work was based on the German Transplantation Act (Transplantationsgesetz) [1] and today's organ donor card as provided by the Federal Centre for Health Education (Bundeszentrale für gesundheitliche Aufklärung, BZgA). The paper card is available in German and Turkish. Supplements to the card exist in ten further European languages (e.g. Bulgarian) with translations of the organ terms. The organ terms were translated and identified for SNOMED CT [4] and determined for ID MACS [3] by means of a terminology server. TOGAF [2] was chosen as the working methodology.
Results
In several steps, eight application scenarios were identified and described, the information objects of the paper form were captured conceptually, and the necessary electronic objects were derived from them with respect to syntax, semantics and default values. Furthermore, the actors and roles were identified and, in particular, the organ terms were mapped. This work identified several questions that still have to be discussed in the relevant committees, e.g. how processing systems in hospitals ensure that the legal requirements are met, the restriction of access rights by "physicians entitled to information" ("auskunftsberechtigte Ärzte"), assignment problems in the translation of terms, and terminological vagueness in, e.g., "parts of the meninges" ("Teile der Hirnhaut"). The use of the terminologies revealed five further categories of problems.
Discussion
The results are recorded in working version 08 [5] and require a more in-depth discussion, especially of the identified problems, some of which are of a fundamental nature, i.e. they must also be resolved for other application scenarios. A mapping to HL7 V3 and CDA will be carried out as an example.
References
[1] Transplantationsgesetz, September 2007.
[2] The Open Group: The Open Group Architecture Framework (TOGAF). http://www.opengroup.org/architecture/togaf8/downloads.htm#Non-Member, Version 8.1.1, Enterprise Edition, 2007.
[3] ID Gesellschaft für Information und Dokumentation im Gesundheitswesen GmbH & Co. KGaA: ID MACS® – Medical Semantic Network. Berlin, 2009.
[4] College of American Pathologists (CAP): SNOMED CT (Systematized Nomenclature of Medicine-Clinical Terms). http://www.cap.org/apps/cap.portal?_nfpb=true&_pageLabel=snomed_page, 2009.
[5] HL7-Benutzergruppe in Deutschland e.V.: Implementierungsleitfaden „elektronische Organspendeerklärung“ (eOSE). Version 08, Feb. 2009.
Intended for submission to the 1st Workshop of the GI Working Group Ontologies in Biomedicine and the Life Sciences (OBML), Leipzig (https://wiki.imise.uni-leipzig.de/Gruppen/OBML/Workshops/2009)
Keywords: organ donation, HL7, transplantation, DSO, BZgA, semantics, terminology, interoperability, eGK, standard, multilingual
Version 1.0, draft
Clinical diagnostics in human genetics with semantic similarity searches in ontologies.
Köhler S (1,2), Schulz MH (3), Krawitz P (1,2), Bauer S (1), Dölken S (1,2), Ott CE (1), Mundlos C, Horn D (1), Mundlos S (1,2,3), Robinson PN (1,2,3)
1) Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany.
2) Berlin-Brandenburg Center for Regenerative Therapies (BCRT)
3) Max Planck Institute for Molecular Genetics,
The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail. We have adapted semantic similarity metrics to measure phenotypic similarity between queries and hereditary diseases annotated with the use of the Human Phenotype Ontology (HPO) and have developed a statistical model to assign P-values to the resulting similarity scores, which can be used to rank the candidate diseases. We show that our approach outperforms simpler term-matching approaches that do not take the semantic interrelationships between terms into account. The advantage of our approach was greater for queries containing phenotypic noise or imprecise clinical descriptions. The semantic network defined by the HPO can be used to refine the differential diagnosis by suggesting clinical features that, if present, best differentiate among the candidate diagnoses. Thus, semantic similarity searches in ontologies represent a useful way of harnessing the semantic structure of human phenotypic abnormalities to help with the differential diagnosis. We have implemented our methods in a freely available web application for the field of human Mendelian disorders. We will also discuss the exact computation of score distributions for similarity searches in ontologies that can be used for the above described clinical application.
We introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik’s definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the Human Phenotype Ontology. The HPO is freely available at http://www.human-phenotype-ontology.org and the Phenomizer is available at http://compbio.charite.de/Phenomizer.
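The ranking and P-value idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology, the term and disease names, the averaging of best matches as the query score, and the null hypothesis of drawing same-sized queries uniformly from all non-root terms. It uses the naive full enumeration of queries, not the subgraph-collapsing algorithm mentioned in the abstract.

```python
import math
from itertools import combinations

# Hypothetical toy ontology (child -> parents); these are NOT real HPO terms.
PARENTS = {
    "root": [],
    "skeletal": ["root"],
    "cardiac": ["root"],
    "long_fingers": ["skeletal"],
    "short_stature": ["skeletal"],
    "septal_defect": ["cardiac"],
}

# Hypothetical disease annotations (disease -> annotated terms).
DISEASES = {
    "disease_A": {"long_fingers", "septal_defect"},
    "disease_B": {"short_stature"},
    "disease_C": {"septal_defect"},
    "disease_D": {"long_fingers"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of diseases annotated with t or a descendant of t)."""
    ic = {}
    for term in PARENTS:
        hits = sum(
            any(term in ancestors(t) for t in terms) for terms in DISEASES.values()
        )
        # Assumes every term is annotated at least once (true for the toy data).
        ic[term] = -math.log(hits / len(DISEASES))
    return ic


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC[a] for a in ancestors(t1) & ancestors(t2))


def score(query, disease):
    """Average, over the query terms, of the best match among the disease terms."""
    terms = DISEASES[disease]
    return sum(max(resnik(q, d) for d in terms) for q in query) / len(query)


def p_value(query, disease):
    """Naive exact P-value: share of same-sized queries scoring at least as high."""
    observed = score(query, disease)
    candidates = [t for t in PARENTS if t != "root"]
    null_scores = [score(c, disease) for c in combinations(candidates, len(query))]
    return sum(s >= observed - 1e-12 for s in null_scores) / len(null_scores)


query = ("short_stature",)
ranking = sorted(DISEASES, key=lambda d: score(query, d), reverse=True)
```

The enumeration in `p_value` grows combinatorially with the query size, which is the cost that a faster score-distribution algorithm would be meant to avoid.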
Reality and Abstraction: A Look at Different Models
H.R. Straub, Semfinder AG, Kreuzlingen, Switzerland
When we represent knowledge, how do the elements of our representation relate to elements of reality? Are, e.g., the relations between the representing elements equal to the relations between the represented elements? Different representations do answer such essential modelling questions differently, not without consequences to their precision, expressiveness and economy in maintenance.
To get a clearer view of facts and to ease the discussion, the author depicted a diagram of the scenery in which the reality-model discussion takes place. As the words for the debate – "reality" or "universal" e.g. – are often understood differently by the divergent positions, they can be given different places in the diagram and the divergencies in view become more evident and thus easier to be understood for the divergent positions. The diagram does (for a start) not show the personal view of the author about the reality-to-model relation, but is just an encompassing landscape where the diverse positions can be entered.
Figure 1: Scenery for the realism debate
The scenery can be illuminated according to the different positions. Figure 2 shows, for example, the (extreme) position of solipsism and the one of the "platonian" realism:
Figure 2: Examples of the view of the scenery by specific positions
[Figure 1 diagram. Recoverable labels: Space-Time Reality (Object 1, Object 2, Subject, Mind); Reality outside Space-Time.]
[Figure 2 diagram, two panels.
Extreme 1: Solipsism. The world exists only in my mind. Reality is an illusion.
"Platonian" Realism. Matter objects are created by the ideas behind them (universals). Our Space-Time Reality is just a shadow of the more real ideal world. Additional labels in this panel: Universals, Ideas.]
Of course, more sensible positions than solipsism make more sense to discuss. Like in figure 2, the following classic views are shown and compared with the help of the scenery diagram:
- solipsism
- naïve realism
- "platonian" realism
- "aristotelian" realism
- nominalism
- conceptualism
That the discussion is not outdated or trivial, becomes obvious when we look at the ways different positions deal with universals.
Figure 3: Universals as seen by 3 different positions
Figure 4: Talking about objects
Figure 3 compares, how "platonian" realism (right), "aristotelian" realism (middle) and conceptualism (left) understand universals. For the author it is obvious that these different views lead to differently built knowledge representations with consequences regarding the usefulness of their implementation and results.
The fact, that we talk about objects among different subjects can be added to the diagram (fig. 4) for further clarification.
Comparisons as in Figure 3 raise the questions, which consequences each position has, which one is superior (for a given purpose) and how we ourselves should handle universals for our own representations.
To clarify this point we look at different kinds of universals and ask how their position in the scenery is. Could it be that different kinds of universals could be placed and dealt with differently?
The following kinds of universals are placed in the diagram and compared:
- Numbers and objects of abstract logic
- Living systems, such as animal species
- Fictitious objects
- Non-material concepts (such as medical diagnoses)
The author presents his view with examples in the scenery diagram and claims that a distinction between the different universal groups makes sense. In doing so, elements of the "platonian", "aristotelian" and the conceptualist position are combined to form a integrative view of modelling (interpretation of reality).
[Figures 3 and 4: scenery diagrams. Recoverable labels: Space-Time Reality (Object 1, Object 2, Subject, Mind); Reality outside Space-Time; Model; Universals.]
The Semiotic Triangle links concepts in minds with symbols in models and with the represented objects.
Where are Universals?
- Outside Space-Time: "Platonian"
- Part of the Objects: "Aristotelian"
- In the Mind of Humans: "Conceptualist"
How should parthood relations be expressed in SNOMED CT?
Franz Baader,1 Stefan Schulz,2 Kent Spackman,3 and Boontawee Suntisrivaraporn4
1TU Dresden, Germany, [email protected] University Hospital, Germany, [email protected]
3International Health Terminology Standards Development Organisation, USA, [email protected], Thammasat University, Thailand, [email protected]
The Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT)¹ is a clinical terminology with a broad coverage of health care, which has been developed with the help of a rather inexpressive description logic dialect known as EL [1]. The advantage of using a description logic (DL) for defining a medical ontology is that, instead of error-prone “hierarchy engineering,” where each newly introduced concept needs to be manually positioned at the right place in the concept hierarchy, one adds a definition of the new concept to the knowledge base and the DL reasoner then automatically finds the right position of this concept in the concept hierarchy. The advantage of using an inexpressive DL is that classification (i.e., the computation of the concept hierarchy) is fast even for a very large ontology like SNOMED CT. Efficient reasoners for EL, like Snorocket™,² which is based on the classification algorithm introduced in [2], can classify SNOMED CT in less than a minute.
The disadvantage of using an inexpressive DL is that not all relevant properties can be explicitly expressed. In particular, EL does not allow one to state that relations such as part-of are transitive, and consequently the reasoner cannot take transitivity into account during classification. In order to overcome such limitations in DLs without transitive relations, the SEP-triplet encoding was proposed in [3]. An SEP-triplet for the concept $A$ is actually composed of three concepts: the structure $A_S$, the entity $A$, and the part $A_P$. Intuitively, the E-concept is supposed to be instantiated by entire anatomical objects (such as my hand), the P-concept by the proper parts of the referred objects (such as any part of my hand), and the S-concept by both entire objects and their parts. Fig. 1 gives an example of what a correct use of the SEP-triplet encoding should look like. It is easy to see that transitivity of the part-of relation can be simulated through the intra-triple part-of relationships and the intrinsic transitivity of (both intra- and inter-triple) subsumption relationships. In fact, in the example of Fig. 1, the DL reasoner is able to infer that the finger is part of the upper limb since we have $\mathsf{Finger} \sqsubseteq \mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_P \sqsubseteq \mathsf{Hand}_S \sqsubseteq \mathsf{UpperLimb}_P \sqsubseteq \exists\,\textsf{part-of}.\mathsf{UpperLimb}$. Since characteristics are inherited along the is-a hierarchy, the SEP-triplet encoding also allows us to simulate inheritance of characteristics along the part-of hierarchy. In our example, by connecting an injury via a location link to the S-concept, we can ensure that ‘injury to finger’ is classified as ‘injury to hand’ and ‘injury to upper limb’. To suppress such inheritance along the part-of hierarchy (e.g., ‘amputation of finger’ should not be classified as ‘amputation of hand’ or ‘amputation of upper limb’), one needs to connect via location to the E-concept.
There are, however, several problems with the SEP-triplet encoding. On the one hand, the SEP-triplet approach is error prone since it works correctly only if it is employed with a very strict modelling discipline. For instance, incorrect links to the S-concept rather than the E-concept may result in unintended consequences like the classification of ‘amputation of finger’ as a subconcept of ‘amputation of upper limb’. On the other hand, the approach introduces for every proper concept in the ontology two auxiliary concepts, which results in a drastic increase in the ontology size, and thus in the time needed for classification.
¹ http://www.ihtsdo.org/snomed-ct/
² http://aehrc.com/hie/snorocket.html
[Figure 1: graph of the SEP triples Finger / FingerS / FingerP, Hand / HandS / HandP and UpperLimb / UpperLimbS / UpperLimbP, together with the concepts AmputationOfFinger, InjuryToFinger, AmputationOfHand, InjuryToHand, AmputationOfUpperLimb and InjuryToUpperLimb.]
Fig. 1. Example of a correct use of the SEP-triplet encoding. The solid edges denote subsumption (IS-A), the dashed edges part-of, and the dotted edges has-location relationships.
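How the SEP-triplet encoding simulates part-of reasoning through plain subsumption can be sketched in a few lines. This is only an illustration, assuming the IS-A edges described for Fig. 1; the `_S` and `_P` suffixes stand for the subscripted structure and part concepts, and the function names are hypothetical.

```python
# Stated IS-A edges of the SEP triples in Fig. 1 (child -> direct parents).
IS_A = {
    "Finger": ["Finger_S"],
    "Finger_P": ["Finger_S"],
    "Finger_S": ["Hand_P"],
    "Hand": ["Hand_S"],
    "Hand_P": ["Hand_S"],
    "Hand_S": ["UpperLimb_P"],
    "UpperLimb": ["UpperLimb_S"],
    "UpperLimb_P": ["UpperLimb_S"],
    "UpperLimb_S": [],
}


def subsumers(concept):
    """Reflexive-transitive closure of IS-A for one concept."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_part_of(part, whole):
    """X is part of Y if X is subsumed by Y's P-concept (Y_P is linked to Y by part-of)."""
    return f"{whole}_P" in subsumers(part)


def location_subsumed(loc_a, loc_b):
    """A finding located at loc_a is classified under the same kind of finding at loc_b."""
    return loc_b in subsumers(loc_a)


# Injuries are linked to S-concepts, amputations to E-concepts.
injury_inherited = location_subsumed("Finger_S", "Hand_S")
amputation_inherited = location_subsumed("Finger", "Hand")
# Modelling error: amputations linked to S-concepts instead of E-concepts.
amputation_mismodelled = location_subsumed("Finger_S", "UpperLimb_S")
```

The last line reproduces the kind of unintended classification that results when the modelling discipline is not followed.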
To avoid these problems, we have proposed in [4] to use the more expressive DL EL++ [5, 6], for which classification can still be done in polynomial time. The complex role inclusion axioms available in EL++ can be used to state reflexivity and transitivity of roles like part-of, subrole relationships (e.g., between proper-part-of and part-of), and right-identity rules (which can, e.g., be used to express the inheritance of characteristics along the part-of relation). To avoid unintended inheritance of characteristics (e.g., in the case of amputation), we use two distinct relations: has-location, which is inherited from a part to its whole, and has-exact-location, a sub-relation of has-location, which is not inherited that way. Fig. 2 shows the re-engineered ontology obtained this way from the knowledge base of Fig. 1.
This new modelling approach avoids the introduction of the two additional auxiliary concepts (the S-concept and the P-concept) for every anatomical concept. The experiments reported in [4] show that this actually speeds up the time needed for classification. However, for backward compatibility, it would be nice to be able to define the S-concept and/or the P-concept in case it is needed (e.g., since it is used directly in other parts of the ontology). According to the underlying intuition, this should be easy: these concepts can be pre-coordinated as fully defined concepts, as illustrated here for the concept hand: $\mathsf{Hand}_P \equiv \exists\,\textsf{proper-part-of}.\mathsf{Hand}$ and $\mathsf{Hand}_S \equiv \exists\,\textsf{part-of}.\mathsf{Hand}$.
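\begin{align*}
\mathsf{Finger} &\sqsubseteq \mathsf{BodyPart} \sqcap \exists\,\textsf{proper-part-of}.\mathsf{Hand} \tag{1}\\
\mathsf{Hand} &\sqsubseteq \mathsf{BodyPart} \sqcap \exists\,\textsf{proper-part-of}.\mathsf{UpperLimb} \tag{2}\\
\mathsf{UpperLimb} &\sqsubseteq \mathsf{BodyPart} \tag{3}\\
\mathsf{AmputationOfFinger} &\equiv \mathsf{Amputation} \sqcap \exists\,\textsf{has-exact-location}.\mathsf{Finger} \tag{4}\\
\mathsf{AmputationOfHand} &\equiv \mathsf{Amputation} \sqcap \exists\,\textsf{has-exact-location}.\mathsf{Hand} \tag{5}\\
\mathsf{AmputationOfUpperLimb} &\equiv \mathsf{Amputation} \sqcap \exists\,\textsf{has-exact-location}.\mathsf{UpperLimb} \tag{6}\\
\mathsf{InjuryToFinger} &\equiv \mathsf{Injury} \sqcap \exists\,\textsf{has-location}.\mathsf{Finger} \tag{7}\\
\mathsf{InjuryToHand} &\equiv \mathsf{Injury} \sqcap \exists\,\textsf{has-location}.\mathsf{Hand} \tag{8}\\
\mathsf{InjuryToUpperLimb} &\equiv \mathsf{Injury} \sqcap \exists\,\textsf{has-location}.\mathsf{UpperLimb} \tag{9}\\
\textsf{proper-part-of} \circ \textsf{proper-part-of} &\sqsubseteq \textsf{proper-part-of} \tag{10}\\
\textsf{proper-part-of} &\sqsubseteq \textsf{part-of} \tag{11}\\
\textsf{part-of} \circ \textsf{part-of} &\sqsubseteq \textsf{part-of} \tag{12}\\
\varepsilon &\sqsubseteq \textsf{part-of} \tag{13}\\
\textsf{has-exact-location} &\sqsubseteq \textsf{has-location} \tag{14}\\
\textsf{has-location} \circ \textsf{proper-part-of} &\sqsubseteq \textsf{has-location} \tag{15}
\end{align*}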
Fig. 2. The re-engineered version of the knowledge base in Fig. 1, now without SEP-triplets.
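The effect of the role inclusions in Fig. 2 can be traced with a small hand-rolled saturation. This is a sketch only, not an EL++ reasoner such as CEL: it covers just the existential restrictions of axioms (1) and (2), the subrole axioms (11) and (14), and the chains (10), (12) and (15); reflexivity (13) is left out, and the names `saturate`, `subsumed` and `_query` are hypothetical.

```python
# Triples (x, role, y) stand for "x is subsumed by (exists role.y)".
EXISTS = {
    ("Finger", "proper-part-of", "Hand"),       # axiom (1)
    ("Hand", "proper-part-of", "UpperLimb"),    # axiom (2)
}
SUBROLE = {
    "proper-part-of": "part-of",                # axiom (11)
    "has-exact-location": "has-location",       # axiom (14)
}
CHAIN = {
    ("proper-part-of", "proper-part-of"): "proper-part-of",  # axiom (10)
    ("part-of", "part-of"): "part-of",                        # axiom (12)
    ("has-location", "proper-part-of"): "has-location",      # axiom (15)
}


def saturate(facts):
    """Apply the subrole and chain rules until no new triple is derived."""
    facts = set(facts)
    changed = True
    while changed:
        new = set()
        for (x, r, y) in facts:
            if r in SUBROLE:
                new.add((x, SUBROLE[r], y))
            for (y2, s, z) in facts:
                if y2 == y and (r, s) in CHAIN:
                    new.add((x, CHAIN[(r, s)], z))
        changed = not new <= facts
        facts |= new
    return facts


def subsumed(role, a, b):
    """Does (Kind and exists role.a) imply (Kind and exists role.b) in this toy KB?"""
    derived = saturate(EXISTS | {("_query", role, a)})
    return ("_query", role, b) in derived


injury_finger_is_injury_hand = subsumed("has-location", "Finger", "Hand")
amputation_finger_is_amputation_hand = subsumed("has-exact-location", "Finger", "Hand")
```

Because no chain axiom mentions has-exact-location, the amputation concepts stay unrelated while the injury concepts are propagated from part to whole.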
Unfortunately, this solution (which was already proposed in [4]) is not completely satisfactory since not all subsumption relationships for the auxiliary concepts that follow from the SEP-encoded version of the knowledge base (Fig. 1) follow from the re-engineered version (Fig. 2) extended by the definitions for the S- and P-concepts for Finger, Hand, and UpperLimb. For example, in Fig. 1 we have the (stated) subsumption relationship $\mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_P$. Using the complex role inclusion axioms in Fig. 2 together with the definitions for the auxiliary concepts, we can only conclude $\exists\,\textsf{part-of}.\mathsf{Finger} \sqsubseteq \exists\,\textsf{part-of}.\mathsf{Hand}$ (i.e., $\mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_S$), but not $\exists\,\textsf{part-of}.\mathsf{Finger} \sqsubseteq \exists\,\textsf{proper-part-of}.\mathsf{Hand}$ (i.e., not $\mathsf{Finger}_S \sqsubseteq \mathsf{Hand}_P$). In order to obtain the second subsumption, we would need to add the complex role inclusion
$\textsf{part-of} \circ \textsf{proper-part-of} \sqsubseteq \textsf{proper-part-of}$.
Interestingly, this left-identity rule, together with $\textsf{proper-part-of} \sqsubseteq \textsf{part-of}$, creates a so-called cycle over role inclusions, which is not allowed in the DL SROIQ underlying the new version of the Web Ontology Language, OWL 2. Consequently, OWL 2 compliant reasoners (like FaCT and Pellet) would not accept this extended knowledge base as an input. Fortunately, such a cyclic dependency is allowed in EL++ and can be processed by our reasoner CEL.³ Recently, Kazakov [7] was able to design a decidable extension of SROIQ that can also express the extended knowledge base.
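Why the additional left-identity rule yields the missing subsumption can be spelled out as a short derivation. It is a sketch that assumes the definitions of the auxiliary concepts given in the text (the S-concept of Finger as "part of some Finger", the P-concept of Hand as "proper part of some Hand") and uses the usual model-theoretic notation with an interpretation $\mathcal{I}$, which does not appear in the abstract itself.

```latex
\begin{align*}
x \in (\exists\,\textsf{part-of}.\mathsf{Finger})^{\mathcal{I}}
  &\Rightarrow \exists y:\ (x,y) \in \textsf{part-of}^{\mathcal{I}},\ y \in \mathsf{Finger}^{\mathcal{I}} \\
  &\Rightarrow \exists z:\ (y,z) \in \textsf{proper-part-of}^{\mathcal{I}},\ z \in \mathsf{Hand}^{\mathcal{I}}
     && \text{by axiom (1)} \\
  &\Rightarrow (x,z) \in \textsf{proper-part-of}^{\mathcal{I}}
     && \text{by } \textsf{part-of} \circ \textsf{proper-part-of} \sqsubseteq \textsf{proper-part-of} \\
  &\Rightarrow x \in (\exists\,\textsf{proper-part-of}.\mathsf{Hand})^{\mathcal{I}}
\end{align*}
```

Without the left-identity rule, the third step is only available with part-of in place of proper-part-of (via axioms (11) and (12)), which gives the weaker conclusion that every part of a finger is a part of a hand.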
To sum up, we have recalled the re-engineering of SNOMED CT as proposed in [4], and have shown that a backward compatible version, which also contains definitions for the auxiliary S- and P-concepts, requires an additional complex role inclusion that destroys the acyclicity property of the set of complex role inclusions. For this reason, the backward compatible re-engineered version of SNOMED CT is not expressible in OWL 2, but it is expressible in EL++ and an appropriate extension of SROIQ.
References
1. F. Baader. Terminological cycles in a Description Logic with existential restrictions. In Georg Gottlob and Toby Walsh, editors, Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 325–330. Morgan Kaufmann, 2003.
2. F. Baader, C. Lutz, and B. Suntisrivaraporn. Is tractable reasoning in extensions of the Description Logic EL useful in practice? In Proceedings of the 2005 International Workshop on Methods for Modalities (M4M-05), 2005.
3. S. Schulz, M. Romacker, and U. Hahn. Part-whole reasoning in medical ontologies revisited: Introducing SEP triplets into classification-based Description Logics. Journal of the American Medical Informatics Association (JAMIA), pages 830–834, 1998. Section VIII Standards and Policies - Issues in Knowledge Representation.
4. B. Suntisrivaraporn, F. Baader, S. Schulz, and K. Spackman. Replacing SEP-triplets in SNOMED CT using tractable Description Logic operators. In Riccardo Bellazzi, Ameen Abu-Hanna, and Jim Hunter, editors, Proceedings of the 11th Conference on Artificial Intelligence in Medicine (AIME’07), volume 4594 of Lecture Notes in Computer Science, pages 287–291. Springer-Verlag, 2007.
5. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proceedings of the 19th International Conference on Artificial Intelligence (IJCAI-05), Edinburgh, UK, 2005. Morgan-Kaufmann Publishers.
6. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope further. In Kendall Clark and Peter F. Patel-Schneider, editors, Proceedings of the OWLED 2008 DC Workshop on OWL: Experiences and Directions, 2008.
7. Y. Kazakov. An extension of regularity conditions for complex role inclusion axioms. In Proceedings of the 2009 International Workshop on Description Logics (DL’09), 2009.
³ http://lat.inf.tu-dresden.de/systems/cel/
Ludger Jansen, Institut für Philosophie, Universität Rostock
Constitution relations in biological cells
Abstract for the 1st Workshop of the GI Working Group OBML
Research about living cells faces many problems. The most prominent among these are the
complexity of cell systems, the plurality of interacting levels and the stochastic nature of
many cell processes (Wolkenhauer/Muir, forthcoming). These problems find their analogies
when it comes to representing our knowledge about biological cells. The complex and
dynamic cell system as well as intra- and intercellular communication have to find simplified
and static representations. Because of the huge amounts of data that are being collected in
genome sequencing, animal models and other studies, the knowledge gained can only be
stored and made available by computer-based means. Formal ontology is a promising method
to get a grip on problems occurring with the coding of facts and with information retrieval and
there are first implementations for the domain of cells. The largest project is certainly the
Gene Ontology (GO) with its three parts concerning molecule functions, biological processes
and cell components (http://www.geneontology.org). Other relevant projects are the Cell Type
Ontology (Bard, Rhee and Ashburner 2005), which, like GO, is a candidate ontology of the Open
Biomedical Ontologies Foundry (http://obofoundry.org; Smith et al. 2007), and, because of its
application area, the National Cancer Institute Thesaurus (NCIT).
To analyse a complex system into simpler parts, many ontological distinctions are relevant.
For computer-based representations of knowledge about cells we need a classification of
continuants like cells and cell parts (Jansen 2008) as well as a classification of occurrents like
interaction processes between cells or cell parts and event types like Cell_division (Hennig
2008, Schulz/Jansen 2009). The different levels of causal interaction in cells have to be
reflected by analysing several granular partitions of cells, like the molecular level, the level of
functional cell parts (organelles) and the level of the cell itself, as well as the combinations of
material and functional descriptions and the representation of mereological and topological
relations like Cell has_part Nucleus, Cell part_of Cell or Nucleus contains DNA.
A particular problem arises for cell ontology through the fact that some particulars seem to
belong to several of these levels: Unicellular organisms seem to be at once a single cell and an
organism, and strings of DNA seem to be at the same time molecules and functional cell
parts. These coincidences on the level of particulars threaten the generality necessary for
partonomic statements on the level of universals. This problem can be solved within standard
mereology with the help of the unity relation of material constitution (Baker 2000, 2007; cf.
also the papers in Rea 1997). While the unity relation of identity is an equivalence relation, the
constitution relation is irreflexive and asymmetric. (Baker’s definition at least implies the
transitivity of the constitution relation; cf. Zimmerman 2002.) The result is that particulars
belonging to different levels of partition – molecules, organelles, cells, organisms – are, first
impression notwithstanding, not identical, but constitute each other. Given this result, an
unambiguous assignment of particulars to these levels of partition is guaranteed. Secondly,
the generality of partonomic statements is no longer threatened by these special cases.
Thirdly, these levels of partition and the entities belonging to them can be characterized by
their causal and explanatory features, which manifest themselves especially in the ascriptions of functions that entities have in different contexts – be it within cells, within organisms, or within an ecosystem.
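The formal contrast between identity and constitution drawn above can be checked mechanically on a toy example. This is a sketch under stated assumptions: the particulars, their level assignments and the constitution pairs are invented for illustration and are not taken from any ontology named in the abstract.

```python
# Hypothetical particulars, each assigned to exactly one level of partition.
LEVEL = {
    "dna_molecule_1": "molecule",
    "functional_dna_part_1": "functional cell part",
    "cell_1": "cell",
    "organism_1": "organism",
}

# Hypothetical constitution pairs: (constituting particular, constituted particular).
CONSTITUTES = {
    ("dna_molecule_1", "functional_dna_part_1"),
    ("cell_1", "organism_1"),
}

# Identity relates every particular to itself and to nothing else.
IDENTITY = {(x, x) for x in LEVEL}


def is_reflexive(rel, domain):
    return all((x, x) in rel for x in domain)


def is_irreflexive(rel):
    return all(x != y for x, y in rel)


def is_symmetric(rel):
    return all((y, x) in rel for x, y in rel)


def is_asymmetric(rel):
    return all((y, x) not in rel for x, y in rel)


def is_transitive(rel):
    return all((x, z) in rel for x, y in rel for y2, z in rel if y == y2)


identity_is_equivalence = (
    is_reflexive(IDENTITY, LEVEL) and is_symmetric(IDENTITY) and is_transitive(IDENTITY)
)
constitution_ok = (
    is_irreflexive(CONSTITUTES) and is_asymmetric(CONSTITUTES) and is_transitive(CONSTITUTES)
)
# Constitution links particulars on different levels, so each keeps a single level.
levels_distinct = all(LEVEL[x] != LEVEL[y] for x, y in CONSTITUTES)
```

In this reading a unicellular organism and its cell are two particulars related by constitution, so a statement about all cells does not have to make an exception for organisms.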
Acknowledgements
Research for this paper has been conducted under the auspices of the research cluster „Transformation wissenschaftlichen Wissens in den Lebenswissenschaften: Das Verständnis der lebenden Zelle im Wandel“ and has been supported by a grant of the Exzellenz-Förderprogramm Mecklenburg Vorpommern (EFP M-V).
References
Baker, Lynne Rudder (2000), Persons and Bodies. A Constitution View, Cambridge: Cambridge University Press.
Baker, Lynne Rudder (2007), The Metaphysics of Everyday Life, Cambridge: Cambridge University Press.
Bard, Jonathan/Rhee, Seung Y./Ashburner, Michael (2005), An Ontology for Cell Types, in: Genome Biology 6 (2), R21, http://genomebiology.com/2005/6/2/R21.
Bittner, Thomas/Smith, Barry (2003), A Theory of Granular Partitions, in: Duckham/Goodchild/Worboys, Foundations of Geographic Information Science, London, 117-151.
Hennig, Boris (2008), Zeitliche Entitäten: Geschehnisse, in: Jansen/Smith 2008, 127-154; English translation: „Occurrents“, forthcoming in: Munn/Smith 2008, 255-284.
Jansen, Ludger (2007), Tendencies and other Realizables in Medical Information Sciences, in: The Monist 90, 534-555.
Jansen, Ludger (2008), Klassifikationen, in: Jansen/Smith 2008, 67-83; English translation: Classifications, in: Munn/Smith 2008, 159-172.
Jansen, Ludger/Smith, Barry, eds. (2008), Biomedizinische Ontologie. Wissen repräsentieren für den Informatik-Einsatz, Zürich: vdf.
Munn, Katherine/Smith, Barry (eds.), Applied Ontology, Frankfurt/Lancaster: Ontos, forthcoming 2008.
Rea, Michael C. (1997), Material Constitution. A Reader, Lanham et al.: Rowman & Littlefield.
Schulz, Stefan/Jansen, Ludger (2009), Molecular Interactions: On the Ambiguity of Ordinary Statements in Biomedical Literature, in: Applied Ontology 4, 21-34.
Smith, Barry et al. (2007), „The OBO Foundry: Coordinated Evolution of Ontologies to Support Biomedical Data Integration“, in: Nature Biotechnology 25, 1251–1255.
Wolkenhauer, Olaf/Muir, Allan (forthcoming), The Complexity of Cell Biological Systems, in: Clifford Hooker, John Collier (eds.), Philosophy of Complexity, Chaos and Non-Linearity (= Handbook of the Philosophy of Science, ed. Dov Gabbay, Paul Thagard and John Woods, vol. 16), Amsterdam: Elsevier.
Zimmerman, Dean (2002), Persons and Bodies: Constitution Without Mereology?, in: Philosophy and Phenomenological Research 64, 599-606.
Dynamic Typing and Non-monotonic Reasoning - Principles for a Semantic Interpreter
H.R. Straub, M. Duelli, Semfinder AG, Kreuzlingen, Switzerland
1. Introduction
The Semfinder coding system automatically assigns ICD-10 codes to noun phrases representing medical diagnoses. For this purpose the noun phrases are semantically analysed and brought to an internal format. In this presentation we want to explain the basic principles of our system.
2. Philosophical and situational basis
We don't believe that models can ever be a copy of reality. A representation of a domain can only contain a minute fraction of the information the domain holds in reality (This is not a question of granularity: according to the perspective certain aspects of the domain are in the foreground and other in the background). No model is complete. Modelling is always heuristic and an ultimate model of a domain does not exist. Models are interpretations of reality. Different models can coexist.
The last statement is evident for our task: The logic of the ICD-10 is not coherent and can differ from the logic of the physicians. We must accept that logics differ and we must be able to adjust to various systems. We furthermore must be able to translate from one system to the other.
In the following we expose the characteristics of our system:
3. Concept architecture
- Composite concept representation (concept clusters)
- Supremacy of postcoordination → the composite cluster form of concepts always persists → no instantiations to atomic objects
- Pseudotrees: the composite clusters build pseudotrees, which allow easy reading by humans as well as by machines → performance
- Multidimensional and multifocal organisation of the semantic space
- Bifaciality (bifacial typing) with chain building → only one formal concept element
- Abandonment of named relators, restriction to two unnamed, merely formal relators
- The (formal) TLO is therefore restricted to 3 elements: 1 (named) concept and 2 (unnamed) relators
- The two relators span the semantic space: One links concepts inside the dimensions, the other knits the dimensions
4. Rules and inference process
- Rules and status are formed alike: composite, multifocal etc. (see above)
- Reduced set of operators
- No definitions, just rules (which can be overruled)
- Non-monotonic logic
- Dynamic typing
- Algorithms work with composite concepts (not atomic ones)
- Chronology control by input and by dynamic triggers in the status
Abstract
Formal Semantics and Ontologies
Frank Loebe
Department of Computer Science and
Institute of Medical Informatics, Statistics and Epidemiology
University of Leipzig
The distribution and adoption of ontologies is remarkably increasing in various areas. In
particular, this applies in the context of the Semantic Web, and it is also the case regarding
applications in biology, medicine, and life sciences. This development has led to a strong
emphasis on the formal representation of ontologies in dedicated ontology languages like the
Web Ontology Language (OWL). However, the same development runs the risk of neglecting
the original role of ontologies in computer and information sciences, namely to provide
systems of unambiguous semantic references. In this connection, ontologies share
motivational aspects with terminological systems and controlled vocabularies. It is this role of
ontologies that justifies their potential of enabling semantic interoperability.
In this regard, two interpretations of the term “semantics” should be clearly separated, but are
sometimes intermingled. Given an ontology (as a representation artifact), the first
interpretation of “semantics” refers to the conceptual or intensional semantics, i.e., to the very
categories / concepts (and their interdependences) that are represented by an ontology. If the
ontology is represented in a formal language, for instance, in a description logic language, the
resulting representation has an additional, formal semantics, due to and determined by the
formal semantics of the language. We argue that the practical use and view of ontologies in
biomedicine and life sciences is still more attuned to the first type of semantics, whereas in
the Semantic Web, ontology technology tends to become a general computational model that
is released from the original task of ontologies of offering conceptual foundation. This
difference and the kind of problems resulting from it can be demonstrated by recent attempts
to ground bio-ontologies or related formats like the one of the Open Biomedical Ontologies
(OBO) in Semantic Web languages. In particular, the position is defended that ontology
representation in a formal language results in an encoding of ontological relationships into
formal semantic entities that is to some extent arbitrary, but that is documented only in rare
cases and thus hinders justified ontology translation among different languages and formats.
Beyond the biomedical context and prior to improving the situation just described, we see a
need for advancing the theoretical basis of ontologies in computer and information sciences in
order to account for, at the same time, a formal semantic approach and the actual ontological
claims / commitments that are to be captured in and provided for by ontologies. For instance,
it can be argued that the current notion of ontology-based semantic integration fails for
ontologies themselves. In this connection, we outline a novel kind of semantics, called
ontological semantics. It is constructed in analogy to the well-known Tarski-style model
theory of first order logic (and description logics), but tries to avoid ontological commitments,
e.g., to a particular set theory, in its general form. Nevertheless, adopting a basic ontology is
beneficial for a more formal definition of a specific semantics of this kind. The main purpose
of ontological semantics is to serve as a background theory for a revised notion of intensional
semantic equivalence. For practical purposes, we advocate approximations in terms of well-
established formal languages, in order to leverage existing tools and technology.
Neglected tropical diseases - A challenge to biomedical ontology
engineering
Fred Freitas
Informatics Center - Federal University of Pernambuco, Brazil
Knowledge Representation and Knowledge Management Research Group
University of Mannheim, Germany
Stefan Schulz
Institut für Medizinische Biometrie und Medizinische Informatik, Universitätsklinikum Freiburg, Germany
The huge amount of data being collected in biomedical research, health care and public health requires
increasingly sophisticated representational formalisms. In the past, data were structured by
classification systems and thesauri (e.g., the ICD and MeSH). More recently, the need for real
semantic interoperability has been formulated and is now being addressed by ontologies or
ontologically oriented terminology systems such as the OBO (Open Biomedical Ontologies) and
SNOMED CT, capitalizing on Semantic Web technologies.
We are developing a prototype ontology for the field of neglected tropical diseases, starting with
leishmaniasis, a disease with considerable public health impact in many tropical countries. This is an
area that has largely been bypassed by past terminology and ontology endeavours, although it
poses interesting challenges.
The project has novel aspects from both a domain and a computer science perspective. The domain
representation perspective focuses on the linkage among population-specific data, clinical data, lab data,
and scientific publications. A global account of many tropical diseases spans a conceptual space that
includes the following aspects:
- individual disease vs. affected populations
- treatment and prevention both related to individuals and populations
- biological organisms that play different roles: hosts, vectors, pathogens
- complicated organisms with different relevant lifecycles
- broad spectrum of disease manifestations
- importance of the natural division of geographic environments
- socioeconomic factors, housing, mobility
- public health authorities
- administrative divisions of geographic space
The computer science perspective addresses the linkage of the ontology with large repositories of
organisms and geographic entities on the one hand, and with existing ontologies (SNOMED, OBI, CO,
GO, …) on the other. The integration of geographic data in an ontology in particular gives rise to new
research questions, as it combines large amounts of instance (A-Box) data with class (T-Box) data.
Development aspects encompass the assembly of the ontology based on diverse existing sources, the
linkage of different modules as well as import routines from external repositories. They include the
creation of a Web-based ontology browser and a Web-based retrieval interface, re-using existing
applications. One difficulty is that a language gap must be bridged: the ontology labels
are mostly in English (only partially extended by Spanish and Portuguese labels available from
multilingual terminologies), whereas many resources are only available in Portuguese. The solution
here is to re-use and adapt an existing cross-language indexing system.
An important research aspect is to provide evidence that a useful domain ontology can be created with
limited resources, largely based on existing material and using Semantic Web standards. To this end, a set of
test queries is being developed and a gold standard is being produced by manual relevance judgments.
Several document and fact retrieval scenarios can thus be evaluated and the usefulness of the ontology
can be assessed.
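The evaluation against a manually judged gold standard can be illustrated with a minimal sketch. The query and document identifiers below are hypothetical and do not come from the project; precision and recall are standard retrieval measures and are assumed here, since the abstract does not name the measures used.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single test query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical gold standard: query id -> documents judged relevant by hand.
gold = {"q1": {"d1", "d2", "d3"}, "q2": {"d4"}}

# Hypothetical output of an ontology-based retrieval run.
results = {"q1": ["d1", "d3", "d7", "d9"], "q2": ["d4"]}

scores = {query: precision_recall(results[query], gold[query]) for query in gold}
```

Averaging such per-query scores over all test queries gives one figure per retrieval scenario, which makes runs with and without the ontology comparable.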
We have devised three use cases to be developed in the medium term:
• Decision support systems (DSS) for neglected diseases – this was defined as the first goal of the
project and its main integration point with OTICSSS, an emerging health information integration
initiative in Brazil. The main objective here is to develop an ontology-based information
integration system in order to query heterogeneous neglected disease-related databases from
different governmental sources (county, state and country). Such a cooperation will certainly
allow both projects to contribute to each other, particularly if both take alternative research
directions to tackle the problem.
• A search engine for biomedical documents. The Freiburg University spin-off Averbis GmbH
(www.averbis.de) is developing tools and resources that can be useful for building a new,
specific, cross-lingual retrieval environment, by mediating between documents and ontology
descriptions using morphosemantic abstraction performed by these tools.
• Intelligent agents that assist diagnosis and prognosis of diseases under scrutiny in potential and
actual patients. This can be a breakthrough in the project, and a good usage testbed for the
ontologies to be constructed.
The application of an ontology design pattern for
functional abnormalities to phenotype ontologies and
the extraction of an ontology of anatomical functions
Robert Hoehndorf <leechuck@acm.org>
Axel-Cyrille Ngonga Ngomo <ngonga@informatik.uni-leipzig.de>
Janet Kelso <kelso@eva.mpg.de>
Abstract
Functions play an important role throughout biology. Although molecular functions are covered in
the Gene Ontology, there is currently no publicly available ontology of anatomical functions.
Ontological considerations on the nature of functional abnormalities and their representation in
current phenotype ontologies show that we can automatically extract a skeleton for such an
ontology of anatomical functions by using a combination of process, phenotype and anatomy
ontologies.
We provide an ontological analysis of the nature of functions and functional abnormalities that is
compatible with the most current views on the ontological definition of functions. From this analysis,
we derive an approach to the automatic extraction of anatomical functions from existing ontologies
using a combination of natural language processing, graph-based analysis of the ontologies and
formal inferences. We apply our approach to the Human Phenotype Ontology and the Mouse
Phenotype Ontology by using the Foundational Model of Anatomy and the Mouse Anatomy as
background ontologies and present preliminary results.
Furthermore, we introduce a new relation to relate material objects to processes that realize the
function of the object to avoid a needless duplication of processes already present in the Gene
Ontology in a new ontology of anatomical functions. We discuss several limitations of the current
ontologies that still need to be addressed to ensure a consistent and complete representation of
anatomical functions and functional abnormalities.
The ChEBI ontology
Nico Adams, Paula de Matos, Adriano Dekker, Marcus Ennis, Janna Hastings, Kenneth Haug,
Duncan Hull, Zara Josephs, Steve Turner and Christoph Steinbeck.
The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
Chemical Entities of Biological Interest (ChEBI) is a freely available database and
ontology of molecular entities and chemical concepts, which is manually annotated to
a high standard of quality, and is non-redundant, differentiating it from other publicly
available chemistry resources. It focuses specifically on those chemical entities which
are of interest to the life sciences community, including metabolites and drugs.
The ChEBI ontology includes a structure-based and a role-based classification. The
relationships used to link entities in the ontology are:
• is a, which is used to indicate subsumption, such as L-alanine residue
(CHEBI:46217) is a alanine residue (CHEBI:32441);
• has part, which is used to indicate composition, such as diclofenac sodium
(CHEBI:4509) has part diclofenac(1-) (CHEBI:48311);
• has role, which is used to link a molecular entity to a role which it may perform,
such as kanamycin A sulfate (CHEBI:6109) has role antibacterial drug
(CHEBI:36047).
• the chemistry-specific relationships is enantiomer of, is tautomer of, is conjugate
[base / acid] of, is substituent group from, has parent hydride and has functional
parent.
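The relationship types listed above can be pictured as typed edges between entity identifiers. The following is a minimal sketch, not ChEBI's actual data model: the three triples are the examples quoted in the list, while the storage layout and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# (subject, relation, object) triples quoted in the list above.
TRIPLES = [
    ("CHEBI:46217", "is_a", "CHEBI:32441"),      # L-alanine residue is a alanine residue
    ("CHEBI:4509", "has_part", "CHEBI:48311"),   # diclofenac sodium has part diclofenac(1-)
    ("CHEBI:6109", "has_role", "CHEBI:36047"),   # kanamycin A sulfate has role antibacterial drug
]


def build_index(triples):
    """Index triples as relation -> subject -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[relation][subject].add(obj)
    return index


def ancestors(index, term):
    """Collect every class reachable from `term` by following is_a edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for parent in index["is_a"].get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


index = build_index(TRIPLES)
```

Only is_a is followed transitively here, because subsumption is transitive; whether and how the other relations propagate is a modelling decision that this sketch leaves open.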
Recently, ChEBI has incorporated the structures, synonyms and citations of ~440,000
bioactive compounds from the ChEMBL (http://www.ebi.ac.uk/chembldb/) drug-
discovery dataset. However, these entities have not yet been classified into the
ontology. The size of the dataset implies that such classification will never be feasible
with the manual approach used heretofore in ChEBI. Thus, we are currently exploring
approaches for structure-based automatic classification which include
• the extraction of features from the chemical structure (using standard
cheminformatics techniques) and subsequent logic-based classification into
classes defined using those features;
• the analysis of the structures for similarity on the basis of cheminformatics
structural keys and the association of particular structural keys with classes.
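Both approaches can be sketched in a few lines. This is an illustration under stated assumptions, not the ChEBI implementation: the feature names and class definitions are hypothetical, features are plain strings rather than the output of a cheminformatics toolkit, and the Tanimoto coefficient is assumed as the similarity measure because the abstract does not name one.

```python
# Hypothetical class definitions: a compound belongs to a class if it shows
# every structural feature that the class requires.
CLASS_DEFINITIONS = {
    "carboxylic acid": {"carboxy group"},
    "amino acid": {"carboxy group", "amino group"},
}


def classify(features):
    """Rule-based classification: return all classes whose required features are present."""
    features = set(features)
    return sorted(
        name for name, required in CLASS_DEFINITIONS.items() if required <= features
    )


def tanimoto(keys_a, keys_b):
    """Tanimoto similarity of two sets of structural keys (assumed measure)."""
    a, b = set(keys_a), set(keys_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


classes = classify({"carboxy group", "amino group", "methyl group"})
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

The first function corresponds to classification into classes defined by features; the second gives a similarity score that could be used to associate an unclassified structure with the class of its closest classified neighbours.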
A further challenge is the automatic association of bioactivity roles to the compounds
from this dataset. Bioactivity is annotated within the source database in textual
format, necessitating a text-mining approach for extraction.
References
de Matos, P., Alcántara, R., Dekker, A., Ennis, M., Hastings, J., Haug, K., Spiteri, I., Turner, S., and
Steinbeck, C. (2009). Chemical entities of biological interest: an update. Nucleic Acids Res.,
gkp886.
Degtyarenko, K., de Matos, P., Ennis, M., Hastings, J., Zbinden, M., McNaught, A., Alcántara, R.,
Darsow, M., Guedj, M. and Ashburner, M. (2008) ChEBI: a database and ontology for chemical
entities of biological interest. Nucleic Acids Res. 36, D344–D350.
AN ONTOLOGY GENERATION PLUGIN FOR OBO-Edit
Wächter T. and M. Schroeder
Technische Universität Dresden Biotechnologisches Zentrum, Tatzberg 47-51,
01307 Dresden, Germany, Telephone: +49 (351) 463 40068
Introduction:
Developing ontologies is a labor-intensive process. User-friendly
editors such as OBO-Edit, which is developed and maintained by the Gene
Ontology Consortium, are a prerequisite for good ontology design.
Results:
To speed up the manual process, we developed an OBO-Edit plug-in for
semi-automated ontology generation. The plug-in suggests terms,
definitions and parent-child relationships based on text mining and natural
language processing (NLP) techniques. State-of-the-art NLP is used to rank the
relevance of terms for the domain to be modeled. Previous studies showed that
term generation can improve the completeness of an ontology by suggesting up
to 89% good candidate terms in the top 50 ranked terms. Public resources such
as Wikipedia, full-text articles and web sites are incorporated to generate
definitions, which follow the well-defined structure “A is a B with property C” if
available. Based on the generation of good definitions it is possible to suggest
the likely parent of A, namely B.
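The step from a definition of the form “A is a B with property C” to a suggested parent B can be sketched with a regular expression. This is a simplified illustration, not the plug-in's code: the pattern, the function name and the example sentences are assumptions.

```python
import re

# Matches "<A> is a/an <B>", optionally followed by "with/that/which <C>".
DEFINITION = re.compile(
    r"^(?:(?:an?|the)\s+)?(?P<term>.+?)\s+is\s+an?\s+(?P<parent>.+?)"
    r"(?:\s+(?:that|which|with)\s+(?P<property>.+?))?\.?$",
    re.IGNORECASE,
)


def suggest_parent(definition):
    """Return (term, suggested parent), or None if the sentence does not fit the pattern."""
    match = DEFINITION.match(definition.strip())
    if match is None:
        return None
    return match.group("term"), match.group("parent")


# Hypothetical definition sentence.
suggestion = suggest_parent("A hepatocyte is a cell with a role in liver metabolism.")
```

Definitions harvested from free text rarely fit one pattern exactly, so a production system would need more patterns and a ranking of alternatives; the sketch only shows why well-structured definitions make the parent easy to read off.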
Methods:
In a three-step procedure the plug-in supports the creation of new
ontology terms from text. As textual resources, free text or a query for PubMed
abstracts can be submitted. In the first step, the terminology mentioned in the text is
retrieved and ranked according to its importance to the domain. Abbreviations
and lexical variants are recognized, and terms similar to existing OBO terms are
marked as such in the display. The list of candidate terms can be searched and filtered
with regular expression patterns to focus on specific lexical aspects of interest. In
the second step, definitions for terms are generated and presented to the curator.
In the final step, the defined candidate terms, enriched with synonyms, are
inserted into the ontology. We addressed the difficulty of finding the correct
position in a tree structure by using all information available to the plug-in. All
potential parent terms, e.g. all terms of the Gene Ontology, are displayed as a list
and are ranked higher if they (a) are selected in OBO-Edit, (b) have a certain
lexical overlap with the new candidate concept, (c) are contained in the specified
definition, or (d) are supported by evidence for a relationship found in any of the
OBO-listed ontologies.
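The ranking of potential parent terms by criteria (a) to (d) can be sketched as a simple additive score. The equal weights, the function names and the example terms are assumptions made for illustration; the abstract does not state how the plug-in combines the criteria.

```python
def tokens(text):
    """Lower-cased word set of a term."""
    return set(text.lower().split())


def score_parent(parent, candidate, definition, selected, related):
    """Score one potential parent term; every criterion contributes at most 1.0."""
    score = 0.0
    if parent in selected:                               # (a) selected in the editor
        score += 1.0
    parent_tokens = tokens(parent)                       # (b) lexical overlap
    score += len(parent_tokens & tokens(candidate)) / max(len(parent_tokens), 1)
    if parent.lower() in definition.lower():             # (c) contained in the definition
        score += 1.0
    if (candidate, parent) in related:                   # (d) relationship known elsewhere
        score += 1.0
    return score


def rank_parents(parents, candidate, definition, selected=(), related=()):
    """Return the parents sorted from most to least likely."""
    selected, related = set(selected), set(related)
    return sorted(
        parents,
        key=lambda p: score_parent(p, candidate, definition, selected, related),
        reverse=True,
    )


# Hypothetical example data.
ranking = rank_parents(
    ["cell", "liver cell", "membrane"],
    candidate="fetal liver cell",
    definition="A fetal liver cell is a liver cell found in the developing liver.",
    selected={"liver cell"},
)
```

An additive score keeps each criterion's contribution easy to inspect, which matters when a curator has to understand why a parent was suggested.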
With the novel Ontology Generation plug-in for OBO-Edit 2, we contribute to the
community by increasing the tool support for the development and maintenance
of biomedical ontologies. We used this plug-in for the creation of the Go3R
Ontology. Go3R is the world's first search engine for alternative methods to animal testing.
It builds on new semantic technologies that use an expert-knowledge-based
ontology to identify relevant documents.
Availability: The plug-in is available as part of the beta53 release of OBO-Edit.
References:
Sauer, U. G., Wächter, T., Grune, B., Doms, A., Alvers, M. R., Spielmann, H.,
and Schroeder, M. (2009). Go3R - semantic internet search engine for alternative
methods to animal testing. ALTEX. (shared first author)
Winnenburg, R., Wächter, T., Plake, C., Andreas, D., and Schroeder, M. (2008).
Facts from text: Can text mining help to scale-up high-quality manual curation
of gene products with ontologies? Briefings in Bioinformatics.
Alexopoulou, D., Wächter, T., Pickersgill, L., Eyre, C., and Schroeder, M. (2007).
Terminologies for text-mining; an experiment in the lipoprotein metabolism
domain. BMC Bioinformatics.
Concurrent ontology building with Collaborative Protégé
Authors: Daniel Schober1,2, James Malone2, Robert Stevens3
Affiliations: 1Institute for Medical Biometry and Medical Informatics, University
Clinic, Freiburg, Germany, 2European Bioinformatics Institute, Cambridge, UK, 3Manchester School of Computer Sciences, Manchester, UK
Introduction
Although we live in a world of increasing international collaboration, the geographical
distribution of developers still makes collaborative ontology development
difficult to realize. If collaboration is not built into tools,
reaching widespread community consensus and encoding according to a
shared plan are not easy to achieve. For this reason, tools are being developed that
allow not only for distributed collaborative ontology creation and modification, but
also for direct and topic-linked communication about all aspects of the engineering
process. To investigate this process and the corresponding capabilities of the
new Collaborative Protégé 3 (CP) tool, the Ontology for Biomedical Investigations
(OBI) was enriched in an experiment run as part of an OntoGenesis network
meeting (website: http://ontogenesis.ontonet.org/moin/NetworkMeeting7) at the
European Bioinformatics Institute (EBI).
We investigated the ability of the CP plug-in [1] to:
o Facilitate multiple concurrent edits of a single OWL file from different
computers
o Track annotations associated with specific representational units
(RUs), e.g. on classes or properties
o Track annotations associated with ontology change actions
(deletions, axiom edits and annotation edits)
o Support discussion threads and instant messaging
communication between ontology developers (real-time chat).
In this talk we present our observations and recommendations for CP based
upon this experience.
Method
Our methodology involved the following set of tasks:
o Familiarization of users with Collaborative Protégé 3.4, its GUI and
collaborative features.
o Ad hoc additions of attendees' own lists of devices (with possibility
of duplication).
o Controlled additions of devices from a list provided by the
Metabolomics Standards Initiative (MSI).
o Deployment of an 'Agent Provocateur' to assess how transparent
the changes are to other users. These conflicting and
deliberately incorrect edits were made during a specified period of
time known only to the session organizer.
o Controlling/restricting communication channels (notes, discussion
threads and chat) to evaluate CP's ability to facilitate communication
in distributed, collaborative development.
The communication and interaction of the participants with each other, directly or
through the tool, were tracked and analyzed. The Obi.owl file was populated with
new 'device' classes from the domains of the OntoGenesis members and
from a list provided by the Metabolomics Standards Initiative
(http://msi-ontology.sourceforge.net/). Detailed statistics on the numbers and kinds of
annotations made during the sessions, with tables, diagrams and further
discussion, are available in a spreadsheet from the OntoGenesis website.
Initially, development occurred in a single group, which was then divided into
subgroups. Ad hoc additions were made first, followed by subgroups
adding classes from the provided MSI term list. The results were then reviewed
and commented on by the other subgroups, which added annotations/notes. Subsequently,
other communication channels were tested: first chat only, then voice only,
and after that chat and voice together. During the latter stages of this session,
the Agent Provocateur user was deployed.
Results
A realistic collaborative ontology building session was set up to test CP's
collaborative editing and communication features. Collaborative ontology building
was relatively trouble-free; the tool also copes with complicated setups and is
flexible enough to allow for corresponding adjustments.
We highlight some unfulfilled requirements:
Editing functionalities:
The lack of an RU and module locking mechanism meant that others could alter
classes that have a logical impact on the class currently being defined by another
user. A rollback function would aid conflict resolution and lead to safer
editing. Subscription and notification were requested, where users subscribe to
certain areas of interest within the ontology and are then notified of any changes
that occur in those areas.
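The requested subscription and notification behaviour can be sketched as a small registry. The class and method names are hypothetical and are not part of Collaborative Protégé; "area" stands for any unit users might subscribe to, such as a module or a class subtree.

```python
from collections import defaultdict


class ChangeNotifier:
    """Hypothetical registry mapping ontology areas to subscribed users."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # area name -> user names

    def subscribe(self, user, area):
        """Register a user's interest in one area of the ontology."""
        self._subscribers[area].add(user)

    def notify(self, changed_area, author):
        """Return the users to notify about a change, excluding its author."""
        return sorted(self._subscribers.get(changed_area, set()) - {author})


notifier = ChangeNotifier()
notifier.subscribe("alice", "device")
notifier.subscribe("bob", "device")
recipients = notifier.notify("device", author="bob")
```

A locking mechanism could reuse the same mapping by recording, per area, the single user currently allowed to edit it.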
Annotations on RUs with entity notes:
For minor and trivial annotations, having to provide an annotation type, subject heading
and value was perceived as overkill. Also, the
change track captured in the project-linked ChAO knowledge base is sometimes
presented in an overly granular manner. Users would like changes to be
described at a high level of abstraction rather than at a detailed granular level.
Communication:
Users requested that chats be linked with specific RUs and axioms to allow more
immediate and direct conflict resolution. A closed 'retreat room' was desired, as
well as a filter function on user names so that users see only the chats of certain
people or on particular ontology modules. Integration of emoticons in text fields
would help convey the pragmatic aspects of communication.
Planning:
Integrated voting functionalities would allow users to vote on change issues. A
mechanism that changes the ontology based on vote outcomes would speed up
development and could be implemented using ChAO information and
formalized voting outcomes. Issue tracker functions were requested, i.e. a
scratch pad or to-do list that can be worked through and 'checked', e.g. indicating
a proposed plan and what has already been realized at a certain point in time.
Conclusion
Although some caveats persist, it became clear that the CP tool is now in an
advanced state and can be used in practice with sufficient stability; much can
be done through configuration to optimize it further. Our practice-driven requirement
and fault analysis generated much feedback for the tool developers, and will be
valuable for the CP version of Protégé 4 (P4), which is in preparation. A paper collating all
results has been accepted for the ICBO 2009 conference and will be published
in its proceedings. We will continue to investigate CP in further OntoGenesis
meetings and thereby gain further insights into software-guided
collaborative ontology engineering, ensuring continuous
feedback to the CP developers.
References
[1] Noy N, Tudorache T, de Coronado S, Musen M. Developing biomedical
ontologies collaboratively. AMIA Annu Symp Proc. 2008, p. 520-4.
Mining ontology concepts from literature for automated gene annotation.
Conrad Plake and Rainer Winnenburg
Bioinformatics Group. Technische Universität Dresden. Tatzberg 47-51. 01307
Dresden, Germany
High-throughput screens such as microarrays and RNAi screens produce huge
amounts of data. They typically result in hundreds of genes, which are often
further explored and clustered via enriched Gene Ontology terms. The strength of
such analyses is that they build on the high-quality manual annotations provided with
the Gene Ontology. However, the weakness is that annotations are restricted to
process, function and location and that they do not cover all known genes in
model organisms. GoGene addresses this weakness by complementing high-quality
manual annotation with high-throughput text mining that extracts co-occurrences
of genes and ontology terms from the literature. GoGene contains over
4 000 000 associations between genes and gene-related terms for 10 model
organisms, extracted from more than 18 000 000 PubMed entries. It covers
not only the process, function and location of genes, but also biomedical
categories such as diseases, compounds, techniques and mutations. By bringing
it all together, GoGene provides the most recent and most complete facts about
genes and can rank them according to novelty and importance. GoGene accepts
keywords, gene lists, gene sequences and protein sequences as input and
supports search for genes in PubMed, EntrezGene and via BLAST. Since all
associations of genes to terms are supported by evidence in the literature, the
results are transparent and can be verified by the user.
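The idea of literature-backed co-occurrence associations can be sketched as follows. This is a deliberately naive illustration, not GoGene's pipeline: entity recognition is reduced to case-insensitive substring matching, and the abstract identifiers and texts are hypothetical.

```python
from collections import defaultdict
from itertools import product


def cooccurrences(abstracts, genes, terms):
    """Map each (gene, term) pair to the abstracts mentioning both."""
    evidence = defaultdict(list)
    for abstract_id, text in abstracts.items():
        lowered = text.lower()
        found_genes = [g for g in genes if g.lower() in lowered]
        found_terms = [t for t in terms if t.lower() in lowered]
        for pair in product(found_genes, found_terms):
            evidence[pair].append(abstract_id)
    return dict(evidence)


# Hypothetical abstracts keyed by made-up identifiers.
abstracts = {
    "pmid:1": "BRCA1 is involved in DNA repair.",
    "pmid:2": "TP53 regulates apoptosis and DNA repair.",
}
evidence = cooccurrences(abstracts, ["BRCA1", "TP53"], ["DNA repair", "apoptosis"])
```

Keeping the supporting abstract identifiers with every association is what makes the results verifiable by the user, as described above.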
Information gained from gene-mutation studies can reveal function and disease
implications. The automated retrieval and integration of information about protein
point mutations in combination with structure, domain and interaction data from
literature and databases promises to be a valuable approach to studying
structure-function relationships in biomedical data sets. We developed a rule- and regular
expression-based protein point mutation retrieval pipeline for PubMed abstracts,
which achieves an F-measure of 87% for the mutation retrieval task on a
benchmark dataset. In order to link mutations to their proteins, we use a named
entity recognition algorithm to identify gene names co-occurring in
the abstract, and establish links based on sequence checks. Conversely, we
showed that gene recognition improved from 77% to 91% F-measure when
mutation information given in the text was taken into account. To demonstrate practical
relevance, we use mutation information from text to evaluate a novel solvation
energy based model for the prediction of stabilizing regions in membrane
proteins. For five G protein-coupled receptors we identified 35 relevant single
mutations and associated phenotypes, none of which had been annotated in the
UniProt or PDB databases. In 71% of cases, the reported phenotypes were in compliance with
the model predictions, supporting a relation between mutations and stability
issues in membrane proteins. We present a reliable approach for the retrieval of
protein mutations from PubMed abstracts for any set of genes or proteins of
interest. We further demonstrate how amino acid substitution information from
text can be used for protein structure stability studies on the basis of a novel
energy model.
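A regular expression-based point mutation tagger of the kind described above can be sketched as follows. The two patterns cover only the common notations such as A117T and Gly90Asp; they are assumptions for illustration and not the rules of the actual pipeline.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "".join(sorted(THREE_TO_ONE.values()))
THREE = "|".join(THREE_TO_ONE)

# Wild-type residue, sequence position, mutant residue.
SHORT = re.compile(rf"\b([{ONE}])([1-9]\d*)([{ONE}])\b")
LONG = re.compile(rf"\b({THREE})([1-9]\d*)({THREE})\b")


def find_point_mutations(text):
    """Return (wild type, position, mutant) tuples in one-letter code, sorted by position."""
    mutations = set()
    for wild_type, position, mutant in SHORT.findall(text):
        mutations.add((wild_type, int(position), mutant))
    for wild_type, position, mutant in LONG.findall(text):
        mutations.add((THREE_TO_ONE[wild_type], int(position), THREE_TO_ONE[mutant]))
    return sorted(mutations, key=lambda m: m[1])


# Hypothetical sentence.
found = find_point_mutations("The A117T and Gly90Asp substitutions destabilise the receptor.")
```

Patterns this simple also match tokens that are not mutations, for example cell line names of the same shape, which is one reason why linking each mention to a protein through sequence checks is useful.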
We provide online access to our automatically derived gene annotations with
ontology-aided browsing at: http://gopubmed.org/gogene.
References
[1] Plake et al.: GoGene: gene annotations in the fast lane. Nucleic Acids Res.,
37(Web Server issue), W300-4, 2009. Link:
http://nar.oxfordjournals.org/cgi/content/abstract/gkp429
[2] Winnenburg et al.: Improved mutation tagging with gene identifiers applied to
membrane protein stability prediction. BMC Bioinformatics 2009, 10(Suppl 8):S3.
Link: http://www.biomedcentral.com/1471-2105/10/S8/S3
A Four-Level Translational Approach to Model Surgical Processes
Goldstein D1, Loebe F2, Herre H3, Neumuth T1
1 Universität Leipzig, Innovation Center Computer Assisted Surgery (ICCAS), Leipzig, Germany
2 Universität Leipzig, Department of Computer Science
3 Universität Leipzig, Institute of Medical Informatics, Statistics and Epidemiology (IMISE),
Research Group Ontologies in Medicine and Life Sciences (OntoMed)
To specify surgical interventions in a precise and formal way is an essential requirement
for many applications in the field of surgery, including the instruction of trainees, quality
assessment and evaluation, as well as computer-assisted surgery (CAS). Presently, there
are various approaches to modeling surgical procedures. However, these
approaches have varying focuses, and are thus characterized by varying degrees
of granularity. Furthermore, they lack a common and agreed-upon conceptual foundation.
This greatly hinders the interoperability, comparability, and uniform interpretation of
process data. For scientific purposes, however, such a uniform foundation would be
beneficial, facilitating the acquisition and exchange of data, the interpretation and
transfer of study results, and the sharing and adaptation of tools and methods.
Therefore, we propose a generic, formal framework for the specification of surgical
processes. In this workshop, our method will be presented in combination with its design
methodology. The design follows a four-level, translational approach and encompasses
an ontological foundation for the formal level of our approach.
The expressive power and the unifying capacity of the presented framework will be
shown. To this end, the framework was applied to four different existing models
of surgical procedures. We will show that the presented framework allows for a uniform
representation of process models arising from different techniques.
The four-level approach is designed to capture knowledge about the progression of
surgical interventions. It shows a consistent translation of natural language to a level that
is near implementation and supports different research fields, such as the evaluation of
surgical assist systems, optimization, and re-engineering of surgeries, and the use of
workflow management systems in the operating room.
Knowledge Representation for Digital Brain Atlases
Hans-Christian Hege, Anja Kuß
Zuse Institute Berlin (ZIB)

Biomedical information systems are developed in order to navigate, search, structurally represent,
and graphically display biological systems (1). By means of biomedical information systems
researchers are able to integrate and combine a variety of biological data, analyze relationships
between data, plan experiments, and furthermore model biological processes. They are also used in
medical therapy planning and in education in biology and medicine. For example there are
information systems that support the exploration of biochemical connections and relations such as
genome sequence assemblies (2) or differences of genomes of species (3). Another type of
information systems represents spatial and spatial-temporal relations, facilitating the analysis of
morphology and sometimes also physiology. Here 3D and 4D digital atlases come into play.
Those atlases serve as common spatio-temporal reference systems in which data from different
experiments, imaging modalities and scales are integrated. Furthermore, semantic information is
linked to the atlas data. This semantic linking requires the registration of the data with an ontology or
controlled vocabulary (4). Ontologies increase the potential use and reusability of digital atlases by
forming a standardized approach to annotating, analyzing, and querying data (4) (5).
So far, digital atlas generation concentrates on modeling neural structures, such as brains or nerve
cords of mammals and invertebrates (4). In order to support the understanding and analysis of
structural and functional characteristics of brain structures, ontologies need to fulfil several
requirements. An ideal brain ontology would include a complete set of structural parts and nerve
types. It would further contain axonal projections between regions and nerve types and it would
include morphological, connectional, and electrophysiological properties of neurons. An ideal
ontology also would be species-specific (5). Ontologies that describe brain parts, nerves and neural
connections have for example been developed for the rat brain (5), the mouse brain (6), and the fly
brain (7).
We started to build an ontology of the structures of the honeybee brain in which parts of the bee
brain are modeled from wholemounts down to synaptic swellings (boutons) of nerves. We linked our
ontology to the surface reconstructions of the honeybee standard atlas (HSB) (8) by assigning the
reconstruction's ID and file name to appropriate parts of the ontology. This step actually enables an
ontology-based browsing of the atlas. In a first usage approach of the ontology-linked HSB we
addressed the automatic creation of meaningful visualizations. Often the process of creating
meaningful and expressive visualizations is time consuming and requires good knowledge of the used
visualization software. In our approach all the user has to do is to select a structure to be visualized
and to select a predefined query, such as "Show overview". An algorithm then automatically creates
a visualization that contains the selected structure highlighted as a focus object and further
structures forming the context (9).
In the next steps we want to develop an ontology-based atlas browsing that includes not only
reconstructions but also the original data the reconstructions are based on. Further we would like to
provide a more visual approach for browsing the ontology. We want to integrate a graph based
ontology representation the user can interact with and which presents interesting information in an
intuitive way.
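The linkage between ontology parts and surface reconstructions, and a predefined query that returns a focus object with its context, can be sketched as follows. This is an illustration under assumptions: the structure names, identifiers and file names are hypothetical, and choosing the sibling structures as context is an assumed rule, not the algorithm of reference (9).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Structure:
    """Ontology node linked to a surface reconstruction by ID and file name."""
    name: str
    reconstruction_id: Optional[str] = None
    file_name: Optional[str] = None
    parts: List["Structure"] = field(default_factory=list)


def find_parent(root, target_name):
    """Return the structure that directly contains `target_name`, or None."""
    for part in root.parts:
        if part.name == target_name:
            return root
        found = find_parent(part, target_name)
        if found is not None:
            return found
    return None


def show_overview(root, focus_name):
    """Predefined query: the focus structure plus its siblings as context."""
    parent = find_parent(root, focus_name)
    if parent is None:
        return None
    focus = next(p for p in parent.parts if p.name == focus_name)
    context = [p.file_name for p in parent.parts if p is not focus and p.file_name]
    return {"focus": focus.file_name, "context": context}


# Hypothetical atlas fragment.
brain = Structure("brain", parts=[
    Structure("region A", "rec-1", "region_a.surf"),
    Structure("region B", "rec-2", "region_b.surf"),
])
overview = show_overview(brain, "region A")
```

The returned file names are what a visualization tool would need to load; how the focus object is highlighted is left to that tool.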

References
1. Trey Ideker, Timothy Galitski, Leroy Hood. A New Approach to Decoding Life: Systems Biology.
Annual Review of Genomics and Human Genetics. 2001, Vol. 2, pp. 343-372.
2. Cydney B. Nielsen, Shaun D. Jackman, Inanc Birol, Steven J. M. Jones. ABySS-Explorer: Visualizing
Genome Sequence Assemblies. IEEE Transactions on Visualization and Computer Graphics. 2009, Vol.
15, 6, pp. 881-888.
3. Miriah Meyer, Tamara Munzner, Hanspeter Pfister. MizBee: A Multiscale Synteny Browser. IEEE
Transactions on Visualization and Computer Graphics. 2009, Vol. 15, 6, pp. 897-904.
4. Jyl Boline, Erh-Fang Lee, Arthur Toga. Digital atlases as a framework for data sharing. Frontiers in
Neuroscience. 2008, Vol. 2, 1, pp. 100-106.
5. Mihail Bota, Larry W. Swanson. BAMS neuroanatomical ontology: design and implementation.
Frontiers in Neuroinformatics. 2008, Vol. 2, 2.
6. Albert Burger, Duncan Davidson, Richard Baldock. Anatomy Ontologies for Bioinformatics:
Principles and Practice, Springer, London, 2008.
7. The Open Biomedical Ontologies. [Online] 2009. http://www.obofoundry.org/.
8. Robert Brandt, Torsten Rohlfing, Jürgen Rybak, Sabine Krofczik, Alexander Maye, Malte
Westerhoff, Hans-Christian Hege, Randolf Menzel. Three-Dimensional Average-Shape Atlas of the
Honeybee Brain and Its Applications. The Journal of Comparative Neurology. 2005, Vol. 492, 1, pp. 1-19.
9. Anja Kuß, Steffen Prohaska, Björn Meyer, Jürgen Rybak, Hans-Christian Hege. Ontology-based
visualization of hierarchical neuroanatomical structures. [ed.] Charles P. Botha, Gordon Kindlmann
and Bernhard Preim. Proceedings of the Eurographics Workshop on Visual Computing for Biomedicine
VCBM 2008, pp. 177-184, 2008.
Beyond Structure: KiSAO and TEDDY – Two Ontologies Addressing Pragmatical and Dynamical Aspects of Computational Models in Systems Biology
Christian Knüpfer1, Dagmar Köhn2, and Nicolas Le Novère3
1 Institute of Computer Science, University of Jena, Germany
2 Institute for Database and Information Systems, Rostock University, Germany
3 Computational Neurobiology Group, European Bioinformatics Institute, Cambridge, Great Britain
Motivation
Computational models are becoming more and more the central scientific paradigm for understanding the complexity of living systems. With the increasing number and size of these models there is a growing need for model reuse and exchange. Furthermore, detailed models are not manageable without computer support. There are efforts to formalise the mathematical structure of models (e.g. SBML) and to standardise the kinetic and biological meaning of model components (e.g. SBO, GO, UniProt). However, formalising only the structure of computational models is not sufficient to easily exchange and reuse models and to achieve full computer support for modelling. We also need to formalise the pragmatical and dynamical aspects of models.
For this purpose we propose two ontologies: the Kinetic Simulation Algorithm Ontology (KiSAO) and the TErminology for the Description of DYnamics (TEDDY). KiSAO covers algorithms used for simulation of computational models. The ontology classifies and puts into context existing simulation algorithms through the use of several characteristics, such as deterministic/stochastic or spatial/non-spatial. The aim of TEDDY is to provide terms for describing and characterising dynamical behaviours, observable dynamical phenomena, and control elements of biological models and biological systems in Systems Biology and Synthetic Biology.
KiSAO
KiSAO classifies simulation algorithms applicable to biological models using different categories and a hierarchy of algorithm versions. Each term contains information about synonyms, a definition and a publication reference.
Classification: Simulation algorithms are classified with respect to the following dimensions:
• deterministic/stochastic rules (e.g. Euler forward vs. Smoluchowski equation based method),
• spatial/non-spatial approaches (e.g. Green's function reaction dynamics vs. Euler forward),
• discrete/continuous variables (e.g. Cellular automata vs. Livermore solver), and
• fixed/adaptive time-step approaches (e.g. Cellular automata vs. Green's function reaction dynamics).
Algorithm Hierarchy: The algorithms are arranged in a subclass hierarchy, e.g.
sub-volume stochastic reaction-diffusion algorithm (KiSAO:0000095)
  is-a Gillespie-like stochastic simulation method (KiSAO:0000025)
    is-a algorithm using stochastic rules (KiSAO:0000036)
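The subclass chain in this example can be written down as a small lookup table. The sketch below is only an illustration in Python, assuming the figure is read with the most general term ("algorithm using stochastic rules") at the top; the helper names `ancestors` and `is_a` are hypothetical and are not part of KiSAO or of any OBO tooling.

```python
# Term names for the three KiSAO IDs quoted in the example.
NAMES = {
    "KiSAO:0000036": "algorithm using stochastic rules",
    "KiSAO:0000025": "Gillespie-like stochastic simulation method",
    "KiSAO:0000095": "sub-volume stochastic reaction-diffusion algorithm",
}

# Child term -> direct parent term (assumed direction of the is-a arrows).
IS_A = {
    "KiSAO:0000095": "KiSAO:0000025",
    "KiSAO:0000025": "KiSAO:0000036",
}


def ancestors(term):
    """Return all superclasses of `term`, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term, candidate):
    """True if `candidate` is a (transitive) superclass of `term`."""
    return candidate in ancestors(term)


lineage = [NAMES[t] for t in ancestors("KiSAO:0000095")]
```

Because is-a is transitive, a tool that knows an algorithm is a sub-volume method can conclude that it uses stochastic rules without that fact being stated directly.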
KiSAO is encoded in OBO and developed using OBO-Edit. A transformation of KiSAO-OBO into OWL can be generated using Protégé. For details on KiSAO see the MIASE project page http://sourceforge.net/projects/miase.
TEDDY
In order to describe the dynamics of a model, TEDDY comprises the following main categories:
Temporal Behaviour: terms for the actual (temporal) dynamical behaviour of models (e.g. Limit Cycle, Stable Fixed Point),
Behaviour Characteristic: terms for characterising concrete behaviours (e.g. Period) and for discriminating between types of behaviours (e.g. Stable vs. Unstable),
Behaviour Diversification: terms for the ability of systems to exhibit different behaviours dependent on parameters (e.g. Supercritical Hopf Bifurcation) and with respect to perturbations (e.g. Bi-Stability), and
Functional Motifs: terms for structural features of systems necessary for specific behaviours (e.g. Negative Feedback) and intended for specific functions (e.g. Integrator).
There are different types of relations between TEDDY classes:
• relations between nearby Temporal Behaviours (e.g. convergeTo),
• relations between Temporal Behaviours and Behaviour Characteristics (e.g. hasStability),
• relations between Functional Motifs and Temporal Behaviours (e.g. dependsOn), and
• relations between Behaviour Diversifications and Temporal Behaviours (e.g. hasSuperPart).
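One way to read the relation types listed above is as typed relations, each connecting a pair of TEDDY categories. The following Python sketch illustrates that reading. The direction of each relation (which category is subject and which is object) is an assumption taken from the order in which the categories are named in the list, and the function `is_well_typed` is hypothetical, not part of TEDDY.

```python
# Example terms and their main category, as named in the abstract.
CATEGORY = {
    "Limit Cycle": "Temporal Behaviour",
    "Stable Fixed Point": "Temporal Behaviour",
    "Stable": "Behaviour Characteristic",
    "Negative Feedback": "Functional Motif",
    "Supercritical Hopf Bifurcation": "Behaviour Diversification",
}

# Relation -> (subject category, object category); directions are assumed.
SIGNATURE = {
    "convergeTo": ("Temporal Behaviour", "Temporal Behaviour"),
    "hasStability": ("Temporal Behaviour", "Behaviour Characteristic"),
    "dependsOn": ("Functional Motif", "Temporal Behaviour"),
    "hasSuperPart": ("Behaviour Diversification", "Temporal Behaviour"),
}


def is_well_typed(subject, relation, obj):
    """Check that a statement respects the relation's category signature."""
    domain, range_ = SIGNATURE[relation]
    return CATEGORY[subject] == domain and CATEGORY[obj] == range_


ok = is_well_typed("Limit Cycle", "hasStability", "Stable")
swapped = is_well_typed("Stable", "hasStability", "Limit Cycle")
```

The check only tests category compatibility; whether a particular statement is true of a given model is a separate question that the sketch does not address.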
TEDDY is encoded in OWL and developed using Protégé 4. The last release of the ontology contains 135 terms. For details on TEDDY see the project page http://sourceforge.net/projects/teddyontology.
An Epistemically Adequate Theory of Causality
Hannes Michalek, Leipzig
OBML Workshop, November 2009
The Empirical Method, understood as performing experiments (studies/trials), is the best approach that we as humans have developed to discover causal relations. Therefore, an "epistemically adequate" theory of the ontology of causality must be able to explain why the Empirical Method is successful, and in what ways it is not.
Keeping the above aim in mind, we initially follow the ordinary route of ontological analysis:
Conceptual Analysis: What does “causality” mean?
• Limiting the scope: physical causality
• Causality relies on regularity
• Causality relies on counterfactual dependency
Ontological Model: What kinds of entities and relations play a role?
• Presentials as primary causal relata
• Extending the basic causal relation to cover processes as well
• Possible worlds as alternative situations (in this world)
• Regularity and counterfactual dependency
(Formalization is skipped here.)
Then we address the question of epistemic adequacy:
General epistemic considerations:
• In what sense and to what extent are the ontological elements of our theory epistemically accessible at all: Presentials, coincidence pairs, universals, clusters of coincidence pairs, probabilistics on these clusters, . . .
Reconstructing experiments/trials:
• How can we understand (or: reconstruct) the basic elements, the success, and the limitations of performing experiments in terms of our theory?
Upshot: Experiments (studies/trials) create tailor-made clusters of alternative situations with controlled (non-)existence of the alleged causes and strict detection of the (non-)existence of the expected effects. These alternative situations are the basis for determining the relations of regularity, counterfactual dependency and thus: causality.
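As a rough illustration of the upshot, a cluster of alternative situations can be pictured as a list of trial records, each noting whether the alleged cause was present and whether the expected effect was detected. The Python sketch below is a deliberately simplified reading and not the formalization that the abstract skips: it assumes regularity can be approximated by the share of cause-present situations showing the effect, and counterfactual dependency by the share of cause-absent situations lacking it. The function name and the data are hypothetical.

```python
def summarize(trials):
    """trials: list of (cause_present, effect_observed) boolean pairs.

    Returns (regularity, counterfactual_dependency) as simple proportions.
    Assumes at least one situation with and one without the cause.
    """
    with_cause = [effect for cause, effect in trials if cause]
    without_cause = [effect for cause, effect in trials if not cause]
    # Share of cause-present situations in which the effect occurred.
    regularity = sum(with_cause) / len(with_cause)
    # Share of cause-absent situations in which the effect did not occur.
    counterfactual = sum(not e for e in without_cause) / len(without_cause)
    return regularity, counterfactual


# Hypothetical cluster: 4 situations with the cause, 4 without.
trials = [(True, True)] * 4 + [(False, False)] * 3 + [(False, True)]
regularity, counterfactual = summarize(trials)
```

Here the single cause-absent situation that still shows the effect lowers the counterfactual proportion, which is the kind of limitation a controlled design tries to expose.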