towards a top-level ontology for molecular biology stefan schulz 1, elena beisswanger 2, udo hahn 2,...

85
Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1 , Elena Beisswanger 2 , Udo Hahn 2 , Joachim Wermter 2 , 1 Freiburg University Hospital, Department of Medical Informatics, Germany 2 Jena University Language and Information Engineering (JULIE) Lab

Upload: estella-boone

Post on 13-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Towards a Top-Level Ontology for Molecular Biology

Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2,

1 Freiburg University Hospital, Department of Medical Informatics, Germany 2 Jena University Language and Information Engineering (JULIE) Lab

Page 2: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

What’s an ontology…?

Page 3: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Our notion of ontology

Universally accepted truths

Consolidated but context-dependent facts

Hypotheses, beliefs, statistical associations

Domain Knowledge

Page 4: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Consolidated but context-dependent facts

Hypotheses, beliefs, statistical associations

Ontology !

Universally accepted truths

Domain Knowledge

Page 5: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Data Sources in Biomedicine

ONTOLOGIES

PubMed

Literature Collection

Raw Data

Experimental DataYeast

UniProt

FlyBase

DatabasesMouse

Clinical

Data

Page 6: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

ONTOLOGIES

PubMed

Literature Collection

Raw Data

Experimental DataYeast

UniProt

FlyBase

DatabasesMouse

Clinical

Data

Bioontologies are devised to support

Data Annotation

Data Integration

Page 7: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

How can ontologies be structured?

Page 8: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

Ontological Layers

Page 9: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 10: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

• Transcription • DNA-dependent transcription

• antisense RNA transcription• mRNA transcription• rRNA transcription• tRNA transcription • …(from Gene Ontology)

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 11: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

DOLCE

BFO

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 12: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

DOLCE

BFO

•Entity• Continuant• Dependent Continuant• Realizable Entity• Function• Role• Independent Continuant • Object• Object Aggregate• Occurrent (from BFO)

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 13: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Domain Upper

Ontology /

Top Domain Ontology

Upper

Ontology /

Top Ontology

Domain Ontology

DOLCE

BFO

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 14: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

DOLCE

BFO

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 15: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

DOLCE

BFO Biology Domain: • Organism• Body Part• Cell• Cell Component• Tissue• Protein• Nucleic Acid• DNA• RNA• Biological Function• Biological Process

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 16: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

Simple Bio Upper Ontology

GFO-Bio

DOLCE

BFO

GENIA

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Ontological Layers

Page 17: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

Simple Bio Upper Ontology

GFO-Bio

GENIA

DOLCE

BFO

Ontological Layers

BioTop

OBO (Gene Ontology, Sequence Ontology, Cell Ontology, Mouse Anatomy, ChEBI, …)

Page 18: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

GENIA as example for top level domain ontology.

Page 19: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Developed at Tsujii Laboratory, University Tokyo

• Taxonomy, 48 classes

• Verbal definitions (“scope notes”)

• No relations other than subclass relations

• Purpose: Semantic annotation of biological papers

“The GENIA ontology is intended to be a formal model of cell signaling reactions in human. It is to be used as a basis of thesauri and semantic dictionaries for natural language processing applications […] Another use of the GENIA ontology is to provide a basis for integrated view of multiple databases”

The GENIA Ontology

http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/topics/Corpus/genia-ontology.html

Page 20: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/corpus/GENIAontology.owl

GENIA Ontology

Page 21: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

GENIA Critique

Page 22: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Amino Acid Monomer:

“An amino acid monomer, e.g. tyrosine, serine, tyr, ser”

• DNA:

“DNAs include DNA groups, families, molecules, domains, and regions”

Insufficient Definitions

Page 23: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

False Taxonomic Parent GENIA: Source

• ‘Organism’, ‘Tissue’, … are objects and not primarily sources!

• ‘Source’ is a role …

“Sources are biological locations where substances are found …”

Page 24: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Are the instances of

‘Cell Type’ classes ?

• Otherwise rename the

class as ‘Cell’!

Misconception of “Type” GENIA: Cell Type

“A cell type, e.g. T-Lymphocyte, T-cell, astrocyte, fibroblast”

Page 25: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Non-Conformant Naming PolicyGENIA: Amino Acid, GENIA: Protein

“An amino acid molecule or the compounds that consist of amino acids.”

“Proteins include protein groups, families, molecules, complexes, and substructures.”

• Names suggest single molecules

• Don’t work against biologists’ intuition!

Page 26: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,
Page 27: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

GENIA

Redesign:

BioTop

Page 28: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Ontologically founded Top level ontology for biology

• Extensive use of formal relations (OBO)• Formal semantics (OWL-DL)• Use as a semantic glue between existing

ontologies• Scope: as GENIA (in a 1st phase)

BioTop

Page 29: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

BioTop Relations

• partOf

• properPartOf

• locatedIn

• derivesFrom

• hasParticipant

Page 30: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

BioTop Relations

• partOf

• properPartOf

• locatedIn

• derivesFrom

• hasParticipant

• hasFunction (functionOf)

Page 31: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

BioTop Relations

grainOf (hasGrain)

componentOf (hasComponent)• partOf

• properPartOf

• locatedIn

• derivesFrom

• hasParticipant

• hasFunction (functionOf)

Page 32: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• hasGrain non-transitive subrelation of hasPart• Example: “Collective of Cell hasGrain only Cell”

• Collectives can gain or lose grains without changing their identity

Collectives and hasGrain

Page 33: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• hasComponent non-transitive subrelation of hasPart • Example: Definition of Protein

• Compounds are determined by their components

Compounds and hasComponent

ProteinArginine

Carbon

Oxygen

NH2

Page 34: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Classes defined by necessary and sufficient conditions whenever possible

• Rationales- Precise understanding of meaning- Empowering the classifier for automated validation

processes• Required introduction of new

classes, e.g. Phosphate, Ribose

Full Definitions

Sufficient conditions for class ‘Nucleotide’

Nucleotide

Ribose

BasePhosphate

Page 35: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Rearranged Classes

GENIA BioTop

hasComponentcomponentOfproperPartOf

Page 36: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,
Page 37: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• BioTop.owl:- 143 non-taxonomic relation instances

(seven relation types) - 128 classes

• BioTopGenia.owl:- Imports GENIA- Maps BioTop classes to GENIA classes

BioTop Figures

Page 38: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

BioTop: Interfacing with OBO Ontologies

Biological Process ↔ Biological ProcessGene Ontology

Protein Function ↔ Molecular FunctionGene Ontology

Cell Component ↔ Cellular ComponentGene Ontology

Cell ↔ CellCell Ontology and CellFMA

Atom ↔ AtomsChEBI

Organic Compound ↔ Organic Molecular EntitiesChEBI

Tissue ↔ TissueFMA

DNA, RNA ↔ DNASequence Ontology, RNASequence Ontology

Protein ↔ ProteinSequence Ontology

BioTop OBO Ontologies

Page 39: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Proposal: BioTop to enrich OBO Foundry

RELATION TO TIME

GRANULARITY

CONTINUANT OCCURRENT

INDEPENDENT DEPENDENT

ORGAN AND ORGANISM

Organism(NCBI

Taxonomy?)

Anatomical Entity

(FMA, CARO)

OrganFunction

(FMP, CPRO) Phenotypic

Quality(PaTO)

Biological Process

(GO)CELL AND CELLULAR

COMPONENT

Cell(CL)

Cellular Compone

nt(FMA, GO)

Cellular Function

(GO)

MOLECULE Molecule

(ChEBI, SO,RnaO, PrO)

Molecular Function(GO)

Molecular Process

(GO)

Page 40: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

OBO Foundry and BioTop

RELATION TO TIME

GRANULARITY

CONTINUANT OCCURRENT

INDEPENDENT DEPENDENT

ORGAN AND ORGANISM

Organism(NCBI

Taxonomy?)

Anatomical Entity

(FMA, CARO)

OrganFunction

(FMP, CPRO) Phenotypic

Quality(PaTO)

Biological Process

(GO)CELL AND CELLULAR

COMPONENT

Cell(CL)

Cellular Compone

nt(FMA, GO)

Cellular Function

(GO)

MOLECULE Molecule

(ChEBI, SO,RnaO, PrO)

Molecular Function(GO)

Molecular Process

(GO)

BioTop BioTop BioTop BioTop

Bio

To

pB

ioT

op

Page 41: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

BFO

BioTop

The OBO FoundryThe OBO Foundryhttp://obofoundry.org/http://obofoundry.org/

Page 42: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Outlook

• Integration with BFO (work in process)• Completion of textual and formal definitions• Extension of process and function branch• Integration of BioTop in OBO foundry• Mapping BioTop to the UMLS Semantic Network• Use of BioTop in text mining: Experimental validation of

added value compared to GENIA / thesaurus approach• Empirical Studies to create evidence of whether

• Formal Ontologies better serve the needs of knowledge annotation / processing in biomedicine

• Informal Thesauri are sufficient

Page 43: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

if inadequate

if inconsistent

Analyze/ complete textual class definitions

Analyze/ clean taxonomic relations

Check ontological nature of classes

Identify interfaces to existing biomedical ontologies

Make classes disjoint (wherever possible)

Check consistency

Check inferred hierarchy for adequacy

Add /change necessary and (wherever possible) sufficient conditions

Select formal relations

Connect classes to upper ontology

Analyze/correct non-taxonomic relations, add descriptions

BioTop Redesign is going on

Page 44: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

if inadequate

if inconsistent

BioTop Redesign is going on

Analyze/ complete textual class definitions

Analyze/ clean taxonomic relations

Check ontological nature of classes

Identify interfaces to existing biomedical ontologies

Make classes disjoint (wherever possible)

Check consistency

Check inferred hierarchy for adequacy

Add /change necessary and (wherever possible) sufficient conditions

Select formal relations

Connect classes to upper ontology

Analyze/correct non-taxonomic relations, add descriptions

Contributions / Critiquewelcome:

[email protected]

[email protected]

Page 45: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,
Page 46: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Towards a Top-Level Ontology for Molecular Biology

Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2,

1 Freiburg University Hospital, Department of Medical Informatics, Germany 2 Jena University Language and Information Engineering (JULIE) Lab

Page 47: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

Ontology Layers

Page 48: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

OBO (GO, SO, ChEBI, …)

Ontology Layers

Page 49: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

OBO (GO, SO, ChEBI, …)

DOLCE

BFO

Ontology Layers

Page 50: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

OBO (GO, SO, ChEBI, …)

DOLCE

BFO

Ontology Layers

Page 51: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Upper

Ontology /

Top Ontology

Domain Upper

Ontology /

Top Domain Ontology

Domain Ontology

OBO (GO, SO, ChEBI, …)

Simple Bio Upper Ontology

GFO-Bio

GENIA

DOLCE

BFO

BioTop

Ontology Layers

Page 52: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

GENIA Ontology

Page 53: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/topics/Corpus/genia-ontology.html

GENIA Ontology

Page 54: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• 45 classes• Taxonomy (taxonomic relations not clearly defined)• Text definitions (“scope notes”)• Developed for semantic annotation of biological papers

“The GENIA ontology is intended to be a formal model of cell signaling reactions in human.

... use of the GENIA ontology is to provide a basis for integrated view of multiple databases“

GENIA Ontology

Page 55: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

GENIA Critique 1. Incomplete Textual Definitions

2. Modeling Errors

Page 56: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

1. Amino Acid Monomer:

“An amino acid monomer, e.g. tyrosine, serine, tyr, ser”

2. DNA:

“DNAs include DNA groups, families, molecules, domains, and regions”

GENIA Scope Notes

Page 57: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

“… substances involved in biochemical reactions.”

“Sources are biological locations where substances are found and their reactions take place, …”

Critique: Organism, body part, cell type, are not primarily ‘sources’…

Suggestion: ‘Source’ as role, ‘object’ as superclass instead

False Taxonomic Parent GENIA: Source

Page 58: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

“… substances involved in biochemical reactions.”

“Sources are biological locations where substances are found and their reactions take place, …”

Critique: Organism, body part, cell type, are not primarily ‘sources’…

Suggestion: ‘Source’ as role, ‘object’ as superclass instead

False Ontological Parent GENIA: Source

Page 59: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Critique: What is meant by ‘Cell Type’ ?

1. The universal cell (instantiated by

e.g. a particular T cell)

2. A meta level category (instantiated

by e.g. the universal T cell)

“A cell type, e.g. T-Lymphocyte, T-cell, astrocyte, fibroblast [sic]”

Misconception of Type GENIA: Cell Type

Page 60: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Critique: What is meant by ‘Cell Type’ ?

1. The universal cell (instantiated by

e.g. a particular T cell)

2. A meta level category (instantiated

by e.g. the universal T cell)

“A cell type, e.g. T-Lymphocyte, T-cell, astrocyte, fibroblast [sic]”

Misconception of Type GENIA: Cell Type

Page 61: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Critique:

1. A protein family is not a ‘protein’

2. Refers to a human-made

classification grouping proteins of

the same function / location /

structure.

Suggestion: Introduce classes

‘Protein Function’, ‘Protein Role’ and

subclasses to classify proteins“A protein family or group, e.g. STATs”

MisclassificationGENIA: Protein Family or Group

Page 62: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Critique:

1. A protein family is not a ‘protein’

2. Refers to a human-made

classification grouping proteins of

the same function / location /

structure.

Suggestion: Introduce classes

‘Protein Function’, ‘Protein Role’ and

subclasses to classify proteins“A protein family or group, e.g. STATs”

MisclassificationGENIA: Protein Family or Group

Page 63: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Ontologically irrelevant, but necessary for exhaustively partitioning a domain

Critique:

• Inconsequent use

• Missing definitions

Suggestion: Consequent use of defined residual classes

Inconsistent Use of Residual Classes

Page 64: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Ontologically irrelevant, but necessary for exhaustively partitioning a domain

Critique:

• Inconsequent use

• Missing definitions

Suggestion: Consequent use of defined residual classes

Inconsistent Use of Residual Classes

Page 65: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

“Tissue, e.g. peripheral blood, lymphoid tissue, vascular endothelium [sic]”

Critique: Not clear what is an instance of tissue.

1. Totality of the mass/collective, e.g. all red blood cells (RBCs)

2. Any portion of it, e.g. RBC in a lab sample

3. The minimal constituent, e.g. a single RBC

Suggestion: Strict distinction between singletons and collectives. New class ‘collective’ and respective sublcasses

Collectives vs. their ElementsGENIA: Tissue

Page 66: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

“Tissue, e.g. peripheral blood, lymphoid tissue, vascular endothelium [sic]”

Critique: Not clear what is an instance of tissue.

1. Totality of the mass/collective, e.g. all red blood cells (RBCs)

2. Any portion of it, e.g. RBC in a lab sample

3. The minimal constituent, e.g. a single RBC

Suggestion: Strict distinction between singletons and collectives. New class ‘collective’ and respective sublcasses

Collectives vs. their ElementsGENIA: Tissue

Page 67: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Critique: ‘Amino Acid ‘ counterintuitive name (suggests single molecule, not chain)

Suggestion: Remove class ‘Amino Acid’, use classes ‘Amino Acid Monomer’ and ‘Amino Acid Polymer’ with proper definitions

“An amino acid molecule or the compounds that consist of amino acids.”

“An amino acid monomer e.g., tyrosine, serin, tyr, ser ”

Non-Conformable Naming Policy

Page 68: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

“An amino acid molecule or the compounds that consist of amino acids.”

“An amino acid monomer e.g., tyrosine, serin, tyr, ser ”

Non-Conformable Naming Policy

Critique: ‘Amino Acid ‘ counterintuitive name (suggests single molecule, not chain)

Suggestion: Remove class ‘Amino Acid’, use classes ‘Amino Acid Monomer’ and ‘Amino Acid Polymer’ with proper definitions

Page 69: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Critique: Many taxonomic links ontologically incorrect.

Example: part-of mistaken for is-a

Suggestion: Reduce is-a to its proper meaning (subclass-of) introduce part-of and other relations

Lack of Non-Taxonomic Relations

Page 70: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Critique: Many taxonomic links ontologically incorrect.

Example: part-of mistaken for is-a

Suggestion: Reduce is-a to its proper meaning (subclass-of) introduce part-of and other relations

Lack of Non-Taxonomic Relations

Page 71: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

From GENIA to BioTop

Page 72: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

BioTop

• Upper level ontology for biology

(focusing on biomedicine and molecular biology)• Ontologically founded classes and relations• Adding formal semantics to only verbally defined classes

• Language: OWL-DL• Editing environment: Protégé

http://morphine.coling.uni-freiburg.de/~schulz/BioTop/BioTop.html

Page 73: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Basis: OBO Relation Ontology - properPartOf and hasProperPart

- locatedIn and locationOf

- derivesFrom

- hasParticipant and participatesIn

• Refined partOf:- properPartOf and hasProperPart: irreflexive

- componentOf and hasComponent: nontransitive

- grainOf and hasGrain: nontransitive

• Additional:- hasFunction and functionOf

BioTop Relations

Page 74: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Controversial: Referents in biological texts: Collectives (pluralities) or count objects?

“Emerin is phosphorylated on serine 49 by protein kinase A”

• BioTop: “Collective of x hasGrain x”• hasGrain non-transitive subrelation of hasPart

cf. Schulz & Jansen, KR-MED 2006

Collectives and Relation hasGrain

PKA

or

Emerin

Page 75: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Arginine

Carbon

Oxygen

NH2

Relation hasComponent

Protein

• Non-transitive subrelation of hasPart • Example: Definition of Protein

- Protein hasPart only amino acid is not true (hasPart transitive)

- Protein hasComponent only amino acid is true

(hasComponent not transitive)

Page 76: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Classes defined by necessary and sufficient criteria• Rationales

- Precise understanding of meaning- More rigor in automated validation process

(terminological classifier)

• Introduction of new classes

required• Example Nucleotide

Full Definitions

Page 77: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Snapshot

Rearranged Classes

Page 78: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• Non-Physical Continuant- Biological Function- Biological Location

• Process- Biological Process

New Branches in BioTop

Page 79: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• ??# Classes, ??# fully defined • ??# Relations (evtl. Noch aufgeschluesselt)

BioTop Results

Page 80: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Protein FunctionBioTop ↔ Molecular FunctionGO

Cell ComponentBioTop ↔ Cellular ComponentGO

Biological ProcessBioTop ↔ Biological ProcessGO

CellBioTop ↔ CellCell Ontology and CellFMA

AtomBioTop ↔ AtomsChEBI

Organic CompoundBioTo ↔ Organic Molecular EntitiesChEBI

TissueBioTop ↔ TissueFMA

DNABioTop, RNABioTop ↔ DNASO, RNASO

ProteinBioTop ↔ ProteinSO

OrganismBioTop ↔ ??NCBI Taxonomy

Ontology Integration

Page 81: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

NCBI TaxonomyBioTop

ChEBI

Cell Ontology

MolecularFunction (GO) Biological

Process(GO)

FMA

Sequence Ontology

CellularComponent

(GO)

Ontology Integration

Page 82: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

• BioTopGenia.owl:- Imports GENIA- Provides links from BioTop to GENIA classes

Mapping to GENIA

Page 83: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

Outlook

• Integration with BFO (work in process)• Completion of textual and formal definitions• Extension of process and function branch• Integration of BioTop in OBO foundry• Use of BioTop in text mining: Experimental validation of

added value compared to GENIA / thesaurus approach• Create evidence of whether

• Formal Ontologies better serve the needs of knowledge annotation / processing in biomedicine

• Informal Thesauri are sufficient

Page 84: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

if inadequate

if inconsistent

Ontology Redesign is going on

Analyze/ complete textual class definitions

Analyze/ clean taxonomic relations

Check ontological nature of classes

Identify interfaces to existing biomedical ontologies

Make classes disjoint (wherever possible)

Check consistency

Check inferred hierarchy for adequacy

Add /change necessary and (wherever possible) sufficient conditions

Select formal relations

Connect classes to upper ontology

Analyze/correct non-taxonomic relations, add descriptions

Page 85: Towards a Top-Level Ontology for Molecular Biology Stefan Schulz 1, Elena Beisswanger 2, Udo Hahn 2, Joachim Wermter 2, 1 Freiburg University Hospital,

if inadequate

if inconsistent

Ontology Redesign is going on

Analyze/ complete textual class definitions

Analyze/ clean taxonomic relations

Check ontological nature of classes

Identify interfaces to existing biomedical ontologies

Make classes disjoint (wherever possible)

Check consistency

Check inferred hierarchy for adequacy

Add /change necessary and (wherever possible) sufficient conditions

Select formal relations

Connect classes to upper ontology

Analyze/correct non-taxonomic relations, add descriptions