principles for building biomedical ontologies: a go perspective
DESCRIPTION
Principles for Building Biomedical Ontologies: A GO Perspective. David Hill Mouse Genome Informatics The Jackson Laoratory. How has GO dealt with some specific aspects of ontology development?. Univocity Positivity Objectivity Single Inheritance Definitions Formal definitions - PowerPoint PPT PresentationTRANSCRIPT
April 19, 2023
Principles for Building Biomedical Ontologies:
A GO Perspective David Hill
Mouse Genome Informatics
The Jackson Laoratory
April 19, 2023
How has GO dealt with some specific aspects of ontology development? Univocity Positivity Objectivity Single Inheritance Definitions
Formal definitions Written definitions
Basis in Reality Universals &
Instances Ontology
Alignment
April 19, 2023
Tactile senseTactionTactition
?
The Challenge of Univocity:People call the same thing by different names
April 19, 2023
Tactile senseTactionTactition
perception of touch ; GO:0050975
Univocity: GO uses 1 term and many characterized synonyms
April 19, 2023
= bud initiation
= bud initiation
= bud initiation
The Challenge of Univocity: People use the same words to describe different things
April 19, 2023
Bud initiation? How is a computer to know?
April 19, 2023
= bud initiation
sensu Metazoa
= bud initiation
sensu Saccharomyces
= bud initiation
sensu Viridiplantae
Univocity: GO adds “sensu” descriptors to discriminate among organisms
April 19, 2023
The Importance of synonyms for utility:How do we represent the function of tRNA?
Biologically, what does the tRNA do?Identifies the codon and inserts the amino acid in the growing polypeptide
Molecular_function
Triplet_codon amino acid adaptor activity
GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis.
Synonym: tRNA
April 19, 2023
The Challenge of Positivity
Some organelles are membrane-bound.A centrosome is not a membrane bound organelle,but it still may be considered an organelle.
April 19, 2023
The Challenge of Positivity: Sometimes absence is a distinction in a Biologist’s mind
non-membrane-bound organelle
GO:0043228 membrane-bound organelle
GO:0043227
April 19, 2023
Positivity
Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle”
The latter includes everything that is not a membrane bound organelle!
April 19, 2023
The Challenge of Objectivity: Database users want to know if we don’t know
anything (Exhaustiveness with respect to knowledge)
We don’t know anything about a gene product with
respect to these
We don’t know anything about the ligand that
binds this type of GPCR
April 19, 2023
Objectivity
How can we use GO to annotate gene products when we know that we don’t have any information about them? Currently GO has terms in each ontology to
describe unknown An alternative might be to annotate genes to root
nodes and use an evidence code to describe that we have no data.
Similar strategies could be used for things like receptors where the ligand is unknown.
April 19, 2023
GPCRs with unknown ligands
We could annotate to
this
April 19, 2023
Single Inheritance
GO has a lot of is_a diamonds Some are due to incompleteness of the
graph Some are due to a mixture of dissimilar
classes within the graph at the same level
April 19, 2023
Is_a diamond in GO Process
behavior
locomotory behavior larval behavior
larval locomotory behavior
April 19, 2023
Is_a diamond in GO Functionenzyme regulator
activity
GTPase regulatoractivity
enzyme activatoractivity
GTPase activatoracivity
April 19, 2023
Is_a diamond in GO Cellular Component
organelle
non-membrane boundorganelle
intracellularorganelle
non-membrane boundintracellular organelle
April 19, 2023
Technically the diamonds are correct, but could be eliminated
locomotory behavior larval behavior
GTPase regulatoractivity
enzyme activatoractivity
non-membrane boundorganelle
intracellularorganelle
What do these pairs have in common?
April 19, 2023
What do the middle pair of terms all have in common?
locomotory behavior larval behavior
GTPase regulatoractivity
enzyme activatoractivity
non-membrane boundorganelle
intracellularorganelle
April 19, 2023
They are all differentiated from the parent term by a different factor
locomotory behavior larval behavior
GTPase regulatoractivity
enzyme activatoractivity
non-membrane boundorganelle
intracellularorganelle
Type of behavior vs. what is behaving
What is regulated vs. type of regulator
Type of organelle vs. location of organelle
April 19, 2023
Insert an intermediate grouping term
behavior
locomotory behavior larval behavior
larval locomotory behavior
behavior of a thing
descriptive behavior
April 19, 2023
Why insert terms that no one would use?
behavior
locomotory behavior larval behavior rhythmic behavior adult behavior
By the structure of this graph, locomotory behaviorhas the same relationship to larval behavior
as to rhythmic behavior
April 19, 2023
Why insert terms that no one would use?
behavior
locomotory behavior larval behaviorrhythmic behavior adult behavior
But actually, locomotory behavior/rhythmic behaviorand larval behavior/adult behavior group naturally
Descriptivebehavior
Behaviorof athing
April 19, 2023
Is_a diamond in GO Process
behavior
locomotory behavior larval behavior
larval locomotory behavior
The realtionships differentiate behavior in different ways
April 19, 2023
GO DefinitionsA definition written by
a biologist:necessary & sufficient
conditions written definition(not computable)
Graph structure: necessary conditions
formal(computable)
April 19, 2023
Relationships and definitions
The set of necessary conditions is determined by the graph This can be considered a partial definition
Important considerations: Placement in the graph- selecting parents Appropriate relationships to different
parents True path violation
April 19, 2023
Placement in the graph
Example- Proteasome complex
April 19, 2023
The importance of relationships Cyclin dependent protein kinase
Complex has a catalytic and a regulatory subunit How do we represent these activities (function) in
the ontology? Do we need a new relationship type (regulates)?
Catalytic activity
protein kinase activity
protein Ser/Thr kinase activity
Cyclin dependent protein kinase activity
Cyclin dependent protein kinase regulator activity
Molecular_function
Enzyme regulator activity
Protein kinase regulator activity
April 19, 2023
True path violationWhat is it?
..”the pathway from a child term all the way up to its top-level parent(s) must always be true".
chromosome
Mitochondrial chromosome
Is_a relationship
Part_of relationship
nucleus
April 19, 2023
True path violationWhat is it?
..”the pathway from a child term all the way up to its top-level parent(s) must always be true".
nucleus chromosome
Nuclear chromosome
Mitochondrial chromosome
Is_a relationshipsPart_of relationship
April 19, 2023
GO textual definitions: Related GO terms have similarly structured (normalized) definitions
April 19, 2023
Structured definitions contain both genus and differentiae
Essence = Genus + Differentiae
neuron cell differentiation =Genus: differentiation (processes whereby a relativelyunspecialized cell acquires the specialized features of..)Differentiae: acquires features of a neuron
April 19, 2023
Basis in Reality
GO is designed by a consortium As long as egos don’t get in the way, GO
represents universals rather than concepts Large-scale developments of the GO are a result
of compromise
Gene Annotators have a large say in GO content Annotators are experts in their fields Annotators constantly read the scientific literature
April 19, 2023
Ontology alignmentOne of the current goals of GO is to align:
cone cell fate commitment retinal_cone_cell
keratinocyte differentiation keratinocyte
adipocyte differentiation fat_cell
dendritic cell activation dendritic_cell
lymphocyte proliferation lymphocyte
T-cell homeostasis T_lymphocyte
garland cell differentiation garland_cell
heterocyst cell differentiation heterocyst
Cell Types in GO Cell Types in the Cell Ontologywith
April 19, 2023
Alignment of the Two Ontologies will permit the generation of consistent and complete definitions
id: CL:0000062name: osteoblastdef: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]is_a: CL:0000055relationship: develops_from CL:0000008relationship: develops_from CL:0000375
GO
Cell type
New Definition
+
=Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
April 19, 2023
Alignment of the Two Ontologies will permit the generation of consistent
and complete definitions
id: GO:0001649
name: osteoblast differentiationsynonym: osteoblast cell differentiationgenus: differentiation GO:0030154 (differentiation)differentium: acquires_features_of CL:0000062 (osteoblast)definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone
Formal definitions with necessary and sufficient conditions, in both human readable and computer readable forms
April 19, 2023
Other Ontologies that can be aligned with GO
Chemical ontologies 3,4-dihydroxy-2-butanone-4-phosphate synthase activity
Anatomy ontologies metanephros development
GO itself mitochondrial inner membrane peptidase activity
April 19, 2023
But Eventually…
April 19, 2023
But, what about instances?
What are the instances we are dealing with in our work as ontology builders
and scientific curators?
April 19, 2023
What knowledge are we trying to capture?
We are interested in understanding how genes contribute to the biology of
an organism.
April 19, 2023
What do we mean by gene product?
Gene Product Type An abstract representation of a gene
These are the representations we have in MODs
Gene Product Instance A molecule of a gene product
It can be physically isolated It takes up space
April 19, 2023
How do wet-bench biologists learn about gene products?
They do experiments!Experiments are designed to study the properties of
gene product instances.
Experimental biologists take on “The Burden of Proof”.
April 19, 2023
How do we represent the accumulated knowledge
We make annotations!Annotations connect what wet-bench biologists see in the lab with how we
represent our understanding of biology
April 19, 2023
So, where are the instances?
The instances are in the lab. We use what people report about instances, but we never actually deal with them
directly
April 19, 2023
Examples of how we connect instances with knowledge representation in the GO
What follows are examples of annotation of the biomedical literature using GO types, gene product types
and evidence codes
April 19, 2023
Example #1:Molecular Function using IDA
Figure from
Zhang M, Chen W, Smith SM, Napoli JL.Molecular characterization of a mouse short chain dehydrogenase/reductase active with all-trans-retinol in intact cells, mRDH1.J Biol Chem. 2001 Nov 23;276(47):44083-90.
April 19, 2023
The Annotation:
The Observation
NAD+
NADHH+
April 19, 2023
What are the instances in this experiment?
Gene product instances Molecules of retinol dehydrogenase
Molecular function instances Instances of execution of the molecular function
revealed by the assay Instances of molecular function associated with
instances of retinol dehydrogenase. These instances are the potential of a molecule of retinol dehydrogenase to execute the function retinol dehydrogenase activity.
April 19, 2023
Example #2:Molecular Function using IMP
Figure from
Schulz S, Lopez MJ, Kuhn M, Garbers DL.Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice.J Clin Invest. 1997 Sep 15;100(6):1590-5.
April 19, 2023
The Annotation:
The Observation
X X
IMP
April 19, 2023
What are the instances in this experiment?
Gene product instances Molecules of GUCY2C protein The lack of functional molecules of GUCY2C in
mutants Molecular function instances
The execution of the molecular function, measured by the accumulation of cGMP
The potential of a molecule of GUCY2C to execute the molecular function Revealed by the correlation between a lack of molecules
and a lack of executions of molecular function
April 19, 2023
Example #3:Molecular Function using IGI
Figure from
Sango K; McDonald MP; Crawley JN; Mack ML; Tifft CJ; Skop E; Starr CM; Hoffmann A; Sandhoff K; Suzuki K; Proia RLMice lacking both subunits of lysosomal beta-hexosaminidase display gangliosidosis and mucopolysaccharidosis.Nat Genet 1996 Nov;14(3):348
April 19, 2023
The Annotation:
The Observation
X
April 19, 2023
The Annotation:
The Observation
X
April 19, 2023
The Annotation:
The Observation
X
April 19, 2023
The Annotation:
The Observation
X
April 19, 2023
The Annotation:
The Observation
IGI
X
April 19, 2023
What are the instances in this experiment?
Gene product instances Molecules of HEXA protein Molecules of HEXB protein The lack of functional HEXA/HEXB protein in mutant cells
Molecular function instances The execution of the molecular function beta-N-
acetylhexosaminidase as measured by glycosaminoglycan accumulation
The potential of a molecule of HEXA/HEXB to execute the molecular function beta-N-acetylhexosaminidase
April 19, 2023
Example #4:Molecular Function using IPI
Figure from
Kuwako K; Hosokawa A; Nishimura I; Uetsuki T; Yamada M; Nada S; Okada M; Yoshikawa K
Disruption of the paternal necdin gene diminishes TrkA signaling for sensory neuron survival.
J Neurosci 2005 Jul 27;25(30):7090-9.
April 19, 2023
The Annotation:
The Observation
IPI
NDN
NDN
April 19, 2023
The Annotation:
The Observation
IPI
NDN
NDN
April 19, 2023
What are the instances in this experiment?
Gene product instances FLAG-tagged molecules of NTRKA FLAG-tagged molecules of NGFR Molecules of NDN
Molecular function instances The execution of the molecular function protein binding between
instances of NDN and NTRKA-FLAG The execution of the molecular function protein binding between
instances of NDN and NGFR-FLAG The potential of a molecule of NDN to execute protein binding to a
molecule of NTRKA-FLAG or NGFR-FLAG The potential of a molecule of NTRKA-FLAG to execute protein
binding to a molecule of NDN The potential of a molecule of NGFR-FLAG to execute protein binding
to a molecule of NDN
April 19, 2023
What About Biological Process?
It is very similar to function with a few exceptions
April 19, 2023
Biological Process Using IMP
Washington Smoak I; Byrd NA; Abu-Issa R; Goddeeris MM; Anderson R; Morris J; Yamamura K; Klingensmith J; Meyers EN, Sonic hedgehog is required for cardiac outflow tract and neural crest cell development.,
Dev Biol 2005 Jul 15;283(2):357-72.
April 19, 2023
The Annotation:
The Observation
IMP
April 19, 2023
The Annotation:
The Observation
IMP
X
April 19, 2023
What are the instances in this Experiment?
Gene product instances Molecules of the Shh gene Non-functional molecules of the Shh gene
Biological Process instances The development of a mouse heart
Molecular Function Instances The excecution of a molecular function by a
molecule of the Shh gene
So, when a process occursit is the result of moleculesof a gene product executing
their molecular function
April 19, 2023
How do wet-bench biologists learn about gene products?
They do experiments!Experiments are designed to study the properties of
gene product instances.
Experimental biologists take on “The Burden of Proof”.
They make conclusionsabout gene product typesbased on the accumulated
experimental data!
April 19, 2023
If experiments show:
All instances of a gene product studied have the potential to execute the function tyrosine kinase
Instances of the same gene product are involved in the biological process limb development
All instances of the same gene product are found in instances of the cytoplasm
April 19, 2023
A wet-bench biologist would conclude
The gene product of this gene is a tyrosine kinase that functions in the cytoplasm and the tyrosine kinase function is used in limb
development
April 19, 2023
If we comprehensively annotate genes, can we make
the same conclusions?
This is the basis of biological discovery!
April 19, 2023
A tribute to Lewis CarrollOnce master the machinery of Symbolic Logic, and you have a mental occupation always at hand, of absorbing interest, and one that will be of real use to you in any subject you may take up. It will give you clearness of thought - the ability to see your way through a puzzle - the habit of arranging your ideas in an orderly and get-at-able form - and, more valuable than all, the power to detect fallacies, and to tear to pieces the flimsy illogical arguments, which you will so continually encounter in books, in newspapers, in speeches, and even in sermons, and which so easily delude those who have never taken the trouble to master this fascinating Art. Lewis Carroll
(a) All babies are illogical.(b) Nobody is despised who can manage a crocodile.(c) Illogical persons are despisedCan a baby can manage a crocodileNo!