June 15, 2022 Principles for Building Biomedical Ontologies: A GO Perspective David Hill Mouse Genome Informatics The Jackson Laboratory

Upload: brynn-haney

Post on 31-Dec-2015

TRANSCRIPT

Page 1: Principles for Building Biomedical Ontologies: A GO Perspective

April 19, 2023

Principles for Building Biomedical Ontologies: A GO Perspective

David Hill

Mouse Genome Informatics

The Jackson Laboratory

Page 2: Principles for Building Biomedical Ontologies: A GO Perspective

How has GO dealt with some specific aspects of ontology development?

Univocity
Positivity
Objectivity
Single Inheritance
Definitions
    Formal definitions
    Written definitions
Basis in Reality
Universals & Instances
Ontology Alignment

Page 3: Principles for Building Biomedical Ontologies: A GO Perspective

The Challenge of Univocity: People call the same thing by different names

Tactile sense, Taction, Tactition

?

Page 4: Principles for Building Biomedical Ontologies: A GO Perspective

Univocity: GO uses 1 term and many characterized synonyms

Tactile sense, Taction, Tactition

perception of touch ; GO:0050975
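A minimal Python sketch of the idea on this slide: many labels resolve to one term. The dictionary layout and the function names `build_index` and `resolve` are assumptions made for illustration, not part of any GO tool; only the term ID, name and synonyms come from the slide.

```python
# Hypothetical in-memory term store; the single entry is taken from the slide.
TERMS = {
    "GO:0050975": {
        "name": "perception of touch",
        "synonyms": ["tactile sense", "taction", "tactition"],
    },
}


def build_index(terms):
    """Map every lower-cased name and synonym to its single term ID."""
    index = {}
    for term_id, term in terms.items():
        for label in [term["name"], *term["synonyms"]]:
            index[label.lower()] = term_id
    return index


INDEX = build_index(TERMS)


def resolve(label):
    """Return the term ID for a name or synonym, or None if it is unknown."""
    return INDEX.get(label.strip().lower())


print(resolve("Taction"))
```

Whatever word a user types, the lookup lands on the same identifier, which is what lets a computer treat the three names as one thing.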

Page 5: Principles for Building Biomedical Ontologies: A GO Perspective

The Challenge of Univocity: People use the same words to describe different things

[Three images, each labeled "= bud initiation"]

Page 6: Principles for Building Biomedical Ontologies: A GO Perspective

Bud initiation? How is a computer to know?

Page 7: Principles for Building Biomedical Ontologies: A GO Perspective

Univocity: GO adds “sensu” descriptors to discriminate among organisms

[Image] = bud initiation sensu Metazoa

[Image] = bud initiation sensu Saccharomyces

[Image] = bud initiation sensu Viridiplantae

Page 8: Principles for Building Biomedical Ontologies: A GO Perspective

The Importance of synonyms for utility: How do we represent the function of tRNA?

Biologically, what does the tRNA do? It identifies the codon and inserts the amino acid into the growing polypeptide.

Molecular_function

Triplet_codon amino acid adaptor activity

GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis.

Synonym: tRNA

Page 9: Principles for Building Biomedical Ontologies: A GO Perspective

The Challenge of Positivity

Some organelles are membrane-bound. A centrosome is not a membrane-bound organelle, but it may still be considered an organelle.

Page 10: Principles for Building Biomedical Ontologies: A GO Perspective

The Challenge of Positivity: Sometimes absence is a distinction in a biologist’s mind

non-membrane-bound organelle ; GO:0043228

membrane-bound organelle ; GO:0043227

Page 11: Principles for Building Biomedical Ontologies: A GO Perspective

Positivity

Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle”

The latter includes everything that is not a membrane bound organelle!
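The difference can be made concrete with sets. This is a toy sketch: the universe and its members are invented for illustration and are not an excerpt of GO.

```python
# Toy universe of entities (illustrative only, not a GO excerpt).
universe = {
    "nucleus", "mitochondrion", "centrosome", "ribosome",
    "protein kinase activity", "behavior",
}
organelles = {"nucleus", "mitochondrion", "centrosome", "ribosome"}
membrane_bound_organelles = {"nucleus", "mitochondrion"}

# "non-membrane-bound organelle": still an organelle, just without a membrane.
non_membrane_bound_organelle = organelles - membrane_bound_organelles

# "not a membrane-bound organelle": the logical complement in the whole universe.
not_a_membrane_bound_organelle = universe - membrane_bound_organelles

print(sorted(non_membrane_bound_organelle))
print(sorted(not_a_membrane_bound_organelle))
```

The first set contains only organelles; the second also sweeps in functions and processes, which is why the complement is not a useful class.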

Page 12: Principles for Building Biomedical Ontologies: A GO Perspective

The Challenge of Objectivity: Database users want to know if we don’t know anything (Exhaustiveness with respect to knowledge)

We don’t know anything about a gene product with respect to these

We don’t know anything about the ligand that binds this type of GPCR

Page 13: Principles for Building Biomedical Ontologies: A GO Perspective

Objectivity

How can we use GO to annotate gene products when we know that we don’t have any information about them?

Currently GO has terms in each ontology to describe unknown

An alternative might be to annotate genes to root nodes and use an evidence code to describe that we have no data.

Similar strategies could be used for things like receptors where the ligand is unknown.
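The root-node alternative could be sketched roughly as follows. The record layout, the gene product name "GeneX", the function names, and the use of "ND" as the name of the no-data evidence code are assumptions for illustration; the three identifiers are the GO root terms.

```python
# The three GO root terms, one per ontology.
ROOTS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}

NO_DATA = "ND"  # assumed name for an evidence code meaning "no data available"


def no_data_annotations(gene_product, roots=ROOTS):
    """Annotate a gene product to every root node with the no-data code."""
    return [
        {"gene_product": gene_product, "aspect": aspect,
         "term": term_id, "evidence": NO_DATA}
        for aspect, term_id in roots.items()
    ]


def is_uncharacterized(annotations):
    """True when a curator looked and every annotation records 'no data'."""
    return bool(annotations) and all(a["evidence"] == NO_DATA for a in annotations)


print(is_uncharacterized(no_data_annotations("GeneX")))
```

An empty annotation list then means "nobody has looked yet", while root annotations with the no-data code mean "we looked and nothing is known", which is the distinction database users ask for.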

Page 14: Principles for Building Biomedical Ontologies: A GO Perspective

GPCRs with unknown ligands

We could annotate to this

Page 15: Principles for Building Biomedical Ontologies: A GO Perspective

Single Inheritance

GO has a lot of is_a diamonds
    Some are due to incompleteness of the graph
    Some are due to a mixture of dissimilar classes within the graph at the same level

Page 16: Principles for Building Biomedical Ontologies: A GO Perspective

Is_a diamond in GO Process

behavior

locomotory behavior larval behavior

larval locomotory behavior
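One way to find such diamonds automatically is to look for a term with two is_a parents that share an ancestor. The sketch below assumes the graph is held as a plain dictionary from term name to its is_a parents; `ancestors` and `find_diamonds` are hypothetical helper names.

```python
# is_a edges for the slide's example, child -> list of parents.
IS_A = {
    "behavior": [],
    "locomotory behavior": ["behavior"],
    "larval behavior": ["behavior"],
    "larval locomotory behavior": ["locomotory behavior", "larval behavior"],
}


def ancestors(term, is_a):
    """All terms reachable by following is_a links upward (term excluded)."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def find_diamonds(is_a):
    """Return (bottom, parent_a, parent_b, shared_ancestors) for each diamond."""
    diamonds = []
    for term, parents in is_a.items():
        for i, a in enumerate(parents):
            for b in parents[i + 1:]:
                up_a, up_b = ancestors(a, is_a), ancestors(b, is_a)
                if a in up_b or b in up_a:
                    continue  # one parent sits above the other: not a diamond
                shared = up_a & up_b
                if shared:
                    diamonds.append((term, a, b, sorted(shared)))
    return diamonds


for diamond in find_diamonds(IS_A):
    print(diamond)
```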

Page 17: Principles for Building Biomedical Ontologies: A GO Perspective

Is_a diamond in GO Function

enzyme regulator activity

GTPase regulator activity    enzyme activator activity

GTPase activator activity

Page 18: Principles for Building Biomedical Ontologies: A GO Perspective

Is_a diamond in GO Cellular Component

organelle

non-membrane-bound organelle    intracellular organelle

non-membrane-bound intracellular organelle

Page 19: Principles for Building Biomedical Ontologies: A GO Perspective

Technically the diamonds are correct, but could be eliminated

locomotory behavior    larval behavior

GTPase regulator activity    enzyme activator activity

non-membrane-bound organelle    intracellular organelle

What do these pairs have in common?

Page 20: Principles for Building Biomedical Ontologies: A GO Perspective

What do the middle pairs of terms all have in common?

locomotory behavior    larval behavior

GTPase regulator activity    enzyme activator activity

non-membrane-bound organelle    intracellular organelle

Page 21: Principles for Building Biomedical Ontologies: A GO Perspective

Within each pair, the two terms are differentiated from the parent term by different factors

locomotory behavior    larval behavior
    Type of behavior vs. what is behaving

GTPase regulator activity    enzyme activator activity
    What is regulated vs. type of regulator

non-membrane-bound organelle    intracellular organelle
    Type of organelle vs. location of organelle

Page 22: Principles for Building Biomedical Ontologies: A GO Perspective

Insert an intermediate grouping term

behavior

descriptive behavior    behavior of a thing

locomotory behavior    larval behavior

larval locomotory behavior

Page 23: Principles for Building Biomedical Ontologies: A GO Perspective

Why insert terms that no one would use?

behavior

locomotory behavior    larval behavior    rhythmic behavior    adult behavior

By the structure of this graph, locomotory behavior has the same relationship to larval behavior as to rhythmic behavior

Page 24: Principles for Building Biomedical Ontologies: A GO Perspective

Why insert terms that no one would use?

behavior

Descriptive behavior    Behavior of a thing

locomotory behavior    rhythmic behavior    larval behavior    adult behavior

But actually, locomotory behavior/rhythmic behavior and larval behavior/adult behavior group naturally

Page 25: Principles for Building Biomedical Ontologies: A GO Perspective

Is_a diamond in GO Process

behavior

locomotory behavior    larval behavior

larval locomotory behavior

The relationships differentiate behavior in different ways

Page 26: Principles for Building Biomedical Ontologies: A GO Perspective

GO Definitions

A definition written by a biologist: necessary & sufficient conditions
    written definition (not computable)

Graph structure: necessary conditions
    formal (computable)

Page 27: Principles for Building Biomedical Ontologies: A GO Perspective

Relationships and definitions

The set of necessary conditions is determined by the graph
    This can be considered a partial definition

Important considerations:
    Placement in the graph: selecting parents
    Appropriate relationships to different parents
    True path violation

Page 28: Principles for Building Biomedical Ontologies: A GO Perspective

Placement in the graph

Example- Proteasome complex

Page 29: Principles for Building Biomedical Ontologies: A GO Perspective

The importance of relationships

Cyclin dependent protein kinase
    Complex has a catalytic and a regulatory subunit
    How do we represent these activities (function) in the ontology?
    Do we need a new relationship type (regulates)?

Molecular_function
    Catalytic activity
        protein kinase activity
            protein Ser/Thr kinase activity
                Cyclin dependent protein kinase activity
    Enzyme regulator activity
        Protein kinase regulator activity
            Cyclin dependent protein kinase regulator activity

Page 30: Principles for Building Biomedical Ontologies: A GO Perspective

True path violation: What is it?

"...the pathway from a child term all the way up to its top-level parent(s) must always be true".

nucleus

chromosome (part_of nucleus)

Mitochondrial chromosome (is_a chromosome)

Page 31: Principles for Building Biomedical Ontologies: A GO Perspective

True path violation: What is it?

"...the pathway from a child term all the way up to its top-level parent(s) must always be true".

nucleus    chromosome

Nuclear chromosome (is_a chromosome, part_of nucleus)

Mitochondrial chromosome (is_a chromosome)
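The violation and its repair can be checked mechanically. The sketch below assumes that the first graph has chromosome part_of nucleus and the second moves that part_of link down to nuclear chromosome, which is how the two slides read; the dictionaries and the function name `inferred_wholes` are illustrative. It handles only the simple case of is_a steps followed by one part_of step.

```python
def inferred_wholes(term, is_a, part_of):
    """Everything `term` is implied to be part_of, via its is_a ancestors."""
    inferred, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # A term inherits the part_of links of every is_a ancestor.
        inferred.update(part_of.get(current, []))
        stack.extend(is_a.get(current, []))
    return inferred


# Before: the part_of link sits on the general term.
before_is_a = {"mitochondrial chromosome": ["chromosome"]}
before_part_of = {"chromosome": ["nucleus"]}

# After: a more specific child carries the part_of link.
after_is_a = {
    "mitochondrial chromosome": ["chromosome"],
    "nuclear chromosome": ["chromosome"],
}
after_part_of = {"nuclear chromosome": ["nucleus"]}

print(inferred_wholes("mitochondrial chromosome", before_is_a, before_part_of))
print(inferred_wholes("mitochondrial chromosome", after_is_a, after_part_of))
```

In the first graph the path implies that a mitochondrial chromosome is part of the nucleus, which is false; in the second graph that implication disappears while the nuclear chromosome keeps it.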

Page 32: Principles for Building Biomedical Ontologies: A GO Perspective

GO textual definitions: Related GO terms have similarly structured (normalized) definitions

Page 33: Principles for Building Biomedical Ontologies: A GO Perspective

Structured definitions contain both genus and differentiae

Essence = Genus + Differentiae

neuron cell differentiation =
    Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..)
    Differentiae: acquires features of a neuron
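Because the genus text is fixed and only the differentia changes, such definitions can be generated from a template. This is a sketch under assumptions: the helper names are hypothetical and the article choice is deliberately naive (first letter only).

```python
# Genus: the shared opening of every "differentiation" definition.
GENUS_TEXT = (
    "Processes whereby a relatively unspecialized cell acquires "
    "the specialized features of"
)


def indefinite_article(noun):
    """Naive a/an choice based on the first letter only."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def differentiation_definition(cell_type, gloss=""):
    """Genus text plus the differentia (which cell type's features are acquired)."""
    text = f"{GENUS_TEXT} {indefinite_article(cell_type)} {cell_type}"
    if gloss:
        text += f", {gloss}"
    return text + "."


print(differentiation_definition("neuron"))
print(differentiation_definition(
    "osteoblast", "the mesodermal cell that gives rise to bone"))
```

Every definition built this way has the same structure, which is what the deck means by normalized definitions for related terms.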

Page 34: Principles for Building Biomedical Ontologies: A GO Perspective

Basis in Reality

GO is designed by a consortium
    As long as egos don’t get in the way, GO represents universals rather than concepts
    Large-scale developments of the GO are a result of compromise

Gene Annotators have a large say in GO content
    Annotators are experts in their fields
    Annotators constantly read the scientific literature

Page 35: Principles for Building Biomedical Ontologies: A GO Perspective

Ontology alignment

One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:

Cell Types in GO                      Cell Types in the Cell Ontology
cone cell fate commitment             retinal_cone_cell
keratinocyte differentiation          keratinocyte
adipocyte differentiation             fat_cell
dendritic cell activation             dendritic_cell
lymphocyte proliferation              lymphocyte
T-cell homeostasis                    T_lymphocyte
garland cell differentiation          garland_cell
heterocyst cell differentiation       heterocyst

Page 36: Principles for Building Biomedical Ontologies: A GO Perspective

Alignment of the Two Ontologies will permit the generation of consistent and complete definitions

GO + Cell type = New Definition

Cell type:

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

New Definition:

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
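To combine the two ontologies, the cell type record first has to be read by a program. The sketch below parses the osteoblast record shown on this slide, treating it as one "tag: value" pair per line. The parser is a simplification written for this example and does not cover the full OBO format; `parse_stanza` and `develops_from` are hypothetical names.

```python
STANZA = """\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""


def parse_stanza(text):
    """Collect 'tag: value' lines into a dict of lists (tags may repeat)."""
    tags = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        tags.setdefault(tag, []).append(value)
    return tags


def develops_from(tags):
    """IDs of the cell types this cell develops from."""
    targets = []
    for value in tags.get("relationship", []):
        relation, target = value.split()[:2]
        if relation == "develops_from":
            targets.append(target)
    return targets


osteoblast = parse_stanza(STANZA)
print(osteoblast["name"][0], develops_from(osteoblast))
```

The two develops_from targets are what supply the precursor cells named at the start of the new definition.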

Page 37: Principles for Building Biomedical Ontologies: A GO Perspective

Alignment of the Two Ontologies will permit the generation of consistent and complete definitions

id: GO:0001649
name: osteoblast differentiation
synonym: osteoblast cell differentiation
genus: differentiation GO:0030154 (differentiation)
differentium: acquires_features_of CL:0000062 (osteoblast)
definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone

Formal definitions with necessary and sufficient conditions, in both human readable and computer readable forms

Page 38: Principles for Building Biomedical Ontologies: A GO Perspective

Other Ontologies that can be aligned with GO

Chemical ontologies
    3,4-dihydroxy-2-butanone-4-phosphate synthase activity

Anatomy ontologies
    metanephros development

GO itself
    mitochondrial inner membrane peptidase activity

Page 39: Principles for Building Biomedical Ontologies: A GO Perspective

But Eventually…

Page 40: Principles for Building Biomedical Ontologies: A GO Perspective

But, what about instances?

What are the instances we are dealing with in our work as ontology builders and scientific curators?

Page 41: Principles for Building Biomedical Ontologies: A GO Perspective

What knowledge are we trying to capture?

We are interested in understanding how genes contribute to the biology of an organism.

Page 42: Principles for Building Biomedical Ontologies: A GO Perspective

What do we mean by gene product?

Gene Product Type
    An abstract representation of a gene
    These are the representations we have in MODs (model organism databases)

Gene Product Instance
    A molecule of a gene product
    It can be physically isolated
    It takes up space

Page 43: Principles for Building Biomedical Ontologies: A GO Perspective

How do wet-bench biologists learn about gene products?

They do experiments! Experiments are designed to study the properties of gene product instances.

Experimental biologists take on “The Burden of Proof”.

Page 44: Principles for Building Biomedical Ontologies: A GO Perspective

How do we represent the accumulated knowledge?

We make annotations! Annotations connect what wet-bench biologists see in the lab with how we represent our understanding of biology

Page 45: Principles for Building Biomedical Ontologies: A GO Perspective

So, where are the instances?

The instances are in the lab. We use what people report about instances, but we never actually deal with them directly

Page 46: Principles for Building Biomedical Ontologies: A GO Perspective

Examples of how we connect instances with knowledge representation in the GO

What follows are examples of annotation of the biomedical literature using GO types, gene product types and evidence codes
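An annotation of this kind can be pictured as a small record that ties a gene product type to a GO type, an evidence code and a literature reference. The class below is a sketch with hypothetical field names, not the GO annotation file format; the four codes are the ones used in the examples that follow, and the two sample records use term names and papers cited in those examples.

```python
from dataclasses import dataclass

# Evidence codes used in the examples of this deck.
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
}


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # gene product type, not a molecule in a tube
    go_term: str       # GO type (term name)
    evidence: str      # how the curator knows
    reference: str     # where the observation was reported

    def __post_init__(self):
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")


examples = [
    Annotation("mRDH1", "retinol dehydrogenase activity", "IDA",
               "Zhang et al., J Biol Chem 2001"),
    Annotation("NDN", "protein binding", "IPI",
               "Kuwako et al., J Neurosci 2005"),
]

for a in examples:
    print(a.gene_product, a.go_term, EVIDENCE_CODES[a.evidence])
```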

Page 47: Principles for Building Biomedical Ontologies: A GO Perspective

Example #1: Molecular Function using IDA

Figure from

Zhang M, Chen W, Smith SM, Napoli JL. Molecular characterization of a mouse short chain dehydrogenase/reductase active with all-trans-retinol in intact cells, mRDH1. J Biol Chem. 2001 Nov 23;276(47):44083-90.

Page 48: Principles for Building Biomedical Ontologies: A GO Perspective

The Annotation:

The Observation

[Figure: reaction scheme with labels NAD+, NADH and H+]

Page 49: Principles for Building Biomedical Ontologies: A GO Perspective

What are the instances in this experiment?

Gene product instances
    Molecules of retinol dehydrogenase

Molecular function instances
    Instances of execution of the molecular function revealed by the assay
    Instances of molecular function associated with instances of retinol dehydrogenase. These instances are the potential of a molecule of retinol dehydrogenase to execute the function retinol dehydrogenase activity.

Page 50: Principles for Building Biomedical Ontologies: A GO Perspective

Example #2: Molecular Function using IMP

Figure from

Schulz S, Lopez MJ, Kuhn M, Garbers DL. Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice. J Clin Invest. 1997 Sep 15;100(6):1590-5.

Page 51: Principles for Building Biomedical Ontologies: A GO Perspective

The Annotation:

The Observation

[Figure: observation and resulting annotation, evidence code IMP]

Page 52: Principles for Building Biomedical Ontologies: A GO Perspective

What are the instances in this experiment?

Gene product instances
    Molecules of GUCY2C protein
    The lack of functional molecules of GUCY2C in mutants

Molecular function instances
    The execution of the molecular function, measured by the accumulation of cGMP
    The potential of a molecule of GUCY2C to execute the molecular function
        Revealed by the correlation between a lack of molecules and a lack of executions of molecular function

Page 53: Principles for Building Biomedical Ontologies: A GO Perspective

Example #3: Molecular Function using IGI

Figure from

Sango K; McDonald MP; Crawley JN; Mack ML; Tifft CJ; Skop E; Starr CM; Hoffmann A; Sandhoff K; Suzuki K; Proia RL. Mice lacking both subunits of lysosomal beta-hexosaminidase display gangliosidosis and mucopolysaccharidosis. Nat Genet 1996 Nov;14(3):348

Page 54: Principles for Building Biomedical Ontologies: A GO Perspective

The Annotation:

The Observation

[Figure: step-by-step build of the observation]

Page 58: Principles for Building Biomedical Ontologies: A GO Perspective

The Annotation:

The Observation

[Figure: observation and resulting annotation, evidence code IGI]

Page 59: Principles for Building Biomedical Ontologies: A GO Perspective

What are the instances in this experiment?

Gene product instances
    Molecules of HEXA protein
    Molecules of HEXB protein
    The lack of functional HEXA/HEXB protein in mutant cells

Molecular function instances
    The execution of the molecular function beta-N-acetylhexosaminidase as measured by glycosaminoglycan accumulation
    The potential of a molecule of HEXA/HEXB to execute the molecular function beta-N-acetylhexosaminidase

Page 60: Principles for Building Biomedical Ontologies: A GO Perspective

Example #4: Molecular Function using IPI

Figure from

Kuwako K; Hosokawa A; Nishimura I; Uetsuki T; Yamada M; Nada S; Okada M; Yoshikawa K. Disruption of the paternal necdin gene diminishes TrkA signaling for sensory neuron survival. J Neurosci 2005 Jul 27;25(30):7090-9.

Page 61: Principles for Building Biomedical Ontologies: A GO Perspective

The Annotation:

The Observation

[Figure: observation and resulting annotation for NDN, evidence code IPI]

Page 63: Principles for Building Biomedical Ontologies: A GO Perspective

What are the instances in this experiment?

Gene product instances
    FLAG-tagged molecules of NTRKA
    FLAG-tagged molecules of NGFR
    Molecules of NDN

Molecular function instances
    The execution of the molecular function protein binding between instances of NDN and NTRKA-FLAG
    The execution of the molecular function protein binding between instances of NDN and NGFR-FLAG
    The potential of a molecule of NDN to execute protein binding to a molecule of NTRKA-FLAG or NGFR-FLAG
    The potential of a molecule of NTRKA-FLAG to execute protein binding to a molecule of NDN
    The potential of a molecule of NGFR-FLAG to execute protein binding to a molecule of NDN

Page 64: Principles for Building Biomedical Ontologies: A GO Perspective

What About Biological Process?

It is very similar to function with a few exceptions

Page 65: Principles for Building Biomedical Ontologies: A GO Perspective

Biological Process Using IMP

Washington Smoak I; Byrd NA; Abu-Issa R; Goddeeris MM; Anderson R; Morris J; Yamamura K; Klingensmith J; Meyers EN. Sonic hedgehog is required for cardiac outflow tract and neural crest cell development. Dev Biol 2005 Jul 15;283(2):357-72.

Page 66: Principles for Building Biomedical Ontologies: A GO Perspective

The Annotation:

The Observation

[Figure: observation and resulting annotation, evidence code IMP]

Page 67: Principles for Building Biomedical Ontologies: A GO Perspective

The Annotation:

The Observation

[Figure: observation in the mutant and resulting annotation, evidence code IMP]

Page 68: Principles for Building Biomedical Ontologies: A GO Perspective

What are the instances in this experiment?

Gene product instances
    Molecules of the Shh gene product
    Non-functional molecules of the Shh gene product

Biological Process instances
    The development of a mouse heart

Molecular Function instances
    The execution of a molecular function by a molecule of the Shh gene product

So, when a process occurs, it is the result of molecules of a gene product executing their molecular function

Page 69: Principles for Building Biomedical Ontologies: A GO Perspective

How do wet-bench biologists learn about gene products?

They do experiments! Experiments are designed to study the properties of gene product instances.

Experimental biologists take on “The Burden of Proof”.

They make conclusions about gene product types based on the accumulated experimental data!

Page 70: Principles for Building Biomedical Ontologies: A GO Perspective

If experiments show:

All instances of a gene product studied have the potential to execute the function tyrosine kinase

Instances of the same gene product are involved in the biological process limb development

All instances of the same gene product are found in instances of the cytoplasm

Page 71: Principles for Building Biomedical Ontologies: A GO Perspective

A wet-bench biologist would conclude

The gene product of this gene is a tyrosine kinase that functions in the cytoplasm, and the tyrosine kinase function is used in limb development

Page 72: Principles for Building Biomedical Ontologies: A GO Perspective

If we comprehensively annotate genes, can we make the same conclusions?

This is the basis of biological discovery!

Page 73: Principles for Building Biomedical Ontologies: A GO Perspective

A tribute to Lewis Carroll

Once master the machinery of Symbolic Logic, and you have a mental occupation always at hand, of absorbing interest, and one that will be of real use to you in any subject you may take up. It will give you clearness of thought - the ability to see your way through a puzzle - the habit of arranging your ideas in an orderly and get-at-able form - and, more valuable than all, the power to detect fallacies, and to tear to pieces the flimsy illogical arguments, which you will so continually encounter in books, in newspapers, in speeches, and even in sermons, and which so easily delude those who have never taken the trouble to master this fascinating Art.

Lewis Carroll

(a) All babies are illogical.
(b) Nobody is despised who can manage a crocodile.
(c) Illogical persons are despised.

Can a baby manage a crocodile? No!
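The puzzle can be checked by brute force: try every combination of the four properties for one person and ask whether any combination satisfies all three premises while describing a baby who manages a crocodile. The encoding of each premise as an implication is this sketch's reading of the sentences, and the function names are hypothetical.

```python
from itertools import product


def consistent(baby, illogical, despised, manages_crocodile):
    """True when one person's properties satisfy all three premises."""
    a = (not baby) or illogical                   # (a) babies are illogical
    b = (not manages_crocodile) or not despised   # (b) crocodile managers are not despised
    c = (not illogical) or despised               # (c) illogical persons are despised
    return a and b and c


def baby_can_manage_crocodile():
    """Is there any consistent person who is a baby and manages a crocodile?"""
    return any(
        consistent(baby, illogical, despised, manages)
        for baby, illogical, despised, manages in product([False, True], repeat=4)
        if baby and manages
    )


print(baby_can_manage_crocodile())
```

The chain is baby, therefore illogical, therefore despised, therefore not a crocodile manager, so no consistent case exists.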